Jira applications stall due to StackOverflowError exception

Problem

Jira applications can stall when a java.lang.StackOverflowError exception occurs, because the error can leave internal memory structures corrupted.

The following symptoms may be present.

The following message appears in atlassian-jira.log:

2017-11-28 08:21:19,815 http-nio-8080-exec-8 ERROR user 501x750485x1 13esc 10.0.16.1 /secure/AssignIssue.jspa [c.o.scriptrunner.runner.AbstractScriptListener] Script function failed on event: com.atlassian.jira.event.issue.IssueEvent, file: com.org.EpicLinkListener
java.lang.StackOverflowError

Note that the stack trace of the exception may vary depending on the underlying operation being performed in Jira.

Diagnosis

Environment

  • The problem can be triggered by any bug in code running inside Jira. The chance of hitting a StackOverflowError is higher if you run third-party apps or your own custom code (for example, Groovy scripts in ScriptRunner).

Diagnostic Steps

  • Checking the logs shows a large number of StackOverflowError entries:

    grep -ic 'StackOverflowError' atlassian-jira.log* 
          63

Cause

The JVM running Jira applications has hit a StackOverflowError triggered by the code. StackOverflowError is an asynchronous exception that the Java Virtual Machine can throw whenever the computation in a thread requires a larger stack than is permitted. The Java Language Specification also permits a StackOverflowError to be thrown synchronously by method invocation. This mechanism is a clean way to report that a stack overflow has occurred while preserving the JVM's integrity, but it doesn't provide a safe way for the application to recover from the situation. A stack overflow can occur in the middle of a sequence of modifications which, if left incomplete, can leave a data structure in an inconsistent state.
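As a minimal sketch of how such an error arises, the following Java snippet recurses without a base case until the thread's stack is exhausted. The class and method names are hypothetical and not part of Jira or any app; the catch block exists only so the demo can report how deep it got, and is not a recommended recovery strategy.

```java
// Hypothetical demo class: illustrates unbounded recursion, not Jira code.
final class StackOverflowDemo {
    private static int depth;

    // No base case: every call pushes another frame onto the thread's stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns the number of frames pushed before the JVM threw StackOverflowError.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Caught for demonstration only. In real code, any state that was
            // being modified when the overflow hit may be left half-updated.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Stack overflowed after " + measureDepth() + " frames");
    }
}
```

The depth reached depends on the thread stack size and the size of each frame, so the number varies between JVMs and runs.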


Quote from http://openjdk.java.net/jeps/270

For instance, when a StackOverflowError is thrown in a critical section of the java.util.concurrent.locks.ReentrantLock class, the lock status can be left in an inconsistent state, leading to potential deadlocks. The ReentrantLock class uses an instance of AbstractQueuedSynchronizer to implement its critical section.

After a StackOverflowError, the Jira application will likely be in an unstable state, so it is essential to restart your Jira applications immediately.

Workaround

  1. Restart Jira
  2. Upgrade to Oracle JVM version 11
    1. There are some JVM-level fixes that should allow Jira to handle these exceptions more gracefully: https://blogs.oracle.com/poonam/stackoverflowerror-and-threads-waiting-for-reentrantreadwritelock  

Solution

  • Fix the problem in the code that caused the StackOverflowError. You may need to contact the app vendor about this.
  • It may also be possible to identify the offending component by thoroughly examining the actual stack trace (the app vendor can help with that). While third-party apps are not supported by Atlassian, as a best effort here is an example of a case where the issue was traced to a third-party JQL function, PreviousSprint:
    1. The stack trace referenced a class "PreviousSprint":

    at com.onresolve.jira.groovy.jql.plugins.PreviousSprint.getRelevantSprint(PreviousSprint.groovy:27)
    at com.onresolve.jira.groovy.jql.plugins.PreviousSprint$getRelevantSprint.callCurrent(Unknown Source)

    2. It was then identified that the JQL function "previousSprint" was used in one of the newly created dashboard JQL filters.
    3. The JQL function previousSprint was then disabled, and once the board reloaded, the issue was resolved.
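When examining a long trace by hand is impractical, counting how often each frame appears can point to the recursing method, since frames in a recursion cycle usually repeat many times. The following is a rough sketch under that assumption; RepeatedFrameCounter and its methods are hypothetical helpers, not an Atlassian or ScriptRunner tool, and the sample trace simply repeats the two frames shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: counts how often each "at ..." frame occurs in a trace.
final class RepeatedFrameCounter {

    // Maps "package.Class.method" to its number of occurrences, in first-seen order.
    static Map<String, Integer> countFrames(String stackTrace) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : stackTrace.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("at ")) {
                continue; // skip the exception header and any log noise
            }
            int paren = trimmed.indexOf('(');
            // Drop the "(File.groovy:27)" suffix so the same method groups together.
            String frame = paren > 0 ? trimmed.substring(3, paren) : trimmed.substring(3);
            counts.merge(frame, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the frame that appears most often, if the trace has any frames.
    static Optional<String> mostFrequentFrame(String stackTrace) {
        return countFrames(stackTrace).entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        String trace = "java.lang.StackOverflowError\n"
                + "    at com.onresolve.jira.groovy.jql.plugins.PreviousSprint.getRelevantSprint(PreviousSprint.groovy:27)\n"
                + "    at com.onresolve.jira.groovy.jql.plugins.PreviousSprint$getRelevantSprint.callCurrent(Unknown Source)\n"
                + "    at com.onresolve.jira.groovy.jql.plugins.PreviousSprint.getRelevantSprint(PreviousSprint.groovy:27)\n";
        countFrames(trace).forEach((frame, n) -> System.out.println(n + "  " + frame));
    }
}
```

The most frequent frame is only a starting point for the investigation; the app vendor is still best placed to confirm the root cause.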

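Regarding threads left waiting on a ReentrantReadWriteLock, as described in the Oracle blog post linked above: a read lock can stay held indefinitely only when the thread that acquired it encounters a StackOverflowError before it can execute unlock() on the ReadLock in its finally block.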
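The effect can be sketched in Java as follows. This is a simulation under stated assumptions, not the actual failure path: a real overflow strikes at an unpredictable point, so the hypothetical helper simulatedOverflow() throws a StackOverflowError explicitly inside the finally block to make the outcome repeatable. All class and method names here are illustrative.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo: a read lock that is never released blocks later writers.
final class LeakedReadLockDemo {

    // Stand-in for a real overflow, thrown explicitly so the demo is deterministic.
    private static void simulatedOverflow() {
        throw new StackOverflowError("simulated for illustration");
    }

    // Returns true if the read hold leaked and a later writer cannot acquire the lock.
    static boolean writerBlockedAfterLeak() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        Thread worker = new Thread(() -> {
            lock.readLock().lock();
            try {
                // Protected work would run here.
            } finally {
                simulatedOverflow();      // error strikes before the unlock call
                lock.readLock().unlock(); // never reached
            }
        });
        // Swallow the error in the worker so the demo output stays quiet.
        worker.setUncaughtExceptionHandler((thread, error) -> { });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // The worker has terminated, yet its read hold is still counted,
        // so no thread can ever take the write lock.
        return lock.getReadLockCount() == 1 && !lock.writeLock().tryLock();
    }

    public static void main(String[] args) {
        System.out.println("Writer blocked after leak: " + writerBlockedAfterLeak());
    }
}
```

Because nothing in the running application can release a hold owned by a thread that has already unwound, a restart is the practical way to clear this state.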


Last modified on Mar 21, 2024
