Jira becomes unresponsive occasionally and logs several Full GC events

Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

Occasionally — or intermittently — Jira freezes and becomes unresponsive. Screens don't load and user requests aren't completed.

If GC logging is enabled, many consecutive Full GC events will be logged in the atlassian-jira-gc files.


Environment

Jira 7 and 8.

Workflow conditions, validations or post-functions that make use of Groovy scripts (from ScriptRunner, JMWE, Insight, etc).


Diagnosis

1) GC logs

The first indicator is several consecutive Full GC events, suggesting the JVM is trying to free memory but can't, because some Thread is actively loading a huge amount of objects into memory:

$ cat <Jira-INSTALL-DIR>/logs/atlassian-jira-gc-* | grep -i "full gc" | sort | tail -10
2020-07-13T11:04:37.381-0300: 59396.722: [Full GC (Allocation Failure)  31G->29G(32G), 43.6321319 secs]
2020-07-13T11:05:32.674-0300: 59452.014: [Full GC (Allocation Failure)  31G->29G(32G), 40.9281876 secs]
2020-07-13T11:06:25.137-0300: 59504.478: [Full GC (Allocation Failure)  31G->29G(32G), 40.3168184 secs]
2020-07-13T11:07:27.454-0300: 59566.795: [Full GC (Allocation Failure)  31G->29G(32G), 40.4040414 secs]
2020-07-13T11:08:19.967-0300: 59619.308: [Full GC (Allocation Failure)  31G->29G(32G), 41.8864183 secs]
2020-07-13T11:09:14.481-0300: 59673.822: [Full GC (Allocation Failure)  31G->29G(32G), 43.1418877 secs]
2020-07-13T11:10:00.251-0300: 59719.592: [Full GC (Allocation Failure)  31G->29G(32G), 40.7752660 secs]
2020-07-13T11:10:52.692-0300: 59772.032: [Full GC (Allocation Failure)  31G->29G(32G), 40.0125870 secs]
2020-07-13T11:11:44.536-0300: 59823.877: [Full GC (Allocation Failure)  31G->29G(32G), 40.7520695 secs]

In the example above, a Full GC runs every minute and takes ~40 secs each time. At this rate the JVM does little but collect garbage and seldom executes any other Thread.

2) Heap dump

The second step is to generate a Heap dump and analyze it. We should see a single (or a few) HTTP(S) Thread(s) consuming most of the Heap. The Dominator Tree will show a java.util.ArrayList with thousands of com.atlassian.jira.issue.DocumentIssueImpl objects, each consuming only a few KBs but together adding up to many GBs:

The screenshots above are from Eclipse Memory Analyzer (aka Eclipse MAT).


Cause

Loading hundreds of thousands of issues into memory can cause the JVM to run out of memory and trigger Full GC cycles to try to cope with it. When running a Full GC, the JVM freezes every other Thread (including user requests) to try to free up old/unused memory.

Such a scenario is easily reached if there's a Groovy custom script (usually in a workflow) that loads the results of a JQL query:

import com.atlassian.jira.component.ComponentAccessor;
import com.atlassian.jira.issue.search.SearchProvider;
import com.atlassian.jira.jql.parser.JqlQueryParser;
import com.atlassian.jira.web.bean.PagerFilter;

def jqlQueryParser = ComponentAccessor.getComponent(JqlQueryParser);
def searchProvider = ComponentAccessor.getComponent(SearchProvider);
def issueManager = ComponentAccessor.getIssueManager();
def currentUser = ComponentAccessor.getJiraAuthenticationContext().getUser();

def jqlQuery = jqlQueryParser.parseQuery("QUERY THAT FETCHES (OR MAY FETCH) TOO MANY ISSUES");
// PagerFilter.getUnlimitedFilter() places no cap on the result size, so every
// matching issue document is loaded into the Heap.
def queryResults = searchProvider.search(jqlQuery, currentUser, PagerFilter.getUnlimitedFilter());

...

The search method of Jira's SearchProvider returns a SearchResults object containing a representation of every issue the JQL would return. Each Issue object allocates only a handful of KBs, yet if the JQL returns hundreds of thousands of issues (the exact threshold depends on how much Heap the JVM has), this will certainly lead to a JVM memory shortage. This is especially tricky if the JQL is dynamically built from user input or issue field values.

This explains the occasional/intermittent manifestation of the problem: it only occurs when the Groovy script is executed with the specific parameters that lead the JQL to fetch too many issues.


Solution

To identify the workflow that's likely making use of this memory-abusing script, we can browse the Dominator Tree in the heap dump looking for the com.atlassian.jira.web.filters.accesslog.AccessLogFilter. It should reveal the URL accessed by the user Thread that consumed most of the Heap:

In the example above, it was an issue creation through a Jira Service Management Portal.

The Jira Admin should trace back which Workflow this issue type's creation is mapped to and review the transition's conditions, validations and post-functions for something that could be loading so many issues.

If each issue's content is relevant

If each issue's content is relevant to the script, the JQL would have to be adjusted to bring fewer issues into memory, for example by filtering on creation or update dates.

If the JQL is dynamically built from user input, the workflow or script itself should validate that the inputs are adequate (e.g. not empty) before executing the JQL search.
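
As an illustration only, here is a minimal sketch of the earlier script rewritten with those two ideas; the projectKey value, the 30-day filter and the 1000-issue cap are hypothetical placeholders, not recommended values:

import com.atlassian.jira.component.ComponentAccessor;
import com.atlassian.jira.issue.search.SearchProvider;
import com.atlassian.jira.jql.parser.JqlQueryParser;
import com.atlassian.jira.web.bean.PagerFilter;

def jqlQueryParser = ComponentAccessor.getComponent(JqlQueryParser);
def searchProvider = ComponentAccessor.getComponent(SearchProvider);
def currentUser = ComponentAccessor.getJiraAuthenticationContext().getUser();

// Hypothetical user-supplied value: abort early if it's empty so the JQL
// can't degenerate into "return every issue".
def projectKey = "ABC";
if (!projectKey?.trim()) {
    return;
}

// Narrow the result set (here by update date) and cap the number of issue
// documents loaded, instead of using PagerFilter.getUnlimitedFilter().
def jqlQuery = jqlQueryParser.parseQuery("project = ${projectKey} AND updated >= -30d");
def pagerFilter = new PagerFilter(1000);
def queryResults = searchProvider.search(jqlQuery, currentUser, pagerFilter);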

If only the result count is relevant

If only the total number of issues retrieved by the JQL is relevant, then it's best to use the searchCount(...) method instead of search(...). searchCount returns a long instead of a list of Issue objects and doesn't load issue documents into memory.
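
A minimal sketch of the earlier script rewritten around searchCount, assuming Jira 7/8's SearchProvider; the JQL string is a placeholder:

import com.atlassian.jira.component.ComponentAccessor;
import com.atlassian.jira.issue.search.SearchProvider;
import com.atlassian.jira.jql.parser.JqlQueryParser;

def jqlQueryParser = ComponentAccessor.getComponent(JqlQueryParser);
def searchProvider = ComponentAccessor.getComponent(SearchProvider);
def currentUser = ComponentAccessor.getJiraAuthenticationContext().getUser();

def jqlQuery = jqlQueryParser.parseQuery("QUERY WHOSE RESULT COUNT IS NEEDED");
// searchCount only asks the search index for the number of hits;
// no DocumentIssueImpl objects are materialized in the Heap.
long total = searchProvider.searchCount(jqlQuery, currentUser);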



Last modified on November 23, 2020
