Jira server process is unexpectedly terminated in Linux (Out Of Memory Killer)


Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

The Jira process is terminated unexpectedly in a Linux environment by the Out Of Memory (OOM) Killer, and there is no clean shutdown recorded in Jira's logs.

For example, looking at atlassian-jira.log, we can see that there is no clean shutdown process:

2021-06-22 13:53:14,025+0000 plugin-transaction-0 INFO [c.a.jira.plugin.PluginTransactionListener] [plugin-transaction] numberStartEvents:1011, numberEndEvents:1011, numberSendEvents:545, numberEventsInTransactions:18710, numberOfPluginEnableEvents:313
#### No shutdown process was started, as indicated by the absence of localhost-startStop-2 logging.

2021-06-22 14:08:38,286+0000 localhost-startStop-1 INFO [c.a.jira.startup.JiraHomeStartupCheck] The jira.home directory '/var/atlassian/application-data/jira' is validated and locked for exclusive use by this instance.
2021-06-22 14:08:38,337+0000 JIRA-Bootstrap INFO [c.a.jira.startup.JiraStartupLogger] 
 
 ****************
 JIRA starting...
 ****************
 
2021-06-22 14:08:38,521+0000 JIRA-Bootstrap INFO [c.a.jira.startup.JiraStartupLogger]

To confirm that the OOM Killer terminated the process, search the kernel ring buffer with the following command:

# dmesg -T | egrep -i -B 1 'killed process'
Example output:

[Tue Jun 22 13:42:49 2021] Out of memory: Kill process 90619 (java) score 440 or sacrifice child
[Tue Jun 22 13:42:49 2021] Killed process 95510 (java), UID 752, total-vm:12301500kB, anon-rss:1873032kB, file-rss:0kB, shmem-rss:0kB
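If the kernel ring buffer has already rotated, a similar search can be run against the persistent kernel journal on systemd-based hosts. This is a minimal sketch that assumes journald persistence is enabled; adjust the --since window to the timeframe you are investigating:

# Search the kernel journal for OOM Killer activity (systemd hosts only)
$ journalctl -k --since "2021-06-22" | egrep -i 'out of memory|killed process'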

Messages like the following appear in /var/log/messages, /var/log/syslog, or the systemd kernel journal:

Aug 12 19:12:19 ussclpdapjra002 kernel: java invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
Aug 12 19:12:19 ussclpdapjra002 kernel:
Aug 12 19:12:19 ussclpdapjra002 kernel: Call Trace:
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff800c82e8>] out_of_memory+0x8e/0x2f3
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff8000f506>] __alloc_pages+0x27f/0x308
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff80017949>] cache_grow+0x133/0x3c1
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff8005c6f9>] cache_alloc_refill+0x136/0x186
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff800dc9e3>] kmem_cache_zalloc+0x6f/0x94
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff800bf56f>] taskstats_exit_alloc+0x32/0x89
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff80015693>] do_exit+0x186/0x911
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff800496a1>] cpuset_exit+0x0/0x88
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff8002b29e>] get_signal_to_deliver+0x465/0x494
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff8005b295>] do_notify_resume+0x9c/0x7af
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff8008e16d>] default_wake_function+0x0/0xe
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff800a52a6>] sys_futex+0x10b/0x12b
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff8005e19f>] sysret_signal+0x1c/0x27
Aug 12 19:12:19 ussclpdapjra002 kernel:  [<ffffffff8005e427>] ptregscall_common+0x67/0xac

Additionally, /var/log/messages, /var/log/syslog, or the systemd kernel journal may include log entries like the following:

Aug 12 19:11:52 ussclpdapjra002 kernel: INFO: task java:5491 blocked for more than 120 seconds.
Aug 12 19:11:52 ussclpdapjra002 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 12 19:11:52 ussclpdapjra002 kernel: java          D 0000000000000014     0  5491      1          5492  5490 (NOTLB)
Aug 12 19:11:52 ussclpdapjra002 kernel:  ffff810722859e18 0000000000000082 0000000000000000 0000000000000001
Aug 12 19:11:52 ussclpdapjra002 kernel:  ffff810722859e88 000000000000000a ffff81083673a100 ffff8107024d7080
Aug 12 19:11:52 ussclpdapjra002 kernel:  0000d1276b8c3dc6 0000000000003296 ffff81083673a2e8 0000000400000000
Aug 12 19:11:52 ussclpdapjra002 kernel: Call Trace:
Aug 12 19:11:52 ussclpdapjra002 kernel:  [<ffffffff80016dd4>] generic_file_aio_read+0x34/0x39
Aug 12 19:11:52 ussclpdapjra002 kernel:  [<ffffffff800656ac>] __down_read+0x7a/0x92
Aug 12 19:11:52 ussclpdapjra002 kernel:  [<ffffffff80067ad0>] do_page_fault+0x446/0x874
Aug 12 19:11:52 ussclpdapjra002 kernel:  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Aug 12 19:11:52 ussclpdapjra002 kernel:  [<ffffffff8000c62d>] _atomic_dec_and_lock+0x39/0x57
Aug 12 19:12:08 ussclpdapjra002 kernel:  [<ffffffff8000d3fa>] dput+0x3d/0x114
Aug 12 19:12:10 ussclpdapjra002 kernel:  [<ffffffff8005ede9>] error_exit+0x0/0x84
Aug 12 19:12:11 ussclpdapjra002 kernel:


Environment

Any version of Jira Software Data Center or Server.

Any version of Jira Service Management Data Center or Server.


Cause

When the system runs out of memory, the Linux kernel automatically starts killing the processes that consume the most memory. In this case, the Jira JVM was the largest consumer, so it was killed.
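The kernel chooses its victim based on a per-process badness score. As a quick, hedged check (the pgrep pattern below is only an illustration; adjust it to how Jira is actually started on your host), you can inspect the score the kernel currently assigns to the Jira JVM:

# Find the Jira JVM process ID (pattern is an example, adjust to your installation)
$ JIRA_PID=$(pgrep -f 'jira' | head -n 1)
# Higher scores make a process more likely to be selected by the OOM Killer
$ cat /proc/$JIRA_PID/oom_score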

This error is usually due to one or more of the issues listed below (a quick check comparing the configured heap with the memory available on the host is sketched after the list).

  1. The maximum heap configured for Jira's JVM with the -Xmx parameter is larger than the memory available on the machine.
  2. There is not enough physical memory allocated to the Jira node to run Jira and other processes.
  3. The Jira JVM's -Xmx value is set higher than is required for the size of the instance.
  4. One or more processes other than Jira are consuming an unexpectedly high amount of memory.
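As referenced above, a quick way to compare the configured heap with what the host can actually provide is sketched below. The installation path is an assumption (replace it with your actual Jira installation directory); JVM_MINIMUM_MEMORY and JVM_MAXIMUM_MEMORY are the variables Jira's setenv.sh uses to build the -Xms/-Xmx flags:

# Heap settings passed to the Jira JVM (path is an example, adjust to your install)
$ grep -E 'JVM_(MINIMUM|MAXIMUM)_MEMORY' /opt/atlassian/jira/bin/setenv.sh
# Physical memory and swap actually available on the host
$ free -h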


Solution

The content on this page relates to platforms that are not supported for Jira applications; consequently, Atlassian cannot guarantee providing support for it. This material is provided for informational purposes only, so use it at your own risk.

This error requires careful analysis of memory usage patterns to decide how much memory the Jira instance needs and to adjust the server capacity accordingly. The documents below will help in making the right decision about the requirements.
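As a starting point for that analysis, a simple snapshot of the processes using the most resident memory can be taken at regular intervals. This is only a sketch using standard procps tools, not a substitute for proper monitoring:

# Top 10 processes by resident memory (RSS), largest first (first line is the header)
$ ps aux --sort=-rss | head -n 11
# Memory, swap, and paging activity: 3 samples, 5 seconds apart
$ vmstat 5 3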

In some cases, the OOM Killer could be triggered because processes other than Jira are consuming a lot of memory. It's possible to get more details about all the processes that were running at the time Jira was killed by analyzing the dmesg output. In these logs, locate the "Out of memory: Kill process" message. Just above that message, the kernel dumps the statistics of the processes that were running. For example:


[...]
[XXXXX] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[XXXXX] [  480]     0   480    13863      113      26        0         -1000 auditd
[XXXXX] [12345]   123 12345  4704977  3306330    6732        0             0 java
[XXXXX] [11939]     0 11939    46699      328      48        0             0 crond
[XXXXX] [11942]     0 11942    28282       45      12        0             0 sh
[XXXXX] [16789]   456 16789  1695936    38643     165        0             0 java
[...]
[XXXXX] Out of memory: Kill process 12345 (java) score 869 or sacrifice child
[XXXXX] Killed process 12345 (java) total-vm:18819908kB, anon-rss:13225320kB, file-rss:0kB, shmem-rss:0kB
[...]

In this example, the Jira PID was 12345 and it was killed. We can see in the summary (the Killed process line) that Jira was using ~13 GiB of memory (see anon-rss; the total-vm value can be disregarded). However, the table also shows another process with PID 16789 that has reserved ~6.4 GiB of virtual memory (note that the table's memory values are in 4 KiB pages, so you must multiply the rss value by 4 to determine actual RAM usage in KiB). You can then investigate what this other process does by running the following command:

$ ps -fp <pid>

It is possible that this process is leaking memory, or perhaps it simply should not be running on the same system as Jira.
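For reference, the table's 4 KiB pages can be converted into more familiar units with simple shell arithmetic. The figures below reuse the rss and total_vm values from the example table above:

# rss of PID 12345: 3306330 pages x 4 KiB ≈ 12915 MiB (~12.6 GiB resident)
$ echo $((3306330 * 4 / 1024))
# total_vm of PID 16789: 1695936 pages x 4 KiB ≈ 6624 MiB (~6.5 GiB of virtual memory)
$ echo $((1695936 * 4 / 1024))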

For Kubernetes

Exit code 137 is important because it means that the system terminated the container when it tried to use more memory than its limit (137 = 128 + 9, i.e. the container was killed with SIGKILL).

Run this command to check for exit code 137:

kubectl get pod <pod-name> -o yaml

You should see exit code 137 in the output, for example:

    state:
      terminated:
        containerID: docker://054cd898b7fff24f75f467895d4b0680c83fc54f49679faeaae975a579af87b8
        exitCode: 137
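If the full YAML output is hard to scan, a narrower query can be used instead. This is a sketch; the jsonpath below assumes the container status layout shown above and that the container is currently in the terminated state:

# Exit code of the terminated container, without printing the full YAML
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[*].state.terminated.exitCode}'
# The termination reason is typically reported as OOMKilled when the memory limit was exceeded
kubectl describe pod <pod-name> | grep -i oomkilled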

External resource: https://sysdig.com/blog/troubleshoot-kubernetes-oom/




Last modified on May 8, 2023
