Troubleshooting WebHooks in Jira Data Center

お困りですか?

アトラシアン コミュニティをご利用ください。

コミュニティに質問

プラットフォームについて: Server および Data Center のみ。この記事は、Server および Data Center プラットフォームのアトラシアン製品にのみ適用されます。

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Fisheye および Crucible は除く

要約

This article offers techniques you may use to troubleshoot WebHooks in Jira that you suspect are either not working or impacting general performance of user operations.

Please refer to Best practices on working with WebHooks in Jira for advices on how to make best usage of them.

The concept of WebHooks can also be implemented through Automation for Jira's Web Requests actions — please refer to Automate your project for more on this.


Quick overview on WebHooks mechanics in Jira

WebHooks in Jira are triggered by specific events but have global context — even if you specify a project or other conditions in the Filter/JQL.

Jira knows which WebHooks are listening to each event before hand, but doesn't know which will match the Issue being worked on. What it does is execute the JQL Filter of all WebHooks, one by one, and those that match the Issue being worked on are published to reach to the URL configured in them.

This having too many WebHooks or WebHooks with slow or heavy Filters may increase the response times of many operations in Jira.

Asynchronous WebHooks warning

Please note these versions of Jira were shipped with asynchronous WebHooks enabled by default!

8.13.21
8.13.22
8.13.23
8.13.24

8.13.25 and all others have asynchronous mode disabled by default again (as all versions from 8.14 to 8.22 and 9.x). If you upgrade from any of the affected versions above to 8.13.25+ or any other you don't have to do anything — upon each node restart WebHooks will behave synchronously again by default.


環境

Any version of Jira Core 7, 8 and 9.

Special attention to versions 8.13.21 to 8.13.24 (inclusively), as they have asynchronous WebHooks enabled by default.


ソリューション

1. WebHooks not working

If you suspect the WebHooks aren't posting to the configured URL, you may enable DEBUG on the packages com.atlassian.webhooks and org.apache.http (ref.: Jira debug logging reference) and monitor the atlassian-jira.log for entries like these below:

2022-11-15 18:02:31,958+0000 Web-Hook-Publisher-0 DEBUG username 1082x238899x1 z6p1pq 104.192.142.9,18.160.46.8 /rest/api/2/issue/JIRA-123456/comment [c.a.webhooks.plugin.PublishTaskFactoryImpl$PublishTaskImpl] Posting to web hook at 'https://external.service.com/api/something/some-endpoint?param1=qwertyuiop&param2=asdfghjkl&param3=zxcvbnm&user_id=username&user_key=JIRAUSER10000', body is:
    {"timestamp":1668535349200,"webhookEvent":"jira:issue_updated","issue_event_type_name":"issue_commented", ...
    ...
    ... REDACTED BODY CONTENT: this is the whole Issue JSON representation and can be quite lengthy.
    ...
    "created":"2022-11-15T18:02:29.070+0000","updated":"2022-11-15T18:02:29.070+0000"}}

2022-11-15 18:02:31,982+0000 Web-Hook-Publisher-0 DEBUG username 1082x238899x1 z6p1pq 104.192.142.9,18.160.46.8 /rest/api/2/issue/JIRA-123456/comment [o.a.h.i.client.cache.CacheInvalidator] Invalidating parent cache entry: null
2022-11-15 18:02:31,982+0000 Web-Hook-Publisher-0 DEBUG username 1082x238899x1 z6p1pq 104.192.142.9,18.160.46.8 /rest/api/2/issue/JIRA-123456/comment [o.a.h.i.client.cache.CachingHttpAsyncClient] Request is not servable from cache
2022-11-15 18:02:31,988+0000 Web-Hook-Publisher-0 DEBUG username 1082x238899x1 z6p1pq 104.192.142.9,18.160.46.8 /rest/api/2/issue/JIRA-123456/comment [o.a.h.i.nio.client.MainClientExec] [exchange: 1] start execution
2022-11-15 18:02:31,990+0000 Web-Hook-Publisher-0 DEBUG username 1082x238899x1 z6p1pq 104.192.142.9,18.160.46.8 /rest/api/2/issue/JIRA-123456/comment [o.a.h.client.protocol.RequestAddCookies] CookieSpec selected: ignoreCookies
2022-11-15 18:02:31,992+0000 Web-Hook-Publisher-0 DEBUG username 1082x238899x1 z6p1pq 104.192.142.9,18.160.46.8 /rest/api/2/issue/JIRA-123456/comment [o.a.h.client.protocol.RequestAuthCache] Auth cache not set in the context
2022-11-15 18:02:31,992+0000 Web-Hook-Publisher-0 DEBUG username 1082x238899x1 z6p1pq 104.192.142.9,18.160.46.8 /rest/api/2/issue/JIRA-123456/comment [o.a.h.i.nio.client.InternalHttpAsyncClient] [exchange: 1] Request connection for {s}->https://external.service.com:443

2022-11-15 18:02:32,640+0000 pool-29-thread-1 DEBUG anonymous     [o.a.h.impl.conn.PoolingHttpClientConnectionManager] Connection request: [route: {s}->https://uat-jira.yourcompany.com:443][total available: 3; route allocated: 1 of 5; total allocated: 3 of 10]
2022-11-15 18:02:32,641+0000 pool-29-thread-1 DEBUG anonymous     [o.apache.http.wire] http-outgoing-16111 << "[read] I/O error: Read timed out"
2022-11-15 18:02:32,641+0000 pool-29-thread-1 DEBUG anonymous     [o.a.h.impl.conn.PoolingHttpClientConnectionManager] Connection leased: [id: 16111][route: {s}->https://uat-jira.yourcompany.com:443][total available: 2; route allocated: 1 of 5; total allocated: 3 of 10]

2022-11-15 18:02:37,276+0000 httpclient-io:thread-1 DEBUG anonymous     [o.a.h.i.nio.client.InternalHttpAsyncClient] [exchange: 1] connection request failed

The line "Posting to web hook at 'https://external.service.com/api/..." shows the full URL and what's being sent on the body.

Revert the DEBUG level on org.apache.http as soon as possible to avoid log spamming. You may set it to WARN and Jira will automatically revert to the saved settings in it's log4j.properties upon the next restart.

You may then log into the Jira server (SSH or Remote Desktop) and manually post to the same URL using curl or another similar tool. It's important the request be made from the same Jira server, not your local machine, as maybe there are different Firewalls or proxy rules:

$ curl -k -D- -H "Content-Type: application/json" -X POST "https://external.service.com/api/something/some-endpoint?param1=qwertyuiop&param2=asdfghjkl&param3=zxcvbnm&user_id=username&user_key=JIRAUSER10000" -d '{"timestamp":1668535349200,"webhookEvent":"jira:issue_updated","issue_event_type_name":"issue_commented", ... REDACTED BODY CONTENT: this is the whole Issue JSON representation and can be quite lengthy... "created":"2022-11-15T18:02:29.070+0000","updated":"2022-11-15T18:02:29.070+0000"}}'

Performing this request outside of Jira (yet from the same machine) should help isolate if it's an issue with Jira's implementation or if the post to the destination URL is failing regardless.
Also, check on JIRA fails to post webhooks to external system due to network infrastructure blocking the request. for other causes, when no errors are being logged.

If this isn't enough to narrow down the cause, please refer to How to collect data to troubleshoot WebHook failure in Jira and reach out to Atlassian Support.

1.1 WebHooks not working occasionally

You may observe WebHooks failing only at times, but not consistently.

In this scenario, you can implement a program that will constantly call the WebHook URL (if that's possible) and record the output to a file. This must be executed from the same Jira server to mimic Jira's request as closely as possible:

$ for i in $(seq 4320); do curl -k -H "Authorization: Basic XYZXYZ" -X POST "external.service.com/api/something/some-endpoint?param1=qwertyuiop&param2=asdfghjkl&param3=zxcvbnm&user_id=username&user_key=JIRAUSER10000" -d '... REQUEST BODY (IF APPLICABLE) ...' -s -D- | tee -a webhook_curl_output.out; sleep 10; done; echo;
^C

This command above is a Linux curl example that performs a POST request every 10 seconds and runs for 12 hours (4320 * 10s). It can be interrupted anytime with Ctrl+C.

You may then check how many of each response codes were received:

$ egrep "^HTTP" webhook_curl_output.out | sort | uniq -c | sort -nr

54889 HTTP/1.1 201 
   10 HTTP/1.1 303 
    1 HTTP/1.1 502 Proxy Error

You can then "drill into" the failed request (the 502 in the example above):

$ egrep "^HTTP/1.1 502" webhook_curl_output.out -A15 

HTTP/1.1 502 Proxy Error
Date: Wed, 23 Nov 2022 22:25:16 GMT
Server: Apache/2.4.41 (Unix) OpenSSL/1.1.1k mod_jk/1.2.46
Content-Length: 341
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request<p>Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>

This helps isolate Jira out of the scenario and we can troubleshoot the destination service.

You may find out WebHook requests are failing in Jira because of:

  • Some external factors (infra, network or destination URL service availability or processing)
  • Specific data set (it consistently fails with certain data, on the URI or body, like special characters or a regex that triggers some proxy/firewall rule to the destination URL


2. WebHooks impact on user operations

There are two major impacts WebHooks can have on user operations (HTTP requests):

  • Filter (JQL) matching time
  • Issue data fetching time (from the database)

Refer to Best practices on working with WebHooks in Jira for advices on how making best use of WebHooks in Jira.

All commands here refer to the logs in the Support Zip structure: application-logs/atlassian-jira.log

When executing in your environment, either generate a support zip, unzip and run them or change the path to the <jira-home-folder>/log/atlassian-jira.log


2.1. JQL Matching

Each WebHook configured to the event being fired has it's Filter executed (JQL matching) to check if Jira should post data to the respective URL. This is done synchronously on the same Thread and can contribute considerably to response time if the JQL makes use of complex or expensive functions.

You can inspect the impact of WebHooks in user operations in Jira through the entries of "JqlEventMatcher_SLOW" in the atlassian-jira.log located in Jira's home log folder.

2.1.1. JQL matching impact on requests response times

This log parsing (Linux example) lists how much accumulated time each user request (Browser user or REST API client) was impacted:

$ grep -a -E 'JqlEventMatcher_SLOW' -h application-logs/atlassian-jira.log* | sed -E 's/ url:.* INFO / INFO /g' | awk 'BEGIN {req = "Undefined"}; {if ($3 !~ /http/ && $4 == "INFO") {req = $3"____-____"$5} else if ($5 == "INFO") {req = $3"____"$7"____"$6} else {req = $3"_"$9"____"$6"____"$5}}; {split($0, arr, "produced lucene query and took ") split(arr[2], arr2, "'\''")}; {print req" "arr2[2]}' | awk '{total_time[$1]+=$2; total_hits[$1]++}; END {for (i in total_time) {print total_hits[i], i, total_time[i]}}' | sort -nr -k 3 | sed 's/____/ /g' | awk 'BEGIN {print "Count", "Thread_name", "Request_Id", "Username", "Total_time_in_ms"}; ($2 ~ /http/) {print}' | column -t | head

出力サンプル

Count  Thread_name                                                Request_Id       Username      Total_time_in_ms
17     http-nio-8080-exec-26_/rest/api/2/issue/JIRA-123/comment   590x20060909x2   juser001      18379
18     http-nio-8080-exec-22_/rest/jpo/1.0/issues/commit          919x22418014x3   juser123      15822
18     http-nio-8080-exec-207_/rest/jpo/1.0/issues/commit         919x22418069x6   juser123      15777
18     http-nio-8080-exec-213_/secure/AjaxIssueAction.jspa        122x20546378x1   juser010      15534
15     http-nio-8080-exec-114_/secure/AjaxIssueAction.jspa        1338x19593764x1  juser010      15477
17     http-nio-8080-exec-214_/secure/QuickCreateIssue.jspa       919x22418016x4   juser044      15413
17     http-nio-8080-exec-58_/secure/QuickCreateIssue.jspa        288x21749072x2   juser100      15223

In the example, the request "590x20060909x2" by user "juser001" spend 18.3 seconds just running Filters/JQL of 17 WebHooks in total. This may even have resulted in no WebHooks actually matching — this was just Jira checking if any of the 17 WebHooks listening to that event matched the Issue being worked by the user.

2.1.2. JQL impact by Filter

This log parsing (Linux example) reports the JQL:

$ grep -E JqlEventMatcher_SLOW -h application-logs/atlassian-jira.log* | awk '{match($0, / '\''({|\().*(}|\)|[a-zA-Z])'\'' /); filter=substr($0, RSTART+2, RLENGTH-4); gsub(/'\''/,"", $(NF-3)); gsub(/ AND \{key in \("[0-9A-Z-]+"\)\}/, "", filter); total_count[filter]++; total_time[filter]+=$(NF-3)}; END {for (f in total_count) {printf "%d\t%d\t%.2f\t%s\n", total_count[f], total_time[f], (total_time[f]/total_count[f]), f}}' | sort -nr -k 3 | awk 'BEGIN {print "Count\tTotal_time_(ms)\tAvg_time_(ms)\tJQL"}; {print $0}' | column -t -s $'\t' | head

出力サンプル

Count  Total_time_(ms)  Avg_time_(ms)  JQL
1300   3620047          2784.65        {project = "JIRA"} AND {issuekey not in childissuesof(JIRA-2580)} AND {issuekey != "JIRA-2580"}
1307   2333499          1785.39        {project = "JIRA"} AND {issuekey in childIssuesOf(JIRA-2580)}
1670   2216867          1327.47        {project = "JIRA"} OR {project = "CONNIE"} AND {issuekey not in childissuesof(CONNIE-1346)} AND {issuekey not in ("CONNIE-1346")} AND {issuekey not in childissuesof(CONNIE-590)} AND {issuekey not in ("CONNIE-590")}
1908   1037890          543.97         {project in ("JIRA", "CONNIE", "INFRA", "FINANCE", "ACCT", "GOVT")} AND ( {issuekey in childissuesof(GOVT-177)} OR {issuekey in childissuesof(JIRA-184)} OR {issuekey in childissuesof(CONNIE-103)} OR {issuekey in childissuesof(INFRA-183)} OR {issuekey in childissuesof(FINANCE-182)} OR {issuekey in childissuesof(ACCT-170)} OR {issuekey in childissuesof(ACCT-154)} OR {issuekey in childissuesof(GOVT-168)} OR {issuekey in childissuesof(JIRA-169)} )
1509   673207           446.13         {project in ("JIRA", "CONNIE", "GOVT", "ACCT", "TECH2", "TECH3")} AND {issuekey in childIssuesOf(JIRA-1866)}

This output is the closest we can get to the actual WebHooks. We need to visually correlate the filters from the log parsing to what's configured in the database to get to which WebHook's delaying the most:

select "ID", "NAME", "ENCODED_EVENTS", "FILTER", "LAST_UPDATED_USER", "URL" from "AO_4AEACD_WEBHOOK_DAO" where "ENABLED" = true;
2.1.3. JQL impact by Issue

Specific Issues usually impact the data fetching (below) more than the JQL matching, but for the record, this log parsing (Linux example) reports on which Issues the JQL matching took most time:

$ grep -E "JqlEventMatcher_SLOW" -h application-logs/atlassian-jira.log* | awk '{split(substr($0, match($0, / AND \{key in \("[0-9A-Z-]+"\)\}/)), arr); gsub(/[(")}'\'']/,"",arr[4]); gsub(/'\''/,"", $(NF-3)); total_count[arr[4]]++; total_time[arr[4]]+=$(NF-3)}; END {for (i in total_time) {print total_count[i], i, total_time[i]}}' | sort -nr -k 1 | awk 'BEGIN {print "Count", "Issue_key", "Total_time_in_ms_(spanning_multiple_requests)"}; {print}' | column -t | head

出力サンプル

Count  Issue_key         Total_time_in_ms_(spanning_multiple_requests)
308    DYNMCS-4597       177896
235    ANT-887           149092
181    DYNMCS-4593       111803
161    DYNMCS-4402       104064
117    ANT-892           73201
106    DYNMCS-4598       64045
104    ANT-886           64575
100    DYNMCS-4596       62491
93     MSFT-3066         70526


2.2. Data fetching and JSON building

If WebHooks are configured to post Issue data, Jira will read all available data from the Issue from the database to build a JSON and post it in the request body to the URL. If the Issue has a lot of data (see How to find the Issues with most data in Jira), this can impact user operations response time.

We can confirm if WebHooks database reading is impacting user operations by capturing Thread dumps and looking for occurrences of "com.atlassian.jira.plugins.webhooks.serializer".

These commands will run for roughly 60 seconds and capture 30 Thread dumps (one every 2 secs):

Linux example with JDK's jstack
$ JIRA_PID=`ps aux | grep -i jira | grep -i java | grep -iv eazybi-child | awk  -F '[ ]*' '{print $2}'`;  for i in $(seq 30); do top -b -H -p $JIRA_PID -n 1 > jira_cpu_usage.`date +%s`.txt; jstack $JIRA_PID > jira_threads.`date +%s`.txt; printf "."; sleep 2; done; echo;
Linux example with kill -3 and catalina.out
$ JIRA_PID=`ps aux | grep -i jira | grep -i java | grep -iv eazybi-child | awk  -F '[ ]*' '{print $2}'`; for i in $(seq 30); do top -b -H -p $JIRA_PID -n 1 > jira_cpu_usage.$(date +%s).txt; kill -3 $JIRA_PID; printf "."; sleep 2; done; echo;

$ awk '/^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$/{n++;td=1;lastLine=$0;outFile=("jira_threads." n ".txt")}; {if (td) {print $0 >> outFile; close(outFile)}}; /object space/{if (lastLine ~ /PSPermGen/) {td=0}}' <jira-install-path>/logs/catalina.out

For Windows, you can use the Performance Data Collector (see How to use the Performance Data Collector).

With the Thread dump files, we can parse and list which Threads were running for how many consecutive snapshots:

$ grep -E -h '^"http.*-exec-[0-9]+' jira_threads.* | sed 's/"//g' | sort -k1 | grep -E " runnable | Object.wait\(\) " | awk '{print $1}' | uniq -c | awk '($1 > 1) {print}' | sort -nr;

出力サンプル

   4 http-nio-8080-exec-744
   2 http-nio-8080-exec-716

Then, we can list only the ones seen executing the "com.atlassian.jira.plugins.webhooks.serializer" code:

$ grep -E -h '"http.*-exec-[0-9]+.*( runnable | Object.wait\(\) )' jira_threads.* -A150 | grep "com.atlassian.jira.plugins.webhooks.serializer" -B150 | grep -E '"http.*-exec-[0-9]+.*( runnable | Object.wait\(\) )' | sed 's/"//g' | awk '{print $1}' | sort | uniq -c | sort -nr;

出力サンプル

   3 http-nio-8080-exec-744
   2 http-nio-8080-exec-716

From the example above, Thread http-nio-8080-exec-744 was running on 4 snapshots (out of the 30 or how many we captured) and was serializing Issue data for WebHooks on 3 of them.


3. WebHook events discarded because of full queue (asynchronous)

If asynchronous WebHooks are enabled (either on purpose or in any of the versions listed at the top), the event queue may overflow and WebHooks won't be triggered for execution.

We may search the atlassian-jira.log for "pool size ... queued tasks ... WebHookEventsProcessor":

$ grep -E "pool size.*queued tasks.*WebHookEventsProcessor" -h atlassian-jira.log* | awk '{print $NF}' | sort | uniq -c | sort -nr | column -tx

出力サンプル

626  com.atlassian.crowd.event.user.UserAttributeStoredEvent
4    com.atlassian.jira.event.issue.IssueEvent
1    com.atlassian.jira.event.issue.link.IssueLinkCreatedEvent
1    com.atlassian.jira.event.comment.CommentUpdatedEvent
1    com.atlassian.jira.event.comment.CommentCreatedEvent

This means 4 Issue events (the last four lines from the output) were abandoned because the queue was full (likely due to a flood of UserAttributeStoredEvent, the first line from the output).

This issue's being tracked by JRASERVER-74492 - Getting issue details... STATUS

We may increase the queue size and pool size with the below startup parameters (See Setting properties and options on startup):

-Dwebhooks.executor.queue.size=200 -Dwebhooks.executor.thread.pool.size=10

(The above 200 and 10 are the default values)

These parameters are exclusive to each node and they must be restarted (one at a time's ok) for the JVM to consider them.





最終更新日: 2024 年 12 月 5 日

この内容はお役に立ちましたか?

はい
いいえ
この記事についてのフィードバックを送信する
Powered by Confluence and Scroll Viewport.