Troubleshooting WebHooks in Jira Data Center
プラットフォームについて: Server および Data Center のみ。この記事は、Server および Data Center プラットフォームのアトラシアン製品にのみ適用されます。
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Fisheye および Crucible は除く
要約
This article offers techniques you may use to troubleshoot WebHooks in Jira that you suspect are either not working or impacting general performance of user operations.
Please refer to Best practices on working with WebHooks in Jira for advices on how to make best usage of them.
The concept of WebHooks can also be implemented through Automation for Jira's Web Requests actions — please refer to Automate your project for more on this.
Quick overview on WebHooks mechanics in Jira
WebHooks in Jira are triggered by specific events but have global context — even if you specify a project or other conditions in the Filter/JQL.
Jira knows which WebHooks are listening to each event before hand, but doesn't know which will match the Issue being worked on. What it does is execute the JQL Filter of all WebHooks, one by one, and those that match the Issue being worked on are published to reach to the URL configured in them.
This having too many WebHooks or WebHooks with slow or heavy Filters may increase the response times of many operations in Jira.
Asynchronous WebHooks warning
Please note these versions of Jira were shipped with asynchronous WebHooks enabled by default!
8.13.21
8.13.22
8.13.23
8.13.24
8.13.25
and all others have asynchronous mode disabled by default again (as all versions from 8.14 to 8.22 and 9.x). If you upgrade from any of the affected versions above to 8.13.25+ or any other you don't have to do anything — upon each node restart WebHooks will behave synchronously again by default.
環境
Any version of Jira Core 7, 8 and 9.
Special attention to versions 8.13.21 to 8.13.24 (inclusively), as they have asynchronous WebHooks enabled by default.
ソリューション
1. WebHooks not working
If you suspect the WebHooks aren't posting to the configured URL, you may enable DEBUG on the packages com.atlassian.webhooks and org.apache.http (ref.: Jira debug logging reference) and monitor the atlassian-jira.log for entries like these below:
2022-11-15 18:02:31,958+0000 Web-Hook-Publisher-0 DEBUG username 1082x238899x1 z6p1pq 104.192.142.9,18.160.46.8 /rest/api/2/issue/JIRA-123456/comment [c.a.webhooks.plugin.PublishTaskFactoryImpl$PublishTaskImpl] Posting to web hook at 'https://external.service.com/api/something/some-endpoint?param1=qwertyuiop¶m2=asdfghjkl¶m3=zxcvbnm&user_id=username&user_key=JIRAUSER10000', body is:
{"timestamp":1668535349200,"webhookEvent":"jira:issue_updated","issue_event_type_name":"issue_commented", ...
...
... REDACTED BODY CONTENT: this is the whole Issue JSON representation and can be quite lengthy.
...
"created":"2022-11-15T18:02:29.070+0000","updated":"2022-11-15T18:02:29.070+0000"}}
2022-11-15 18:02:31,982+0000 Web-Hook-Publisher-0 DEBUG username 1082x238899x1 z6p1pq 104.192.142.9,18.160.46.8 /rest/api/2/issue/JIRA-123456/comment [o.a.h.i.client.cache.CacheInvalidator] Invalidating parent cache entry: null
2022-11-15 18:02:31,982+0000 Web-Hook-Publisher-0 DEBUG username 1082x238899x1 z6p1pq 104.192.142.9,18.160.46.8 /rest/api/2/issue/JIRA-123456/comment [o.a.h.i.client.cache.CachingHttpAsyncClient] Request is not servable from cache
2022-11-15 18:02:31,988+0000 Web-Hook-Publisher-0 DEBUG username 1082x238899x1 z6p1pq 104.192.142.9,18.160.46.8 /rest/api/2/issue/JIRA-123456/comment [o.a.h.i.nio.client.MainClientExec] [exchange: 1] start execution
2022-11-15 18:02:31,990+0000 Web-Hook-Publisher-0 DEBUG username 1082x238899x1 z6p1pq 104.192.142.9,18.160.46.8 /rest/api/2/issue/JIRA-123456/comment [o.a.h.client.protocol.RequestAddCookies] CookieSpec selected: ignoreCookies
2022-11-15 18:02:31,992+0000 Web-Hook-Publisher-0 DEBUG username 1082x238899x1 z6p1pq 104.192.142.9,18.160.46.8 /rest/api/2/issue/JIRA-123456/comment [o.a.h.client.protocol.RequestAuthCache] Auth cache not set in the context
2022-11-15 18:02:31,992+0000 Web-Hook-Publisher-0 DEBUG username 1082x238899x1 z6p1pq 104.192.142.9,18.160.46.8 /rest/api/2/issue/JIRA-123456/comment [o.a.h.i.nio.client.InternalHttpAsyncClient] [exchange: 1] Request connection for {s}->https://external.service.com:443
2022-11-15 18:02:32,640+0000 pool-29-thread-1 DEBUG anonymous [o.a.h.impl.conn.PoolingHttpClientConnectionManager] Connection request: [route: {s}->https://uat-jira.yourcompany.com:443][total available: 3; route allocated: 1 of 5; total allocated: 3 of 10]
2022-11-15 18:02:32,641+0000 pool-29-thread-1 DEBUG anonymous [o.apache.http.wire] http-outgoing-16111 << "[read] I/O error: Read timed out"
2022-11-15 18:02:32,641+0000 pool-29-thread-1 DEBUG anonymous [o.a.h.impl.conn.PoolingHttpClientConnectionManager] Connection leased: [id: 16111][route: {s}->https://uat-jira.yourcompany.com:443][total available: 2; route allocated: 1 of 5; total allocated: 3 of 10]
2022-11-15 18:02:37,276+0000 httpclient-io:thread-1 DEBUG anonymous [o.a.h.i.nio.client.InternalHttpAsyncClient] [exchange: 1] connection request failed
The line "Posting to web hook at 'https://external.service.com/api/..." shows the full URL and what's being sent on the body.
Revert the DEBUG level on org.apache.http as soon as possible to avoid log spamming. You may set it to WARN and Jira will automatically revert to the saved settings in it's log4j.properties upon the next restart.
You may then log into the Jira server (SSH or Remote Desktop) and manually post to the same URL using curl or another similar tool. It's important the request be made from the same Jira server, not your local machine, as maybe there are different Firewalls or proxy rules:
$ curl -k -D- -H "Content-Type: application/json" -X POST "https://external.service.com/api/something/some-endpoint?param1=qwertyuiop¶m2=asdfghjkl¶m3=zxcvbnm&user_id=username&user_key=JIRAUSER10000" -d '{"timestamp":1668535349200,"webhookEvent":"jira:issue_updated","issue_event_type_name":"issue_commented", ... REDACTED BODY CONTENT: this is the whole Issue JSON representation and can be quite lengthy... "created":"2022-11-15T18:02:29.070+0000","updated":"2022-11-15T18:02:29.070+0000"}}'
Performing this request outside of Jira (yet from the same machine) should help isolate if it's an issue with Jira's implementation or if the post to the destination URL is failing regardless.
Also, check on JIRA fails to post webhooks to external system due to network infrastructure blocking the request. for other causes, when no errors are being logged.
If this isn't enough to narrow down the cause, please refer to How to collect data to troubleshoot WebHook failure in Jira and reach out to Atlassian Support.
1.1 WebHooks not working occasionally
You may observe WebHooks failing only at times, but not consistently.
In this scenario, you can implement a program that will constantly call the WebHook URL (if that's possible) and record the output to a file. This must be executed from the same Jira server to mimic Jira's request as closely as possible:
$ for i in $(seq 4320); do curl -k -H "Authorization: Basic XYZXYZ" -X POST "external.service.com/api/something/some-endpoint?param1=qwertyuiop¶m2=asdfghjkl¶m3=zxcvbnm&user_id=username&user_key=JIRAUSER10000" -d '... REQUEST BODY (IF APPLICABLE) ...' -s -D- | tee -a webhook_curl_output.out; sleep 10; done; echo;
^C
This command above is a Linux curl example that performs a POST request every 10 seconds and runs for 12 hours (4320 * 10s). It can be interrupted anytime with Ctrl+C.
You may then check how many of each response codes were received:
$ egrep "^HTTP" webhook_curl_output.out | sort | uniq -c | sort -nr
54889 HTTP/1.1 201
10 HTTP/1.1 303
1 HTTP/1.1 502 Proxy Error
You can then "drill into" the failed request (the 502 in the example above):
$ egrep "^HTTP/1.1 502" webhook_curl_output.out -A15
HTTP/1.1 502 Proxy Error
Date: Wed, 23 Nov 2022 22:25:16 GMT
Server: Apache/2.4.41 (Unix) OpenSSL/1.1.1k mod_jk/1.2.46
Content-Length: 341
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request<p>Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
This helps isolate Jira out of the scenario and we can troubleshoot the destination service.
You may find out WebHook requests are failing in Jira because of:
- Some external factors (infra, network or destination URL service availability or processing)
- Specific data set (it consistently fails with certain data, on the URI or body, like special characters or a regex that triggers some proxy/firewall rule to the destination URL
2. WebHooks impact on user operations
There are two major impacts WebHooks can have on user operations (HTTP requests):
- Filter (JQL) matching time
- Issue data fetching time (from the database)
Refer to Best practices on working with WebHooks in Jira for advices on how making best use of WebHooks in Jira.
All commands here refer to the logs in the Support Zip structure: application-logs/atlassian-jira.log
When executing in your environment, either generate a support zip, unzip and run them or change the path to the <jira-home-folder>/log/atlassian-jira.log
2.1. JQL Matching
Each WebHook configured to the event being fired has it's Filter executed (JQL matching) to check if Jira should post data to the respective URL. This is done synchronously on the same Thread and can contribute considerably to response time if the JQL makes use of complex or expensive functions.
You can inspect the impact of WebHooks in user operations in Jira through the entries of "JqlEventMatcher_SLOW" in the atlassian-jira.log
located in Jira's home log
folder.
2.1.1. JQL matching impact on requests response times
This log parsing (Linux example) lists how much accumulated time each user request (Browser user or REST API client) was impacted:
$ grep -a -E 'JqlEventMatcher_SLOW' -h application-logs/atlassian-jira.log* | sed -E 's/ url:.* INFO / INFO /g' | awk 'BEGIN {req = "Undefined"}; {if ($3 !~ /http/ && $4 == "INFO") {req = $3"____-____"$5} else if ($5 == "INFO") {req = $3"____"$7"____"$6} else {req = $3"_"$9"____"$6"____"$5}}; {split($0, arr, "produced lucene query and took ") split(arr[2], arr2, "'\''")}; {print req" "arr2[2]}' | awk '{total_time[$1]+=$2; total_hits[$1]++}; END {for (i in total_time) {print total_hits[i], i, total_time[i]}}' | sort -nr -k 3 | sed 's/____/ /g' | awk 'BEGIN {print "Count", "Thread_name", "Request_Id", "Username", "Total_time_in_ms"}; ($2 ~ /http/) {print}' | column -t | head
出力サンプル
Count Thread_name Request_Id Username Total_time_in_ms
17 http-nio-8080-exec-26_/rest/api/2/issue/JIRA-123/comment 590x20060909x2 juser001 18379
18 http-nio-8080-exec-22_/rest/jpo/1.0/issues/commit 919x22418014x3 juser123 15822
18 http-nio-8080-exec-207_/rest/jpo/1.0/issues/commit 919x22418069x6 juser123 15777
18 http-nio-8080-exec-213_/secure/AjaxIssueAction.jspa 122x20546378x1 juser010 15534
15 http-nio-8080-exec-114_/secure/AjaxIssueAction.jspa 1338x19593764x1 juser010 15477
17 http-nio-8080-exec-214_/secure/QuickCreateIssue.jspa 919x22418016x4 juser044 15413
17 http-nio-8080-exec-58_/secure/QuickCreateIssue.jspa 288x21749072x2 juser100 15223
In the example, the request "590x20060909x2" by user "juser001" spend 18.3 seconds just running Filters/JQL of 17 WebHooks in total. This may even have resulted in no WebHooks actually matching — this was just Jira checking if any of the 17 WebHooks listening to that event matched the Issue being worked by the user.
2.1.2. JQL impact by Filter
This log parsing (Linux example) reports the JQL:
$ grep -E JqlEventMatcher_SLOW -h application-logs/atlassian-jira.log* | awk '{match($0, / '\''({|\().*(}|\)|[a-zA-Z])'\'' /); filter=substr($0, RSTART+2, RLENGTH-4); gsub(/'\''/,"", $(NF-3)); gsub(/ AND \{key in \("[0-9A-Z-]+"\)\}/, "", filter); total_count[filter]++; total_time[filter]+=$(NF-3)}; END {for (f in total_count) {printf "%d\t%d\t%.2f\t%s\n", total_count[f], total_time[f], (total_time[f]/total_count[f]), f}}' | sort -nr -k 3 | awk 'BEGIN {print "Count\tTotal_time_(ms)\tAvg_time_(ms)\tJQL"}; {print $0}' | column -t -s $'\t' | head
出力サンプル
Count Total_time_(ms) Avg_time_(ms) JQL
1300 3620047 2784.65 {project = "JIRA"} AND {issuekey not in childissuesof(JIRA-2580)} AND {issuekey != "JIRA-2580"}
1307 2333499 1785.39 {project = "JIRA"} AND {issuekey in childIssuesOf(JIRA-2580)}
1670 2216867 1327.47 {project = "JIRA"} OR {project = "CONNIE"} AND {issuekey not in childissuesof(CONNIE-1346)} AND {issuekey not in ("CONNIE-1346")} AND {issuekey not in childissuesof(CONNIE-590)} AND {issuekey not in ("CONNIE-590")}
1908 1037890 543.97 {project in ("JIRA", "CONNIE", "INFRA", "FINANCE", "ACCT", "GOVT")} AND ( {issuekey in childissuesof(GOVT-177)} OR {issuekey in childissuesof(JIRA-184)} OR {issuekey in childissuesof(CONNIE-103)} OR {issuekey in childissuesof(INFRA-183)} OR {issuekey in childissuesof(FINANCE-182)} OR {issuekey in childissuesof(ACCT-170)} OR {issuekey in childissuesof(ACCT-154)} OR {issuekey in childissuesof(GOVT-168)} OR {issuekey in childissuesof(JIRA-169)} )
1509 673207 446.13 {project in ("JIRA", "CONNIE", "GOVT", "ACCT", "TECH2", "TECH3")} AND {issuekey in childIssuesOf(JIRA-1866)}
This output is the closest we can get to the actual WebHooks. We need to visually correlate the filters from the log parsing to what's configured in the database to get to which WebHook's delaying the most:
select "ID", "NAME", "ENCODED_EVENTS", "FILTER", "LAST_UPDATED_USER", "URL" from "AO_4AEACD_WEBHOOK_DAO" where "ENABLED" = true;
2.1.3. JQL impact by Issue
Specific Issues usually impact the data fetching (below) more than the JQL matching, but for the record, this log parsing (Linux example) reports on which Issues the JQL matching took most time:
$ grep -E "JqlEventMatcher_SLOW" -h application-logs/atlassian-jira.log* | awk '{split(substr($0, match($0, / AND \{key in \("[0-9A-Z-]+"\)\}/)), arr); gsub(/[(")}'\'']/,"",arr[4]); gsub(/'\''/,"", $(NF-3)); total_count[arr[4]]++; total_time[arr[4]]+=$(NF-3)}; END {for (i in total_time) {print total_count[i], i, total_time[i]}}' | sort -nr -k 1 | awk 'BEGIN {print "Count", "Issue_key", "Total_time_in_ms_(spanning_multiple_requests)"}; {print}' | column -t | head
出力サンプル
Count Issue_key Total_time_in_ms_(spanning_multiple_requests)
308 DYNMCS-4597 177896
235 ANT-887 149092
181 DYNMCS-4593 111803
161 DYNMCS-4402 104064
117 ANT-892 73201
106 DYNMCS-4598 64045
104 ANT-886 64575
100 DYNMCS-4596 62491
93 MSFT-3066 70526
2.2. Data fetching and JSON building
If WebHooks are configured to post Issue data, Jira will read all available data from the Issue from the database to build a JSON and post it in the request body to the URL. If the Issue has a lot of data (see How to find the Issues with most data in Jira), this can impact user operations response time.
We can confirm if WebHooks database reading is impacting user operations by capturing Thread dumps and looking for occurrences of "com.atlassian.jira.plugins.webhooks.serializer".
These commands will run for roughly 60 seconds and capture 30 Thread dumps (one every 2 secs):
$ JIRA_PID=`ps aux | grep -i jira | grep -i java | grep -iv eazybi-child | awk -F '[ ]*' '{print $2}'`; for i in $(seq 30); do top -b -H -p $JIRA_PID -n 1 > jira_cpu_usage.`date +%s`.txt; jstack $JIRA_PID > jira_threads.`date +%s`.txt; printf "."; sleep 2; done; echo;
$ JIRA_PID=`ps aux | grep -i jira | grep -i java | grep -iv eazybi-child | awk -F '[ ]*' '{print $2}'`; for i in $(seq 30); do top -b -H -p $JIRA_PID -n 1 > jira_cpu_usage.$(date +%s).txt; kill -3 $JIRA_PID; printf "."; sleep 2; done; echo;
$ awk '/^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$/{n++;td=1;lastLine=$0;outFile=("jira_threads." n ".txt")}; {if (td) {print $0 >> outFile; close(outFile)}}; /object space/{if (lastLine ~ /PSPermGen/) {td=0}}' <jira-install-path>/logs/catalina.out
For Windows, you can use the Performance Data Collector (see How to use the Performance Data Collector).
With the Thread dump files, we can parse and list which Threads were running for how many consecutive snapshots:
$ grep -E -h '^"http.*-exec-[0-9]+' jira_threads.* | sed 's/"//g' | sort -k1 | grep -E " runnable | Object.wait\(\) " | awk '{print $1}' | uniq -c | awk '($1 > 1) {print}' | sort -nr;
出力サンプル
4 http-nio-8080-exec-744
2 http-nio-8080-exec-716
Then, we can list only the ones seen executing the "com.atlassian.jira.plugins.webhooks.serializer" code:
$ grep -E -h '"http.*-exec-[0-9]+.*( runnable | Object.wait\(\) )' jira_threads.* -A150 | grep "com.atlassian.jira.plugins.webhooks.serializer" -B150 | grep -E '"http.*-exec-[0-9]+.*( runnable | Object.wait\(\) )' | sed 's/"//g' | awk '{print $1}' | sort | uniq -c | sort -nr;
出力サンプル
3 http-nio-8080-exec-744
2 http-nio-8080-exec-716
From the example above, Thread http-nio-8080-exec-744 was running on 4 snapshots (out of the 30 or how many we captured) and was serializing Issue data for WebHooks on 3 of them.
3. WebHook events discarded because of full queue (asynchronous)
If asynchronous WebHooks are enabled (either on purpose or in any of the versions listed at the top), the event queue may overflow and WebHooks won't be triggered for execution.
We may search the atlassian-jira.log for "pool size ... queued tasks ... WebHookEventsProcessor":
$ grep -E "pool size.*queued tasks.*WebHookEventsProcessor" -h atlassian-jira.log* | awk '{print $NF}' | sort | uniq -c | sort -nr | column -tx
出力サンプル
626 com.atlassian.crowd.event.user.UserAttributeStoredEvent
4 com.atlassian.jira.event.issue.IssueEvent
1 com.atlassian.jira.event.issue.link.IssueLinkCreatedEvent
1 com.atlassian.jira.event.comment.CommentUpdatedEvent
1 com.atlassian.jira.event.comment.CommentCreatedEvent
This means 4 Issue events (the last four lines from the output) were abandoned because the queue was full (likely due to a flood of UserAttributeStoredEvent, the first line from the output).
This issue's being tracked by JRASERVER-74492 - Getting issue details... STATUS
We may increase the queue size and pool size with the below startup parameters (See Setting properties and options on startup):
-Dwebhooks.executor.queue.size=200 -Dwebhooks.executor.thread.pool.size=10
(The above 200 and 10 are the default values)
These parameters are exclusive to each node and they must be restarted (one at a time's ok) for the JVM to consider them.