Jira node holds mail handler cluster lock for a long time
Platform Notice: Data Center Only - This article only applies to Atlassian products on the Data Center platform.
Jira administrators repeatedly receive cluster lock health check failures for one of the mail handlers.
The health check failure message has the following format:
Node '<node_id> ' has been holding cluster lock, 'com.atlassian.jira.service.services.mail.MailFetcherService.<mail_handler_id>', for <#_of_seconds> seconds.
For example:
Node 'node2 ' has been holding cluster lock, 'com.atlassian.jira.service.services.mail.MailFetcherService.11400', for 947 seconds.
- The last part of the lock name in the failure message is the mail handler ID (11400 in the example above). Focus the investigation on that mail handler.
To find out which mail handler is holding the lock, query the database:
SELECT * FROM serviceconfig WHERE id = <id_from_the_cluster_lock_failure_message>;
- Enable incoming mail debug to see why the mail handler spends so long processing incoming mail.
- Test the mail handler via Edit > Next > Test to see how many emails are waiting in the mailbox to be processed.
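If you want to script the lookup, the following Python sketch extracts the node name, lock name, mail handler ID and lock duration from a health check message in the format shown above, then builds a parameterized version of the serviceconfig query. The regular expression is modelled only on the message text quoted in this article and may need adjusting for other Jira versions; the function names and the `%s` placeholder style are hypothetical choices, not part of Jira.

```python
import re
from typing import NamedTuple, Optional

# Modelled on the health check message quoted in this article (assumption:
# the wording, including the space before the closing quote, matches yours).
LOCK_MESSAGE = re.compile(
    r"Node '(?P<node>[^']*?)\s*' has been holding cluster lock, "
    r"'(?P<lock>com\.atlassian\.jira\.service\.services\.mail\.MailFetcherService\.(?P<handler_id>\d+))', "
    r"for (?P<seconds>\d+) seconds"
)


class LockReport(NamedTuple):
    node: str
    lock_name: str
    handler_id: int
    seconds: int


def parse_lock_message(message: str) -> Optional[LockReport]:
    """Return node, lock name, mail handler ID and duration, or None if no match."""
    match = LOCK_MESSAGE.search(message)
    if match is None:
        return None
    return LockReport(
        node=match.group("node"),
        lock_name=match.group("lock"),
        handler_id=int(match.group("handler_id")),
        seconds=int(match.group("seconds")),
    )


def serviceconfig_query(report: LockReport) -> tuple[str, tuple[int]]:
    """Parameterized form of the serviceconfig lookup for this mail handler."""
    # Hypothetical placeholder style: "%s" suits drivers such as psycopg2;
    # other drivers expect "?" instead.
    return "SELECT * FROM serviceconfig WHERE id = %s", (report.handler_id,)
```

Applied to the example message, `parse_lock_message` reports node `node2`, mail handler `11400` and `947` seconds, and `serviceconfig_query` returns the lookup for ID 11400.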
A mail handler holding a cluster lock for a long time is not necessarily a problem in itself. It may simply mean that the mailbox contains a large volume of email, which naturally takes time to process.
Inspect the mailbox and clean up any mass incoming email, such as:
- Delivery notifications or failures
- Bulk email that isn't correctly marked as bulk, so Jira cannot ignore it
- Old email left lingering in the mailbox
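One way to gauge what is filling the mailbox is to scan message headers read-only over IMAP and bucket them into the categories listed above. This is a minimal sketch, assuming the mailbox is reachable over IMAP with SSL. The category names, the 30-day cutoff and the header heuristics (multipart/report for delivery reports, Auto-Submitted and Precedence for bulk or automatic mail) are assumptions and are not necessarily the rules Jira's own bulk handling applies.

```python
import email
import imaplib
from collections import Counter
from datetime import datetime, timedelta, timezone
from email.message import Message
from email.utils import parsedate_to_datetime

BULK_PRECEDENCE = {"bulk", "list", "junk"}


def classify(msg: Message, now: datetime, max_age: timedelta = timedelta(days=30)) -> str:
    """Return a rough, header-based category for one message (heuristic only)."""
    # Delivery status notifications (RFC 3464) are sent as multipart/report.
    if msg.get_content_type() == "multipart/report":
        return "delivery-report"
    # Auto-Submitted (RFC 3834) and Precedence are common markers of automatic or bulk mail.
    auto_submitted = str(msg.get("Auto-Submitted", "no")).strip().lower()
    precedence = str(msg.get("Precedence", "")).strip().lower()
    if auto_submitted != "no" or precedence in BULK_PRECEDENCE:
        return "bulk-or-auto"
    # Assumed cutoff: anything older than max_age counts as old, lingering mail.
    date_header = msg.get("Date")
    if date_header:
        try:
            sent = parsedate_to_datetime(str(date_header))
        except (TypeError, ValueError):
            sent = None
        if sent is not None:
            if sent.tzinfo is None:  # treat zone-less dates as UTC
                sent = sent.replace(tzinfo=timezone.utc)
            if now - sent > max_age:
                return "old"
    return "other"


def summarize_mailbox(host: str, user: str, password: str, folder: str = "INBOX") -> Counter:
    """Count messages per category, fetching headers only and never marking mail as read."""
    now = datetime.now(timezone.utc)
    counts: Counter = Counter()
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        imap.select(folder, readonly=True)
        _, data = imap.search(None, "ALL")
        for num in data[0].split():
            # BODY.PEEK avoids setting the \Seen flag on the message.
            _, parts = imap.fetch(num, "(BODY.PEEK[HEADER])")
            counts[classify(email.message_from_bytes(parts[0][1]), now)] += 1
    return counts
```

`summarize_mailbox` returns a `Counter` keyed by category; a large `delivery-report` or `bulk-or-auto` count suggests which kind of mail to clean up first. The host, account and folder are placeholders for your own mail server settings.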
If there are no obvious problems with the mailbox, contact Atlassian Support. Be sure to include:
- a support zip
- a set of thread dumps captured from the node holding the cluster lock (while the lock is being held)
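Thread dumps are only useful if they are taken while the lock is still held, so it helps to capture several in a row. The sketch below assumes a JDK `jstack` binary on the PATH, run as the same OS user as the Jira JVM, and takes the Jira process ID as input. The default of six dumps ten seconds apart, the output file names and the function name are illustrative assumptions rather than an official Atlassian procedure.

```python
import subprocess
import time
from pathlib import Path
from typing import List, Sequence


def capture_thread_dumps(
    pid: int,
    out_dir: Path,
    count: int = 6,
    interval_s: float = 10.0,
    command: Sequence[str] = ("jstack", "-l"),
) -> List[Path]:
    """Run `<command> <pid>` `count` times, `interval_s` apart, saving each output to a file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for i in range(count):
        stamp = time.strftime("%Y%m%d-%H%M%S")
        # The index keeps file names unique even if two dumps share a timestamp.
        target = out_dir / f"jira_threads.{stamp}.{i}.txt"
        # check=True raises CalledProcessError if jstack cannot attach to the JVM.
        result = subprocess.run([*command, str(pid)], capture_output=True, text=True, check=True)
        target.write_text(result.stdout)
        written.append(target)
        if i < count - 1:
            time.sleep(interval_s)
    return written
```

Spacing the dumps out lets Support see whether the mail handler thread is stuck on the same stack frame or is making slow progress through the mailbox. `-l` asks `jstack` to include additional lock information in each dump.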