Second node fails to start due to read timed out accessing dbo.clusteredjob with MS SQL Server database

お困りですか?

アトラシアン コミュニティをご利用ください。

コミュニティに質問

プラットフォームについて: Server および Data Center のみ。この記事は、Server および Data Center プラットフォームのアトラシアン製品にのみ適用されます。

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Fisheye および Crucible は除く

要約

Starting a second node fails with Read Timed Out Exception while executing the following SQL command:

DELETE FROM dbo.clusteredjob WHERE JOB_ID=? (Read timed out)

When only one node is running, the node works fine.

2020-11-26 10:17:18,652-0500 localhost-startStop-1 ERROR      [c.a.j.component.pico.ComponentManager] Error occurred while starting component 'com.atlassian.jira.web.action.issue.DefaultTemporaryWebAttachmentsMonitor'.
com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: Generic Entity Exception occurred in deleteByAnd (SQL Exception while executing the following:DELETE FROM dbo.clusteredjob WHERE JOB_ID=? (Read timed out))
	at com.atlassian.jira.ofbiz.DefaultOfBizDelegator.removeByAnd(DefaultOfBizDelegator.java:238)

診断


1) Database shows sleeping process and multiple suspended requests:

Some of them are related to the table dbo.clusteredjob:

(@P0 bigint,@P1 bigint,@P2 nvarchar(4000),@P3 bigint)UPDATE dbo.clusteredjob SET  VERSION = @P0 ,  NEXT_RUN = @P1  WHERE JOB_ID=@P2 AND VERSION=@P3      (suspended)                         

(@P0 bigint,@P1 bigint,@P2 nvarchar(4000),@P3 bigint)UPDATE dbo.clusteredjob SET  VERSION = @P0 ,  NEXT_RUN = @P1  WHERE JOB_ID=@P2 AND VERSION=@P3      (suspended)                            

(@P0 bigint,@P1 bigint,@P2 nvarchar(4000),@P3 bigint)UPDATE dbo.clusteredjob SET  VERSION = @P0 ,  NEXT_RUN = @P1  WHERE JOB_ID=@P2 AND VERSION=@P3      (suspended)                          

(@P0 nvarchar(4000))DELETE FROM dbo.clusteredjob WHERE JOB_ID=@P0       (sleeping)


2) Log files on both nodes (the already started one and the second failed node) shows multiple Read Timed Outs as the sample below:

2020-11-25 20:20:09,383-0500 Caesium-1-3 ServiceRunner    300151ms "UPDATE dbo.clusteredjob SET  VERSION = '11' ,  NEXT_RUN = '1606353369231'  WHERE JOB_ID='com.atlassian.jira.plugin.ext.bamboo.service.PlanStatusUpdateJob-job' AND VERSION='10'"
com.microsoft.sqlserver.jdbc.SQLServerException: Read timed out
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:2887) [mssql-jdbc-7.2.1.jre8.jar:?]
[...]
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method) [?:1.8.0_265]


環境

Jira DC 

Microsoft SQL Server 2016


原因


Commit was not being executed at the Database.

ソリューション


1) Make sure the JDBC connection is not using autocommit=false. If it is using, remove it.

Sample for SQL Server: 

jdbc:sqlserver://SQLHostname:3341;databaseName=jira;domain=TDBFG;IntegratedSecurity=true;UseNTLMv2=true;AuthenticationScheme=JavaKerberos;autoCommit=false;socketTimeout=300000


2) Ensure the database is set to READ_COMMITTED_SNAPSHOT ON as per Connecting JIRA to SQL Server.

To verify the changes, use this query which should result in '1':

SELECT sd.is_read_committed_snapshot_on
FROM sys.databases AS sd
WHERE sd.[name] = '<database name>';

最終更新日: 2020 年 12 月 11 日

この内容はお役に立ちましたか?

はい
いいえ
この記事についてのフィードバックを送信する
Powered by Confluence and Scroll Viewport.