Confluence throws 500 error due to lock timeout in VCacheLock

お困りですか?

アトラシアン コミュニティをご利用ください。

コミュニティに質問


この記事はアトラシアンのサーバー プラットフォームにのみ適用されます。クラウドとサーバー製品の違いについてはこちらをご確認ください。

問題

Sometimes Confluence may respond with 500x error pages, have a general performance issue, or even a full outage, due to the lock timeout in VCacheLock which is used by I18NBean. 

Either of the following appears in the atlassian-confluence.log:

WARN [http-nio-8090-exec-143] [internal.core.service.VCacheLock] lockWithTimeout Timed out waiting for lock on cache: com.atlassian.confluence.util.i18n.I18NBeanFactory.by.locale

または

ERROR [http-nio-8090-exec-1] [xhtml.view.macro.ViewMacroMarshaller] handleMacroExecutionException Error rendering macro: content-column
 -- url: /display/SPACEKEY/Page+Name | page: 123456789 | traceId: 7a94e4c4b2be88a0 | userName: user.name@example.com | referer: https://confluence.example.com/display/SPACEKEY/Page+Name | action: viewpage
java.lang.StackOverflowError
        at java.security.AccessController.doPrivileged(Native Method)
        at java.io.PrintWriter.<init>(PrintWriter.java:116)
        at java.io.PrintWriter.<init>(PrintWriter.java:100)
        at com.atlassian.confluence.util.PatternLayoutWithStackTrace.appendStackTrace(PatternLayoutWithStackTrace.java:43)
        at com.atlassian.confluence.util.PatternLayoutWithStackTrace.format(PatternLayoutWithStackTrace.java:30)
        at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
        at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
        at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
        at com.atlassian.confluence.logging.ConfluenceHomeLogAppender.append(ConfluenceHomeLogAppender.java:147)
        at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
        at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
        at org.apache.log4j.Category.callAppenders(Category.java:206)
        at org.apache.log4j.Category.forcedLog(Category.java:391)
        at org.apache.log4j.Category.error(Category.java:322)
        at com.comalatech.confluence.workflow.DefaultWorkflowAccessor.getWorkflows(DefaultWorkflowAccessor.java:225)
        at com.comalatech.confluence.workflow.DefaultWorkflowAccessor.getWorkflows(DefaultWorkflowAccessor.java:303)
        at com.comalatech.confluence.workflow.DefaultWorkflowAccessor.getPageWorkflows(DefaultWorkflowAccessor.java:290)
        at com.comalatech.confluence.ipc.api.DefaultStateService.getCurrentState(DefaultStateService.java:118)
        at com.comalatech.confluence.ipc.api.DefaultStateService.getCurrentState(DefaultStateService.java:53)
        at com.brikit.comalaworkflowsservice.Comala.getPublishedVersion(Comala.java:74)
        at com.brikit.core.confluence.Confluence.getPublishedComalaWorkflowsVersion(Confluence.java:3137)
        at com.brikit.core.confluence.Confluence.getVersionForUserRequest(Confluence.java:3906)
        at com.brikit.core.confluence.Confluence.getBodyAsString(Confluence.java:1717)
        at com.brikit.core.confluence.MacroParser.extractMacros(MacroParser.java:160)
        at com.brikit.core.confluence.MacroParser.extractMacros(MacroParser.java:151)
        at com.brikit.core.confluence.MacroParser.extractMacros(MacroParser.java:144)
        at com.brikit.themepress.model.PageWrapper.contentMacros(PageWrapper.java:623)
        at com.brikit.themepress.model.PageWrapper.layers(PageWrapper.java:1348)
        at com.brikit.themepress.model.PageWrapper.getLayerMacros(PageWrapper.java:995)
        at com.brikit.themepress.model.PageWrapper.getLayerMacros(PageWrapper.java:998)
        at com.brikit.themepress.model.PageWrapper.getLayerMacros(PageWrapper.java:998)
        at com.brikit.themepress.model.PageWrapper.getLayerMacros(PageWrapper.java:998)
        at com.brikit.themepress.model.PageWrapper.getLayerMacros(PageWrapper.java:998)
        at com.brikit.themepress.model.PageWrapper.getLayerMacros(PageWrapper.java:998)
        at com.brikit.themepress.model.PageWrapper.getLayerMacros(PageWrapper.java:998)
        .... Infinite recursion here


症状

The system becomes unresponsive. Thread dumps show HTTP threads hung waiting for the lock. Thread dumps do not show what threads owns the lock. After a heap dump investigation, it will be clear that lock owner is either dead or spending time in an unrelated piece of code.

The lock has been 'leaked' somehow. Threads are blocking on it, but it is never going to be unlocked. Once the system enters this state, it can only be healed by a restart.

診断

環境

診断ステップ

  • Capture Thread Dumps while the issue is occurring

  • First, check the status of all thread states across the thread dumps, using the following: 

    $ grep -A1 '-exec-' confluence_threads.* | grep State | sort | uniq -c


    • If the result looks like this, with all threads in a WAITING (parking) state, you may have this issue and should follow the second grep below: 

      $ grep -A1 '-exec-' confluence_threads.* | grep State | sort | uniq -c
       300 confluence_threads.1535641226.txt-   java.lang.Thread.State: WAITING (parking)
       300 confluence_threads.1535641241.txt-   java.lang.Thread.State: WAITING (parking)
       300 confluence_threads.1535641255.txt-   java.lang.Thread.State: WAITING (parking)
       300 confluence_threads.1535641270.txt-   java.lang.Thread.State: WAITING (parking)
       300 confluence_threads.1535641285.txt-   java.lang.Thread.State: WAITING (parking)
       300 confluence_threads.1535641301.txt-   java.lang.Thread.State: WAITING (parking)
  • Second, grep each thread dump using the following method to look for the lock: 

    $ cat thread_dump.txt  | egrep "(MemoryStore.put|MemoryStore.get)" -B 10


    • If the results match this example, you have likely run into this issue: 

      $ cat confluence_threads.1535641285.txt  | egrep "(MemoryStore.put|MemoryStore.get)" -B 1
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x00007f1e048cd408> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
      	at net.sf.ehcache.store.chm.SelectableConcurrentHashMap$Segment.put(SelectableConcurrentHashMap.java:641)
      	at net.sf.ehcache.store.chm.SelectableConcurrentHashMap.put(SelectableConcurrentHashMap.java:383)
      	at net.sf.ehcache.store.MemoryStore.put(MemoryStore.java:283)
      --
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x00007f1e048cd408> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
      	at net.sf.ehcache.store.chm.SelectableConcurrentHashMap$Segment.get(SelectableConcurrentHashMap.java:721)
      	at net.sf.ehcache.store.chm.SelectableConcurrentHashMap.get(SelectableConcurrentHashMap.java:324)
      	at net.sf.ehcache.store.MemoryStore.get(MemoryStore.java:344)
  • For full confirmation you have run into this issue, contact Atlassian Support

原因

This happens when thread-X and thread-Y both lock the cache, then other threads try to lock the cache also and must for 30 seconds, then an exception occurs.

Multiple third party vendors have been identified with plugins that can run into this issue. Below are some known vendors which have fixes to close this issue:

The Brikit Themepress plugin 1.x through 2.2.x uses code which causes this performance issue to occur.

CelesteCS Math Plugin for Confluence prior to version 21.1.0 can run into this issue.

回避策

Capture thread dumps and restart Confluence to recover the system. 

ソリューション

Brikit's bugs linked above note the issue should be resolved in Themepress 2.3.x.

Upgrading Themepress to the fixed version should prevent the issue from occurring again.



説明 Sometimes Confluence starts responding 500x error pages due to the lock timeout in VCacheLock which is used by I18NBean. It happens when threads X and Y lock this cache, and then other threads try to lock it also, wait for 30 seconds and then throw an exception.
製品Confluence Server, Confluence Data Center

最終更新日: 2019 年 2 月 25 日

この内容はお役に立ちましたか?

はい
いいえ
この記事についてのフィードバックを送信する
Powered by Confluence and Scroll Viewport.