Confluence Cluster Issues
This article applies to clustered Confluence 5.4 or earlier.
Please familiarize yourself with the documentation:
- A good table of contents can be found at Confluence Clustering Overview. Come back to this page if you need to find something related to Confluence clustering.
- Start with the Technical Overview of Clustering in Confluence. This holds most of what you need to know.
- Consider the Cluster Checklist. This will give you a good idea of what you need to prepare, if you are serious about choosing clustered.
- Lastly make sure you are familiar with Cluster Troubleshooting, which covers the most common scenarios.
Cluster Safety Check
The Cluster Safety Check is scheduled to run once every 30 seconds. In essence, it:
- Fetches the cluster safety number from the database and compares it to the value it has cached.
- If they are the same, generates a new number, caches it, and updates the value in the database.
- Waits 30 seconds and repeats from step 1.
If the database number and the cached value are different, it throws a cluster panic and effectively prevents users from accessing the instance.
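The loop above can be sketched as follows. This is a minimal illustration only: the shared database row, the `ClusterPanic` exception, and the method names are hypothetical stand-ins for Confluence's internals.

```python
import random

class ClusterPanic(Exception):
    """Raised when the cached safety number no longer matches the database."""

class ClusterSafetyJob:
    def __init__(self, db):
        self.db = db                       # stand-in for the shared database row
        self.cached = db["safety_number"]  # value cached on this node

    def run_once(self):
        current = self.db["safety_number"]     # step 1: fetch from the database
        if current != self.cached:             # mismatch: another instance wrote it
            raise ClusterPanic("cluster safety number changed outside this cluster")
        new_number = random.getrandbits(32)    # step 2: generate a new number,
        self.cached = new_number               # cache it,
        self.db["safety_number"] = new_number  # and update the database
        # step 3: the scheduler waits 30 seconds and calls run_once() again

# Normal operation: repeated runs succeed.
db = {"safety_number": 0}
job = ClusterSafetyJob(db)
job.run_once()
job.run_once()

# A rogue instance updates the database behind our back: panic on the next run.
db["safety_number"] = -1
try:
    job.run_once()
except ClusterPanic:
    print("cluster panic")
```

Note that the node only detects the intruder when it next compares values, so a foreign write can go unnoticed for up to one 30-second cycle.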
It does this because it believes that some other instance (one that is not part of its cluster) is accessing the database and updating the cluster safety number. If two instances update the database without each other's knowledge, that is of course dangerous: the caches in both instances will get out of sync with the database, and users can potentially overwrite each other's changes.
There are a few well-known situations in which this can happen. See Confluence will not start due to fatal error in Confluence cluster:
- Another instance (for example a test instance) has been started accidentally and points to the same database as your production instance.
- You have accidentally deployed the application twice (so that it starts up twice in the same application server). For example, you have referenced Confluence in both your server.xml and a confluence.xml file within the Tomcat configuration (thus starting up Confluence twice).
- Your database is taking a long time to commit (e.g. it is running a backup that lasts over 30 seconds and freezes commits). In this case the cluster safety job sends a new value to the database, but when it checks again 30 seconds later, the previous commit still hasn't occurred, so it fetches the previous value, which is out of sync with its cached value.
- You are running a Confluence cluster and one of the nodes leaves the cluster due to problems communicating with the other nodes.
This last one is the least common, but the hardest to debug.
The cluster safety check should only run once per cluster: if one node runs the cluster safety job, the other nodes will not (due to job synchronisation). However, if a node leaves the cluster, it no longer communicates with the other nodes, nor does it synchronise its jobs. Thus both the remaining cluster and the node that left will run the cluster safety job independently, and since they don't share a cache, their cached values and the value in the database will soon be out of sync, triggering a cluster panic.
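The split scenario can be simulated with a small self-contained sketch (the class, the shared dictionary, and the deterministic counter standing in for the random safety number are all illustrative, not Confluence's actual code): two jobs that no longer share a cache both write the shared safety number, and whichever runs second sees a mismatch.

```python
import itertools

counter = itertools.count(1)  # deterministic stand-in for the random safety number

class SafetyJob:
    def __init__(self, db):
        self.db = db
        self.cached = db["safety_number"]

    def run_once(self):
        if self.db["safety_number"] != self.cached:
            return "panic"          # cluster panic: someone else wrote the value
        self.cached = next(counter)
        self.db["safety_number"] = self.cached
        return "ok"

db = {"safety_number": 0}
node_a = SafetyJob(db)   # node still in the cluster
node_b = SafetyJob(db)   # node that left; it no longer shares node_a's cache

print(node_a.run_once())  # "ok"    - writes 1 to the database, caches 1
print(node_b.run_once())  # "panic" - node_b still has 0 cached, the database holds 1
```

While the nodes are in the same cluster, job synchronisation ensures only one of them runs the job, so this race never arises; the divergence is purely a symptom of the split.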
Nodes communicate via UDP using unicast (they first use multicast to discover a new node; once discovered, they communicate via unicast). If there are network problems, the nodes may experience communication delays:
2010-02-12 02:48:48,811 WARN [Logger@9247854 3.3.1/389] [Coherence] log 2010-02-12 02:48:48.811 Oracle Coherence GE 3.3.1/389 <Warning> (thread=PacketPublisher, member=1): Experienced a 13801 ms communication delay (probable remote GC) with Member(Id=2, Timestamp=2010-02-11 22:00:14.65, Address=184.108.40.206:8088, MachineId=23897, Location=process:21524@RTPCPAPWKI02); 88 packets rescheduled, PauseRate=0.0030, Threshold=387
and eventually timeout:
2010-02-12 03:03:58,705 WARN [Logger@9247854 3.3.1/389] [Coherence] log 2010-02-12 03:03:58.705 Oracle Coherence GE 3.3.1/389 <Warning> (thread=PacketPublisher, member=1): Timeout while delivering a packet; removing Member(Id=2, Timestamp=2010-02-11 22:00:14.65, Address=220.127.116.11:8088, MachineId=23897, Location=process:21524@RTPCPAPWKI02)
2010-02-12 03:03:58,734 INFO [Cluster:EventDispatcher] [confluence.cluster.coherence.TangosolClusterManager] memberLeft Member has left cluster: Member(Id=2, Timestamp=2010-02-12 03:03:58.705, Address=18.104.22.168:8088, MachineId=23897, Location=process:21524@RTPCPAPWKI02)
As you can see, the timeout is quickly followed by the removal of the node from the cluster (as it is assumed that the node is down).
The timeout value is 60 seconds by default in production installations. The heartbeat runs once every second, so a node has to fail to respond to heartbeats for 60 seconds before the cluster determines that it is dead. More details can be found in Coherence's packet delivery documentation.
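The detection logic amounts to counting missed one-second heartbeats against the timeout. The one-second interval and 60-second production default come from the text above; the function itself is a simplified sketch, not Coherence's actual implementation:

```python
HEARTBEAT_INTERVAL_SECS = 1   # a heartbeat is sent once per second
TIMEOUT_SECS = 60             # default packet-delivery timeout in production

def is_member_dead(seconds_since_last_response):
    """A member is declared dead once it has not responded for the full timeout."""
    missed_heartbeats = seconds_since_last_response // HEARTBEAT_INTERVAL_SECS
    return missed_heartbeats * HEARTBEAT_INTERVAL_SECS >= TIMEOUT_SECS

print(is_member_dead(13))  # False - a 13s delay (e.g. a long GC pause) only logs a warning
print(is_member_dead(60))  # True  - after 60 missed heartbeats the member is removed
```

This is why the first log excerpt above is only a warning about a 13801 ms delay, while the second, after the full timeout has elapsed, actually removes the member.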
Once a node is removed from the cluster (but is still up), one of the instances (i.e. the remaining cluster or the node that left) will eventually (within 30 seconds) trigger a cluster panic. Once an instance triggers a cluster panic, it updates the database value to ensure that all other nodes also panic.
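In this scheme, forcing the other nodes to panic is simple: the panicking instance writes a value that no other node can have cached, so every node's next comparison fails. The sketch below is illustrative only; the sentinel value and variable names are hypothetical, not what Confluence actually writes.

```python
def trigger_cluster_panic(db):
    # The panicking instance overwrites the shared safety number with a value
    # no node has cached, so every other node's next comparison fails too.
    db["safety_number"] = -1  # sentinel value; illustrative only

db = {"safety_number": 42}
surviving_node_cache = 42     # what another, still-running node has cached
trigger_cluster_panic(db)
print(db["safety_number"] != surviving_node_cache)  # True - that node panics on its next check
```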
In Confluence clusters there is no notion of a master or slave node, so there is no way of knowing which node should be left running; to be safe, all nodes panic. Even if only one node of a four-node cluster leaves, all four nodes will panic.
Coherence War Stories
There are a number of reasons why communication between two nodes can fail. These are well documented in Coherence's war stories PDF.
Certain scheduled jobs are run by only one node in the cluster; others are run by all nodes. Examples of jobs that run only once per cluster include:
- Daily Report Mail
- Cluster Safety Check
Examples of jobs that run on every node:
- Incremental Indexing
- Index Optimisation