Monitoring OpenSearch

Monitoring is essential for maintaining the health and performance of your OpenSearch clusters, whether they're hosted on AWS or self-managed. Proactive monitoring helps you identify potential issues, such as resource constraints or cluster instability, before they impact your applications or users.

This guide provides practical recommendations for setting up monitoring and alerting for both AWS OpenSearch Service and self-hosted OpenSearch environments. It covers:

Commonly tracked metrics.
Approaches for configuring dashboards and alerts using AWS CloudWatch or open-source tools like Prometheus and Grafana.
Best practices for interpreting and responding to alerts.

On this page:

Monitoring approaches
- AWS-hosted OpenSearch
- Self-hosted OpenSearch
Key metrics and alerts
Troubleshooting and best practices
- Jira-specific metrics
- Enabling logging for debugging
Additional resources

Monitoring approaches

Several approaches are available for monitoring OpenSearch clusters. The best choice depends on your deployment model and operational preferences.

AWS-hosted OpenSearch

For clusters managed with AWS OpenSearch Service, AWS CloudWatch is the primary monitoring solution. CloudWatch automatically collects a wide range of metrics from your OpenSearch domain, such as cluster health, resource utilization, and search performance. You can create dashboards and set up configurable alarms to stay informed about your cluster’s status. More about monitoring AWS-hosted OpenSearch Service
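As a minimal sketch of what a configurable alarm could look like, the function below builds the parameters for a CloudWatch alarm that fires when the cluster status turns red. The function name, alarm name, and SNS topic are hypothetical. The metric name ClusterStatus.red, the AWS/ES namespace, and the DomainName and ClientId dimensions follow the naming Amazon OpenSearch Service uses for its CloudWatch metrics, and the one-minute threshold is only an example to adjust for your environment.

```python
def red_status_alarm(domain_name, account_id, sns_topic_arn):
    """Build keyword arguments for CloudWatch's PutMetricAlarm operation.

    All argument values are supplied by the caller; nothing here talks to AWS.
    """
    return {
        "AlarmName": f"{domain_name}-cluster-status-red",  # hypothetical naming scheme
        "Namespace": "AWS/ES",
        "MetricName": "ClusterStatus.red",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Maximum",
        "Period": 60,              # seconds per evaluation window
        "EvaluationPeriods": 1,    # alarm after one breaching window
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],  # where to send the notification
    }


params = red_status_alarm(
    "jira-search",                                   # example domain name
    "123456789012",                                  # example account ID
    "arn:aws:sns:us-east-1:123456789012:ops-alerts"  # example SNS topic
)
print(params["AlarmName"])
```

If you use boto3, these parameters could be passed as boto3.client("cloudwatch").put_metric_alarm(**params). Keeping the parameter construction separate from the API call makes the alarm definition easy to review and reuse for other metrics.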

Self-hosted OpenSearch

For self-hosted OpenSearch clusters, open-source monitoring tools such as Prometheus and Grafana are commonly used. The OpenSearch Prometheus Exporter plugin collects cluster metrics, which you can visualize and analyze with Grafana dashboards. This approach offers flexibility and customization, allowing you to tailor monitoring to your needs. Prometheus also supports alerting rules, so you can receive proactive notifications based on custom thresholds. More about monitoring self-hosted OpenSearch

Key metrics and alerts

Regardless of whether your OpenSearch cluster is hosted on AWS or self-managed, monitoring the following core metrics is important for maintaining stability, performance, and reliability.

Category	Metric	Why it matters
Cluster health and availability	Cluster status (green, yellow, red): overall health and shard allocation	Detects cluster issues early
Cluster health and availability	Node availability: join and leave events	Identifies unexpected node changes
Resource utilization	Disk usage and free storage space	Prevents outages from full disks
Resource utilization	CPU usage	Highlights resource bottlenecks
Resource utilization	JVM memory pressure, heap usage	Prevents performance degradation
Performance metrics	Search latency and indexing latency: response time for search and indexing	Ensures fast user experience
Performance metrics	Thread pool queues: size of search/write queues	Identifies backlogs or slowdowns
Error rates and failures	5xx error rate: frequency of server-side (5xx) error responses	Detects instability or misconfiguration
Error rates and failures	Automated snapshot failures, backup completion status	Ensures data protection
Jira-specific metrics	Point-in-time (PIT) contexts: usage of PIT searches	Important for Jira search reliability
Jira-specific metrics	Scroll contexts: usage of scroll APIs	Important for bulk data operations
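To illustrate how the cluster health and node availability rows above might translate into a check, here is a small sketch that interprets a response from the OpenSearch _cluster/health API. The function name and the expected_nodes parameter are hypothetical. The status, number_of_nodes, and unassigned_shards fields are the ones that API returns, and the sample response is made up for the example.

```python
def assess_cluster_health(health, expected_nodes):
    """Return a list of human-readable findings for a _cluster/health response."""
    findings = []

    status = health.get("status")
    if status == "red":
        findings.append("critical: cluster status is red")
    elif status == "yellow":
        findings.append("warning: cluster status is yellow")

    # Fewer nodes than expected suggests a node has left the cluster.
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        findings.append(f"warning: {nodes} of {expected_nodes} nodes are present")

    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        findings.append(f"info: {unassigned} shards are unassigned")

    return findings


# Made-up sample response, trimmed to the fields used above.
sample = {"status": "yellow", "number_of_nodes": 2, "unassigned_shards": 3}
for finding in assess_cluster_health(sample, expected_nodes=3):
    print(finding)
```

A check like this works the same way for AWS-hosted and self-hosted clusters, because it only depends on the health response and not on the monitoring backend.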

Troubleshooting and best practices

For general OpenSearch monitoring, use AWS CloudWatch alarms to track cluster health, resource usage, performance, and error rates. Each alarm includes troubleshooting steps and best practices. Explore recommended CloudWatch alarms for Amazon OpenSearch Service

Jira-specific metrics

Point-in-Time (PIT) metrics

Alarms on CurrentPointInTime (number of open PIT contexts) or AvgPointInTimeAliveTime (average lifetime of PIT contexts) indicate that PIT searches aren’t closing promptly, or that the number of concurrent PIT contexts is approaching or exceeding cluster limits.

To address these alarms, you can:

Configure PIT keepalive duration
Set the opensearch.pointintime.keepalive.seconds property in the jira-config.properties file to control how long a PIT remains active. Lowering this value can help ensure PIT contexts are closed sooner, minimizing resource usage. However, setting this value too low might cause searches to fail, as PIT contexts could expire before queries complete. The default is 120 seconds. Adjust it carefully based on your workload and monitoring data.
Monitor for unusual patterns
If you notice a sudden increase in open PIT contexts, check for recent changes in Jira usage, such as new plugins, integrations, or bulk operations that could generate excessive PIT searches.
Increase PIT limits
If your workload requires more concurrent PIT contexts, raise the limit by updating the search.max_open_point_in_time_context node setting using the OpenSearch REST API:
```
PUT _cluster/settings
{
  "persistent": {
    "search.max_open_point_in_time_context": <desired_limit>
  }
}
```
Increasing this limit will use more resources. Monitor cluster health and resource usage after making changes.
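When investigating a PIT alarm, it can help to see how many PIT contexts are open and whether any were created with a keepalive longer than Jira's configured value. The sketch below summarizes a response from the OpenSearch list-all-PITs API (GET _search/point_in_time/_all). It assumes that response contains a pits array whose entries have pit_id and keep_alive (in milliseconds) fields. Verify this against your OpenSearch version. The function name and sample data are hypothetical.

```python
def summarize_pits(list_response, max_keep_alive_seconds=120):
    """Count open PITs and list those whose keepalive exceeds the given limit."""
    pits = list_response.get("pits", [])
    limit_ms = max_keep_alive_seconds * 1000  # assumed: keep_alive is in milliseconds

    long_lived = [
        pit["pit_id"]
        for pit in pits
        if pit.get("keep_alive", 0) > limit_ms
    ]
    return {"open_pits": len(pits), "long_lived_pit_ids": long_lived}


# Made-up sample response with shortened PIT IDs.
sample = {
    "pits": [
        {"pit_id": "pit-a", "creation_time": 1658146048666, "keep_alive": 120000},
        {"pit_id": "pit-b", "creation_time": 1658146050000, "keep_alive": 6000000},
    ]
}
print(summarize_pits(sample))
```

PITs with a keepalive far above the Jira default of 120 seconds likely come from another client or integration, which narrows down where to look.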

Scroll metrics

Alarms on ScrollCurrent (number of open scroll contexts) might indicate that scrolls aren't being cleaned up, leading to resource leaks and potential cluster instability.

To address these alarms, you can:

Check permissions
Make sure Jira has the required permissions to delete or clear scroll contexts. Without proper permissions, scrolls may accumulate and not be cleaned up.
Monitor usage patterns
If scroll usage remains high, review your bulk operations and long-running queries. Consider optimizing or batching them differently to reduce the number of open scroll contexts.
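To see which nodes hold the open scroll contexts, you could inspect the search statistics returned by the OpenSearch node stats API (GET _nodes/stats/indices/search). The sketch below reads the scroll_current counter for each node and reports nodes above a threshold. The function name, the threshold of 100, and the sample response are hypothetical and only illustrate the idea.

```python
def nodes_with_many_scrolls(node_stats, threshold=100):
    """Map node name to open scroll count for nodes above the threshold."""
    flagged = {}
    for node in node_stats.get("nodes", {}).values():
        search = node.get("indices", {}).get("search", {})
        scrolls = search.get("scroll_current", 0)
        if scrolls > threshold:
            flagged[node.get("name", "unknown")] = scrolls
    return flagged


# Made-up sample response, trimmed to the fields used above.
sample = {
    "nodes": {
        "id-1": {"name": "node-1", "indices": {"search": {"scroll_current": 4}}},
        "id-2": {"name": "node-2", "indices": {"search": {"scroll_current": 250}}},
    }
}
print(nodes_with_many_scrolls(sample))
```

If the count keeps growing on every node between checks, scrolls are probably not being cleared, which points back to the permissions check above.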

Enabling logging for debugging

Enabling detailed logging can help with troubleshooting and performance analysis. OpenSearch provides several logging options to help you identify issues such as slow queries or indexing bottlenecks. Enable these logs temporarily during troubleshooting to minimize performance impact:

Request-level slow query logs: Capture queries that exceed a set execution time. Use these logs to find inefficient or problematic queries.
Shard-level slow indexing logs: Record indexing operations that are slower than expected at the shard level.
Shard-level slow search logs: Record search operations that are slow at the shard level.
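The following sketch builds request bodies that would turn these logs on with a warn threshold and off again afterwards. It assumes the OpenSearch settings index.search.slowlog.threshold.query.warn and index.indexing.slowlog.threshold.index.warn for shard-level logs (sent with PUT <index>/_settings) and cluster.search.request.slowlog.threshold.warn for request-level logs (sent with PUT _cluster/settings, available in more recent OpenSearch versions). The function names and threshold values are hypothetical. Confirm the setting names against the documentation for your OpenSearch version before applying them.

```python
import json


def shard_slow_log_settings(search_warn="5s", indexing_warn="5s"):
    """Body for PUT <index>/_settings. Pass None to reset a threshold."""
    return {
        "index.search.slowlog.threshold.query.warn": search_warn,
        "index.indexing.slowlog.threshold.index.warn": indexing_warn,
    }


def request_slow_log_settings(warn="10s"):
    """Body for PUT _cluster/settings. Pass None to reset the threshold."""
    return {
        "persistent": {
            "cluster.search.request.slowlog.threshold.warn": warn,
        }
    }


# Enable during troubleshooting.
print(json.dumps(shard_slow_log_settings(), indent=2))
print(json.dumps(request_slow_log_settings(), indent=2))

# Disable afterwards: None is serialized as JSON null, which resets the
# setting to its default.
print(json.dumps(shard_slow_log_settings(None, None), indent=2))
print(json.dumps(request_slow_log_settings(None), indent=2))
```

Generating the enable and reset bodies from the same functions makes it harder to forget a setting when you turn logging off again.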

For more information, check:

Additional resources

Configuring OpenSearch for Jira
Identifying slow JQL queries
