How to monitor long running Queries in Confluence using PgAdmin
プラットフォームについて: Server および Data Center のみ。この記事は、Server および Data Center プラットフォームのアトラシアン製品にのみ適用されます。
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Fisheye および Crucible は除く
要約
This document is created to help find and analyse Long running queries in PostgreSQL DB and also to enable Slow query logging for more detailed troubleshooting
This documentation is for informational purposes. We would advise to use the guidelines in a test environment first. Atlassian does not guarantee any Support in case any issue arises after following the steps from this documentation.
What are Long running queries?
From time to time we need to investigate if there is any query running indefinitely on our PostgreSQL database. These queries can run for long time and may block other processes and stall our application. Also, these long running queries may interfere on the overall database performance
How to list long-running and slow queries on PostgreSQL?
Poorly written queries can cause very long execution times on your database. Typically discovered through slow response or extended increases in database CPU, the pg_stat_activity view can help to find out what query is causing issues. The pg_stat_activity view contains details of all currently running queries, including user, connection, and timing details.
A simple select * from pg_stat_activity will provide a snapshot of what is happening on your PostgreSQL database, with one line per current transaction, and the key columns:
datname: The database name that the query is running on.
pid: The Process ID of the backend for the running query. PID can be used with pg_terminate_backend() or pg_cancel_backend() separately.
username: user name that is running/connected to this backend
client_addr: IP address of the connection to this backend. If this is empty/null, the row may refer to an internal connection or internal process.
backend_start: Time when the backend process was started, or when the client connected.
query_start: Time when the currently active query was started.
state: The current status of the backend, with options:
active: currently executing the query in the query column.
idle: not executing anything, and waiting for a new command.
idle in transaction: the backend is in a transaction, but not currently doing any work. This is the same as with “idle in transaction (error)” except one of the statements has an error.
fastpath function call: Executing a fast-path function.
disabled: track_activities has been disabled for this backend.
wait_event & wait_event_type: The event type that the backend is waiting for. This can be an extensive list. Watch for locks on database objects here.
query: The text of the backends most recent, or currently active, query. As per the documentation, this is truncated at 1024 bytes, changeable with “track_activity_query_size” in the server configuration.
backend_type: There’s a few different types, but for most connections from external clients, this will be “client backend”.
SQL to find long running queries running more than 2 minutes on your Database.
SELECT
pid,
user,
pg_stat_activity.query_start,
now() - pg_stat_activity.query_start AS query_time,
query,
state,
wait_event_type,
wait_event, application_name
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '2 minutes';
How to find blocked processes and blocking queries?
The pg_blocking_pids() function is a useful shortcut to find the database connections / sessions that are blocking another session. The pg_blocking_pids() function returns an postgreSQL array of PIDs that are blocking the specified server process PID that you provide to the query. Typically, a server process blocks another if it holds a lock that is needed by the second process.
- Kill long-running PostgreSQL query processes?
Where some queries look like they’re not going to finish, you can use the pid (process ID) from the pg_stat_activity or pg_locks views to terminate the running process.
pg_cancel_backend(pid) will attempt to gracefully kill a running query process.
pg_terminate_backend(pid) will immediately kill the running query process, but potentially have side effects across additional queries running on your database server. The full connection may be reset when running pg_terminate_backend, so other running queries can be affected. Use as a last resort.
How To enable Slow Query Log in PostgreSQL
You can identify slow queries in PostgreSQL by simply modifying a few settings in postgresql.conf file. This can help us to set the logging temporarily so that we can log slow queries in a log file and check them.
Find location of postgresql.conf
You can find the location of the configuration file by running the below query.
SELECT * FROM pg_settings WHERE category IN( 'Reporting and Logging / Where to Log' , 'File Locations') and name='config_file' ORDER BY category, name;
To find all the setting related to Logging, please run the below query.
SELECT * FROM pg_settings WHERE category IN( 'Reporting and Logging / Where to Log' , 'File Locations') ORDER BY category, name;
Open postgresql.conf and modify the file to enable logging
Search for the following line
#log_min_duration_statement = -1
Uncomment it by removing # at its beginning. Also replace -1 with a query runtime threshold in milliseconds. For example, if you want to log queries that take more than 1 second to run, replace -1 with 1000
log_min_duration_statement = 1000
Also look for the following line,
logging_collector = on
Make sure logging_collector is set to on. Save the file.
Restart PostgreSQL database/ PgAdmin for the new settings to be applied.
Sample Test logs with Confluence.
For testing purpose, I had kept the log_min_duration_statement to 1 ms
2021-08-22 14:14:19.618 IST [15120] DETAIL: parameters: $1 = '0', $2 = 'system_maintenance', $3 = '2021-08-22 14:14:18.614', $4 = '20' 2021-08-22 14:14:27.079 IST [15107] LOG: duration: 2.051 ms bind <unnamed>: select cast(count(*) as int4) as col_0_0_ from CONTENT attachment0_ where attachment0_.CONTENTTYPE='ATTACHMENT' and attachment0_.PAGEID=$1 and (attachment0_.PREVVER is null) and attachment0_.CONTENT_STATUS='current' and attachment0_.LASTMODDATE>$2 2021-08-22 14:14:27.079 IST [15107] DETAIL: parameters: $1 = '1409029', $2 = '2021-08-22 14:14:09.587' 2021-08-22 14:14:29.425 IST [15120] LOG: duration: 1.005 ms bind <unnamed>: DELETE FROM scheduler_clustered_jobs where job_id = $1 2021-08-22 14:14:29.425 IST [15120] DETAIL: parameters: $1 = 'LuceneChangeIndexFlusher' 2021-08-22 14:14:29.426 IST [15107] LOG: duration: 1.031 ms bind <unnamed>: select "AO_7B47A5_EVENT"."ID", "AO_7B47A5_EVENT"."EVENT_AT" from "AO_7B47A5_EVENT" "AO_7B47A5_EVENT" order by "AO_7B47A5_EVENT"."ID" desc limit $1 2021-08-22 14:14:29.426 IST [15107] DETAIL: parameters: $1 = '1' 2021-08-22 14:14:29.430 IST [15121] LOG: duration: 1.804 ms parse <unnamed>: DELETE FROM scheduler_clustered_jobs where job_id = $1 2021-08-22 14:14:29.432 IST [15121] LOG: duration: 1.847 ms bind <unnamed>: DELETE FROM scheduler_clustered_jobs where job_id = $1