# Leveraging `pg_stat_statements` for Tracking Metrics in PostgreSQL

## Introduction

PostgreSQL's `pg_stat_statements` extension is a powerful tool for monitoring and analyzing SQL query performance. While it doesn't track the number of rows affected by operations like `INSERT INTO SELECT` on a per-execution basis, it provides valuable insights into query execution, including **execution time, buffer I/O activity, and the number of times a query is called**. This makes it particularly useful for identifying performance bottlenecks and understanding query patterns.

In this guide, we'll explore how to use `pg_stat_statements` to track and analyze query metrics, and how to complement it with other techniques to capture row-level metrics.

---
## Why Use `pg_stat_statements`?

`pg_stat_statements` provides the following benefits:

1. **Query Performance Insights**: Track execution time, buffer I/O, and row counts for each query.
2. **Query Frequency**: Identify frequently executed queries and their impact on the database.
3. **Optimization Opportunities**: Pinpoint slow queries that need optimization.
4. **Non-Intrusive**: No need to modify queries or add triggers.
5. **Lightweight**: Minimal performance overhead compared to other auditing methods.

---
## Step 1: Install and Enable `pg_stat_statements`

### Install the Extension

The `pg_stat_statements` extension is included in the PostgreSQL contrib package. To install it:

#### For Debian/Ubuntu:
```bash
sudo apt-get install postgresql-contrib
```

#### For RHEL/CentOS:
```bash
sudo yum install postgresql-contrib
```

### Enable the Extension

After installing, enable the extension in PostgreSQL:

1. Add `pg_stat_statements` to `shared_preload_libraries` in `postgresql.conf`:
```ini
shared_preload_libraries = 'pg_stat_statements'
```

2. Set the maximum number of statements to track:
```ini
pg_stat_statements.max = 10000
```

3. Set the level of tracking:
```ini
pg_stat_statements.track = all  # 'all' also tracks statements inside functions; 'top' tracks only top-level statements
```

Restart PostgreSQL to apply the changes:
```bash
sudo systemctl restart postgresql
```

Finally, create the extension in your database:
```sql
CREATE EXTENSION pg_stat_statements;
```
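As a quick sanity check, you can confirm that the library was preloaded and that the view is queryable (the second statement errors out if the module wasn't loaded at server start):

```sql
-- Confirm the library is preloaded and the view returns data
SHOW shared_preload_libraries;
SELECT count(*) FROM pg_stat_statements;
```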
---

## Step 2: Query `pg_stat_statements` for Metrics

Once enabled, `pg_stat_statements` collects statistics on SQL queries. You can query the `pg_stat_statements` view to retrieve this information (on PostgreSQL versions before 13, the timing columns are named `total_time` and `mean_time` instead of `total_exec_time` and `mean_exec_time`):

```sql
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    rows,
    shared_blks_hit,
    shared_blks_read
FROM
    pg_stat_statements
ORDER BY
    total_exec_time DESC
LIMIT 10;
```
### Key Columns in `pg_stat_statements`:

- **`query`**: The normalized SQL query text.
- **`calls`**: The number of times the query was executed.
- **`total_exec_time`**: Total execution time in milliseconds.
- **`mean_exec_time`**: Average execution time in milliseconds.
- **`rows`**: Total number of rows retrieved or affected across all calls.
- **`shared_blks_hit`**: Number of shared-buffer cache hits.
- **`shared_blks_read`**: Number of blocks read from disk (or the OS cache) because they were not in shared buffers.

The two buffer counters are particularly useful together, as the example after this list shows.
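For instance, here's one way to compute a per-query cache hit ratio from those two counters (a sketch; the ordering and `LIMIT` are arbitrary choices):

```sql
-- Per-query shared-buffer cache hit ratio; queries doing the most disk reads first
SELECT
    query,
    calls,
    round(100.0 * shared_blks_hit
          / nullif(shared_blks_hit + shared_blks_read, 0), 2) AS hit_pct
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;
```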
---

## Step 3: Track Row-Level Metrics

While `pg_stat_statements` provides the total number of rows retrieved or affected by a query (`rows` column), it doesn't break this down by individual query execution. To capture row-level metrics for each execution, you can combine `pg_stat_statements` with other techniques:
### Option 1: Use Triggers (If Needed)

If you need to track row-level metrics for specific operations (e.g., `INSERT INTO SELECT`), you can use triggers as described in previous guides; a minimal sketch follows below. Keep in mind that this approach is more intrusive and may impact write performance.
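Here is a minimal sketch of the idea using a statement-level trigger with a transition table (requires PostgreSQL 11 or later for this exact syntax; `target_table` and `insert_audit` are hypothetical names):

```sql
-- Audit table for per-statement row counts (illustrative)
CREATE TABLE insert_audit (
    inserted_at TIMESTAMPTZ DEFAULT NOW(),
    row_count   BIGINT
);

-- Record how many rows each INSERT statement added to target_table
CREATE OR REPLACE FUNCTION log_insert_count() RETURNS trigger AS $$
BEGIN
    INSERT INTO insert_audit (row_count)
    SELECT count(*) FROM new_rows;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER count_inserts
    AFTER INSERT ON target_table
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE FUNCTION log_insert_count();
```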
### Option 2: Parse PostgreSQL Logs

If you prefer a non-intrusive method, parse PostgreSQL logs to extract metrics. Configure PostgreSQL to log detailed information:

```ini
log_statement = 'all'
log_duration = on
log_min_messages = INFO
```

Then, write a script to parse the logs and extract metrics such as statement text and duration. Note that standard PostgreSQL logging records the statement and how long it took, but not the number of rows it affected, so row counts would still need to come from triggers (Option 1) or from the application.
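As a starting point, here's a minimal parsing sketch in Python (the log file path and line layout are assumptions; adjust the regular expression to your `log_line_prefix` and log destination):

```python
import re

# Matches lines like: "... LOG:  duration: 12.345 ms  statement: ..."
DURATION_RE = re.compile(r"duration: (?P<ms>[\d.]+) ms")

def parse_durations(path):
    """Collect statement durations (in milliseconds) from a PostgreSQL log file."""
    durations = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            match = DURATION_RE.search(line)
            if match:
                durations.append(float(match.group("ms")))
    return durations

if __name__ == "__main__":
    # Assumed path -- point this at your actual log file
    times = parse_durations("/var/log/postgresql/postgresql.log")
    if times:
        print(f"{len(times)} statements logged, slowest {max(times):.1f} ms")
    else:
        print("no duration entries found")
```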
---

## Step 4: Automate Metrics Collection

To keep track of query performance over time, automate the collection of metrics from `pg_stat_statements`. You can create a script to periodically capture and store these metrics in a separate table.
### Create a Table to Store Metrics

```sql
CREATE TABLE query_performance_metrics (
    id SERIAL PRIMARY KEY,
    capture_time TIMESTAMP DEFAULT NOW(),
    query TEXT,
    calls BIGINT,
    total_exec_time FLOAT,
    mean_exec_time FLOAT,
    rows BIGINT,
    shared_blks_hit BIGINT,
    shared_blks_read BIGINT
);
```
### Write a Script to Capture Metrics

Here's a Python script to capture and store metrics from `pg_stat_statements`. Note that the view's counters are cumulative (since the last call to `pg_stat_statements_reset()`), so each capture stores running totals rather than per-interval activity:

```python
import psycopg2

def capture_metrics():
    conn = psycopg2.connect(
        dbname="your_database",
        user="your_user",
        password="your_password",
        host="your_host"
    )
    cursor = conn.cursor()

    # Fetch the current (cumulative) metrics from pg_stat_statements
    cursor.execute("""
        SELECT
            query,
            calls,
            total_exec_time,
            mean_exec_time,
            rows,
            shared_blks_hit,
            shared_blks_read
        FROM
            pg_stat_statements
    """)
    metrics = cursor.fetchall()

    # Store a timestamped snapshot in query_performance_metrics
    for metric in metrics:
        cursor.execute(
            """
            INSERT INTO query_performance_metrics (
                query, calls, total_exec_time, mean_exec_time,
                rows, shared_blks_hit, shared_blks_read
            )
            VALUES (%s, %s, %s, %s, %s, %s, %s)
            """,
            metric
        )

    conn.commit()
    cursor.close()
    conn.close()

if __name__ == "__main__":
    capture_metrics()
```
### Schedule the Script

Use `cron` to run the script periodically:

```bash
crontab -e
```

Add a line to run the script every hour:
```
0 * * * * /usr/bin/python3 /path/to/your/script.py
```
---

## Step 5: Analyze Metrics

You can now analyze the collected metrics to gain insights into query performance and usage patterns:

### Example Queries

1. **Top 10 Slowest Queries by Total Execution Time**:
```sql
SELECT
    query,
    total_exec_time,
    calls,
    mean_exec_time
FROM
    query_performance_metrics
ORDER BY
    total_exec_time DESC
LIMIT 10;
```
2. **Queries with the Highest Row Impact**:
```sql
SELECT
    query,
    rows,
    calls
FROM
    query_performance_metrics
ORDER BY
    rows DESC
LIMIT 10;
```
3. **Trend Analysis Over Time** (remember that each capture stores cumulative counters, so daily sums overlap; see the delta query after this example):
```sql
SELECT
    DATE(capture_time) AS day,
    SUM(total_exec_time) AS total_exec_time,
    SUM(rows) AS total_rows
FROM
    query_performance_metrics
GROUP BY
    DATE(capture_time)
ORDER BY
    day;
```
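Because `pg_stat_statements` counters only grow between resets, per-interval activity is best computed as the difference between consecutive snapshots. A sketch using a window function over the captured table:

```sql
-- Per-interval activity: subtract each query's previous snapshot from the current one
SELECT
    capture_time,
    query,
    calls - lag(calls) OVER w AS calls_in_interval,
    total_exec_time - lag(total_exec_time) OVER w AS exec_time_in_interval
FROM query_performance_metrics
WINDOW w AS (PARTITION BY query ORDER BY capture_time)
ORDER BY query, capture_time;
```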
---

## Challenges and Considerations

1. **Performance Overhead**: While `pg_stat_statements` is designed to be lightweight, it can still have a measurable impact on heavily loaded systems. Monitor your database performance after enabling it.

2. **Log Volume**: If you also parse PostgreSQL logs, ensure you have enough storage and a log rotation strategy.

3. **Query Normalization**: `pg_stat_statements` normalizes queries, replacing constants with placeholders such as `$1` and grouping similar queries together. This can make it harder to track specific instances of a query (see the example after this list).

4. **Security**: Ensure that sensitive information is not exposed in the logged queries. `pg_stat_statements` itself hides other users' query text from unprivileged roles; superusers and members of `pg_read_all_stats` can see everything.
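For example, you can inspect how statements were normalized, and reset the statistics when you want a clean measurement window (the table name in the filter is a hypothetical placeholder):

```sql
-- Inspect normalized entries touching a given table
SELECT query, calls
FROM pg_stat_statements
WHERE query ILIKE '%your_table%';

-- Discard accumulated statistics and start fresh
SELECT pg_stat_statements_reset();
```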
---

## Conclusion

`pg_stat_statements` is a powerful tool for tracking and analyzing query performance in PostgreSQL. While it doesn't provide row-level metrics for each individual execution, it offers valuable insights into query execution time, frequency, and overall row impact. By combining `pg_stat_statements` with other techniques like log parsing or triggers, you can build a comprehensive monitoring and auditing system for your PostgreSQL database.

Start leveraging `pg_stat_statements` today to optimize your database performance and gain deeper insights into your query workloads!