# Leveraging `pg_stat_statements` for Tracking Metrics in PostgreSQL

## Introduction

PostgreSQL's `pg_stat_statements` extension is a powerful tool for monitoring and analyzing SQL query performance. While it doesn't directly track the number of rows affected by operations like `INSERT INTO ... SELECT`, it provides valuable insights into query execution, including **execution time, block I/O, and the number of times a query is called**. This is particularly useful for identifying performance bottlenecks and understanding query patterns.

In this guide, we'll explore how to use `pg_stat_statements` to track and analyze query metrics, and how to complement it with other techniques to capture row-level metrics.

---

## Why Use `pg_stat_statements`?

`pg_stat_statements` provides the following benefits:

1. **Query Performance Insights**: Track execution time, rows processed, and buffer/block I/O for each query.
2. **Query Frequency**: Identify frequently executed queries and their impact on the database.
3. **Optimization Opportunities**: Pinpoint slow queries that need optimization.
4. **Non-Intrusive**: No need to modify queries or add triggers.
5. **Lightweight**: Minimal performance overhead compared to other auditing methods.

---

## Step 1: Install and Enable `pg_stat_statements`

### Install the Extension

The `pg_stat_statements` extension is included in the PostgreSQL contrib package. To install it:

#### For Debian/Ubuntu:

```bash
sudo apt-get install postgresql-contrib
```

#### For RHEL/CentOS:

```bash
sudo yum install postgresql-contrib
```

### Enable the Extension

After installing, enable the extension in PostgreSQL:

1. Add `pg_stat_statements` to `shared_preload_libraries` in `postgresql.conf`:

   ```ini
   shared_preload_libraries = 'pg_stat_statements'
   ```

2. Set the maximum number of statements to track:

   ```ini
   pg_stat_statements.max = 10000
   ```

3. Set the level of tracking:

   ```ini
   pg_stat_statements.track = all   # also track statements nested inside functions, not just top-level ones
   ```

Restart PostgreSQL to apply the changes:

```bash
sudo systemctl restart postgresql
```

Finally, create the extension in your database:

```sql
CREATE EXTENSION pg_stat_statements;
```

---

## Step 2: Query `pg_stat_statements` for Metrics

Once enabled, `pg_stat_statements` collects statistics on SQL queries. You can query the `pg_stat_statements` view to retrieve this information:

```sql
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    rows,
    shared_blks_hit,
    shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

### Key Columns in `pg_stat_statements`:

- **`query`**: The normalized SQL query text.
- **`calls`**: The number of times the query was executed.
- **`total_exec_time`**: Total execution time in milliseconds.
- **`mean_exec_time`**: Average execution time in milliseconds.
- **`rows`**: Total number of rows retrieved or affected.
- **`shared_blks_hit`**: Number of shared buffer hits.
- **`shared_blks_read`**: Number of shared blocks read from disk.

---

## Step 3: Track Row-Level Metrics

While `pg_stat_statements` provides the total number of rows retrieved or affected by a query (the `rows` column), it doesn't break this down by individual query execution. To capture row-level metrics for each execution, you can combine `pg_stat_statements` with other techniques:

### Option 1: Use Triggers (If Needed)

If you need to track row-level metrics for specific operations (e.g., `INSERT INTO ... SELECT`), you can use triggers as described in previous guides and sketched below. However, this approach is more intrusive and may impact performance.
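Those earlier guides aren't reproduced here, but as a rough sketch, a statement-level trigger with a transition table can record how many rows each `INSERT INTO ... SELECT` adds. The `orders_archive` target table and the `row_audit` metrics table below are hypothetical placeholders, so adapt the names to your schema; transition tables require PostgreSQL 10+, and the `EXECUTE FUNCTION` syntax requires 11+ (use `EXECUTE PROCEDURE` on 10).

```sql
-- Hypothetical table that receives per-statement row counts.
CREATE TABLE IF NOT EXISTS row_audit (
    id            BIGSERIAL PRIMARY KEY,
    table_name    TEXT NOT NULL,
    operation     TEXT NOT NULL,
    rows_affected BIGINT NOT NULL,
    logged_at     TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Trigger function: counts the rows in the statement's transition table.
CREATE OR REPLACE FUNCTION log_inserted_rows() RETURNS trigger AS $$
BEGIN
    INSERT INTO row_audit (table_name, operation, rows_affected)
    SELECT TG_TABLE_NAME, TG_OP, COUNT(*) FROM new_rows;
    RETURN NULL;  -- return value is ignored for AFTER ... FOR EACH STATEMENT triggers
END;
$$ LANGUAGE plpgsql;

-- Fire once per statement on the (hypothetical) target of INSERT INTO ... SELECT.
CREATE TRIGGER orders_archive_row_count
AFTER INSERT ON orders_archive
REFERENCING NEW TABLE AS new_rows
FOR EACH STATEMENT
EXECUTE FUNCTION log_inserted_rows();
```

Because the trigger fires once per statement rather than once per row, its overhead is modest, but it still adds a write to every insert on the audited table, so reserve it for tables where you genuinely need exact counts.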
### Option 2: Parse PostgreSQL Logs

If you prefer a non-intrusive method, you can parse the PostgreSQL server logs to extract per-statement activity. Configure PostgreSQL to log detailed information:

```ini
log_statement = 'all'
log_duration = on
log_min_messages = INFO
```

Then write a script to parse the logs and aggregate metrics per operation. Note that the standard server log records the statement text and its duration, but not the number of rows it affected, so exact row counts still require the trigger approach or application-side instrumentation.

---

## Step 4: Automate Metrics Collection

To keep track of query performance over time, automate the collection of metrics from `pg_stat_statements`. You can create a script to periodically capture and store these metrics in a separate table. Keep in mind that `pg_stat_statements` counters are cumulative (since the last server restart or call to `pg_stat_statements_reset()`), so each snapshot records running totals rather than per-interval values.

### Create a Table to Store Metrics

```sql
CREATE TABLE query_performance_metrics (
    id SERIAL PRIMARY KEY,
    capture_time TIMESTAMP DEFAULT NOW(),
    query TEXT,
    calls INT,
    total_exec_time FLOAT,
    mean_exec_time FLOAT,
    rows BIGINT,
    shared_blks_hit BIGINT,
    shared_blks_read BIGINT
);
```

### Write a Script to Capture Metrics

Here's a Python script to capture and store metrics from `pg_stat_statements`:

```python
import psycopg2


def capture_metrics():
    conn = psycopg2.connect(
        dbname="your_database",
        user="your_user",
        password="your_password",
        host="your_host"
    )
    cursor = conn.cursor()

    # Fetch metrics from pg_stat_statements
    cursor.execute("""
        SELECT
            query,
            calls,
            total_exec_time,
            mean_exec_time,
            rows,
            shared_blks_hit,
            shared_blks_read
        FROM pg_stat_statements
    """)
    metrics = cursor.fetchall()

    # Store metrics in query_performance_metrics
    for metric in metrics:
        cursor.execute(
            """
            INSERT INTO query_performance_metrics (
                query, calls, total_exec_time, mean_exec_time,
                rows, shared_blks_hit, shared_blks_read
            ) VALUES (%s, %s, %s, %s, %s, %s, %s)
            """,
            metric
        )

    conn.commit()
    cursor.close()
    conn.close()


if __name__ == "__main__":
    capture_metrics()
```

### Schedule the Script

Use `cron` to run the script periodically:

```bash
crontab -e
```

Add a line to run the script every hour:

```
0 * * * * /usr/bin/python3 /path/to/your/script.py
```

---

## Step 5: Analyze Metrics

You can now analyze the collected metrics to gain insights into query performance and usage patterns:

### Example Queries

1. **Top 10 Slowest Queries by Total Execution Time**:

   ```sql
   SELECT query, total_exec_time, calls, mean_exec_time
   FROM query_performance_metrics
   ORDER BY total_exec_time DESC
   LIMIT 10;
   ```

2. **Queries with the Highest Row Impact**:

   ```sql
   SELECT query, rows, calls
   FROM query_performance_metrics
   ORDER BY rows DESC
   LIMIT 10;
   ```

3. **Trend Analysis Over Time**:

   ```sql
   SELECT
       DATE(capture_time) AS day,
       SUM(total_exec_time) AS total_exec_time,
       SUM(rows) AS total_rows
   FROM query_performance_metrics
   GROUP BY DATE(capture_time)
   ORDER BY day;
   ```

---

## Challenges and Considerations

1. **Performance Overhead**: While `pg_stat_statements` has minimal overhead, it can still impact performance on heavily loaded systems. Monitor your database performance after enabling it.
2. **Log Volume**: If you also parse PostgreSQL logs, ensure you have enough storage and a log rotation strategy.
3. **Query Normalization**: `pg_stat_statements` normalizes queries, which means it groups similar queries together. This can make it harder to track specific instances of a query (see the example after this list).
4. **Security**: Ensure that sensitive information is not exposed in the logged queries.
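To make the normalization point concrete, here is a small illustration; the `orders` table and the literal values are hypothetical. On recent PostgreSQL versions, the stored query text replaces constants with `$1`-style placeholders, and the `queryid` column provides a stable identifier for each normalized query:

```sql
-- These two executions differ only in a constant...
--   SELECT * FROM orders WHERE customer_id = 42;
--   SELECT * FROM orders WHERE customer_id = 7;
-- ...so pg_stat_statements folds them into one entry whose stored text looks like:
--   SELECT * FROM orders WHERE customer_id = $1
-- Track the group by queryid rather than by matching on the query text.
SELECT queryid, query, calls, rows
FROM pg_stat_statements
WHERE query LIKE '%orders%';
```

If you snapshot metrics as in Step 4, consider storing `queryid` alongside the query text so the same logical query can be followed across captures.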
---

## Conclusion

`pg_stat_statements` is a powerful tool for tracking and analyzing query performance in PostgreSQL. While it doesn't provide row-level metrics for each query execution, it offers valuable insights into query execution time, frequency, and row impact.

By combining `pg_stat_statements` with other techniques like log parsing or triggers, you can build a comprehensive monitoring and auditing system for your PostgreSQL database. Start leveraging `pg_stat_statements` today to optimize your database performance and gain deeper insights into your query workloads!