We are going to discuss PostgreSQL configuration across multiple areas. We will focus on the following:
- Data Loss Prevention
- Connectivity
- Resource Consumption
- Monitoring
We will divide each area into sections, each focusing on a specific topic. We believe the values provided below will work well for you. However, feel free to disagree and discuss in the comments which settings worked best for you.
You may be unable to change some parameters because they are controlled by your database provider. Contact your database administrator to discuss whether it’s possible to modify them according to your needs.
Data Loss Prevention
Write-Ahead-Log (WAL) settings
Write-Ahead-Log (WAL) is a mechanism PostgreSQL uses to ensure the durability and consistency of data. Conceptually, it is crucial for maintaining the ACID properties, durability in particular. Technically, WAL is a sequence of files in which the database records all changes that need to be applied to the data. When a transaction is committed, the changes are first written to the WAL and only later to the data files. This ensures that the data files are always consistent, even if the database crashes or loses power. WAL can also be used to restore the database to a previous point in time.
checkpoint_timeout - set to 15 minutes
This property sets how often PostgreSQL performs a checkpoint. During the checkpoint, PostgreSQL moves changes from WAL to the data files. The default value is 5 minutes.
It is recommended to start with a value of 15 minutes and monitor the frequency of timed checkpoints using the system view pg_stat_bgwriter.
See the documentation.
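A minimal sketch of applying and verifying the change with SQL, assuming you have a role allowed to run ALTER SYSTEM (managed services often expose this only through their own console or parameter groups):

ALTER SYSTEM SET checkpoint_timeout = '15min';
SELECT pg_reload_conf();   -- checkpoint_timeout takes effect on reload, no restart needed
SHOW checkpoint_timeout;   -- confirm the active value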
max_wal_size - set to a big value
This property is a soft limit on how big the WAL can grow between checkpoints. When the limit is reached, PostgreSQL triggers an immediate (requested) checkpoint. PostgreSQL tries to complete that checkpoint as fast as possible, so it can be very resource-intensive. The default max_wal_size is 1024MB (1GB).
You should set max_wal_size to a large value so that immediate checkpoints rarely happen. Data changes should then be applied to the data files mostly during scheduled checkpoints. Monitor the frequency of checkpoints with the system view pg_stat_bgwriter.
See the documentation.
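To judge whether max_wal_size is large enough, compare timed checkpoints with requested (immediate) ones. A rough sketch, assuming a PostgreSQL version where these counters are still exposed in pg_stat_bgwriter:

SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;
-- If checkpoints_req grows noticeably between readings, the WAL fills up before
-- checkpoint_timeout elapses; consider raising the limit, for example:
ALTER SYSTEM SET max_wal_size = '4GB';   -- example value, tune to your workload
SELECT pg_reload_conf();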
full_page_writes - set to on
This parameter sets whether the entire content of each disk page is written to the WAL during the initial modification of that page after a checkpoint. Disabling this parameter enhances normal operation speed, but it also increases the risk of either unrecoverable data corruption or silent data corruption in the event of a system failure.
You should set the value to on. Otherwise, if a page write is interrupted by an operating system crash, it may result in an on-disk page that contains a mixture of old and new data. The row-level change data typically stored in the WAL alone is insufficient for a complete restoration of such a page during post-crash recovery.
See the documentation.
wal_level - set to replica
This parameter determines how much information is written to the WAL. The replica level (the default) writes enough data to support WAL archiving and physical replication; minimal drops that information to reduce WAL volume, and logical adds what is needed for logical decoding. Keep replica unless you have a specific reason to change it.
See the documentation.
fsync - set to on
This parameter controls whether the engine forces changes to be physically flushed to disk. Setting it to off can improve performance because the server performs fewer disk flushes, but it is strongly discouraged: a crash or power failure could leave the database unrecoverable.
See the documentation.
synchronous_commit - set to on
This parameter specifies how much WAL processing must complete before the engine reports a transaction as committed. Set it to on. However, if the server uses synchronous replication (synchronous_standby_names is non-empty), consider remote_apply to ensure the standby has applied the transaction before success is returned to the caller. It is a bit slower, because the commit waits for the standby to replay the transaction.
See the documentation.
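synchronous_commit can also be relaxed per transaction instead of cluster-wide. A minimal sketch, where audit_log is a hypothetical table whose rows you could afford to lose in a crash:

ALTER SYSTEM SET synchronous_commit = 'on';   -- keep the safe default for the whole cluster
SELECT pg_reload_conf();
BEGIN;
SET LOCAL synchronous_commit TO off;          -- trade durability for latency in this transaction only
INSERT INTO audit_log (message) VALUES ('non-critical event');
COMMIT;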
Backups
Depending on your hosting scenario, you may have backups configured automatically (if you host in the cloud) or you may need to maintain them on your own (if you run on-premises). Mind that cheap cloud tiers may not provide good backup options. Find out whether Point In Time Recovery (PITR) is available to you, or whether you can restore only to the most recent backup.
Configuration:
- PITR should be available if possible (see the sketch after this list)
- Backups should be stored in another region
- Backups should be encrypted
- Backups should be tested every time they are taken. Ask your database administrator whether backups are automatically restored on a remote server to verify their consistency, or whether you need to write scripts to do that yourself.
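On a self-managed server, PITR is typically built on continuous WAL archiving plus periodic base backups. A minimal sketch of the configuration side only; the archive path and command are placeholders, and cloud-managed services handle all of this for you:

ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET archive_mode = 'on';
ALTER SYSTEM SET archive_command = 'cp %p /mnt/wal_archive/%f';   -- placeholder: ship WAL segments off the server
-- wal_level and archive_mode take effect only after a restart; archive_command can be changed with a reload.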
Connections
This section describes parameters for controlling how users can connect to the database.
listen_addresses - set to the network interfaces you control with a firewall and/or VPN
See the documentation.
port - set to non-default value
The default is 5432. A common security practice is to change the default port, which reduces noise from automated scans that target 5432.
See the documentation.
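The listen_addresses and port settings above both require a server restart to take effect. A minimal sketch, where 10.0.0.5 stands for a private interface you actually control and 5433 is just an example of a non-default port:

ALTER SYSTEM SET listen_addresses = 'localhost,10.0.0.5';   -- placeholder addresses; avoid '*' on exposed hosts
ALTER SYSTEM SET port = 5433;                               -- example non-default port
-- Restart the server (pg_ctl restart or your service manager) for both to apply.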
max_connections - set to a value based on the number of applications and users you expect to connect at the same time
This parameter determines the maximum number of concurrent connections to the database server. This is a hard limit; the server will reject new connections above this number. The default is 100.
Increasing max_connections stops applications from getting errors about refused connections. However, each connection is served by a separate backend process that consumes memory, and as the number of connections grows it might take the server a noticeable amount of time (around 500ms) to open new ones. Therefore, beyond roughly 200-300 connections it is recommended to use a connection pooler, such as PgBouncer, Pgpool-II, or AWS RDS Proxy.
See the documentation.
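Before raising the limit, check how close you actually get to it. A sketch comparing the current session count with the configured maximum:

SELECT count(*) AS current_connections,
       current_setting('max_connections') AS configured_max
FROM pg_stat_activity;
-- If you do raise it, the change needs a restart:
ALTER SYSTEM SET max_connections = 300;   -- example value; prefer a connection pooler beyond this range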
Resources
This section covers resources controlled by the database engine.
Memory
shared_buffers - set to 25%-40% of the memory of the server
Sets the amount of memory the database server uses for shared memory buffers. The default is typically 128MB. It is recommended to configure it to 25% to 40% of the memory of the server.
See the documentation.
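As an example, on a dedicated 16GB server 25% would be 4GB. A sketch, keeping in mind that shared_buffers only changes after a restart:

ALTER SYSTEM SET shared_buffers = '4GB';   -- roughly 25% of a hypothetical 16GB server; requires a restart
SHOW shared_buffers;                       -- run after the restart to confirm the new value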
work_mem - set to 4MB
Sets the base maximum amount of memory to be used by a query operation (such as a sort or hash table) before writing to temporary files. The default value is 4MB.
A complex query may have several sort or hash operations running in parallel. Each operation will be allowed to use as much memory as this value specifies before it starts to write data into temporary files.
Mind that several running sessions could be doing such operations concurrently. Therefore, the total memory used could be higher than the value of work_mem.
Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, result cache nodes, and hash-based processing of IN subqueries.
Hash-based operations are generally more sensitive to memory availability than equivalent sort-based operations. The memory available for hash tables is computed by multiplying work_mem by hash_mem_multiplier. This makes it possible for hash-based operations to use an amount of memory that exceeds the usual work_mem base amount.
Recommendations: since every session can allocate up to work_mem (and a complex query several times that) for its operations, it is recommended to keep the default value, which corresponds to 512 buffers of 8kB each. However, if monitoring shows queries using more than that, increase the value. Raise it slowly, based on the actual amount of data written to temporary files, and monitor the impact on overall server memory usage.
See the documentation.
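A practical way to see whether work_mem is too small is to watch how much data spills to temporary files. A sketch using the pg_stat_database view:

SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_spill
FROM pg_stat_database
ORDER BY temp_bytes DESC;
-- Steadily growing temp_files and temp_bytes suggest sorts and hashes spilling to disk.
-- Raise the setting gradually, for example:
ALTER SYSTEM SET work_mem = '8MB';
SELECT pg_reload_conf();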
temp_buffers - set to 8MB
Sets the maximum amount of memory used for temporary buffers within each database session. These are session-local buffers used only for access to temporary tables. The default is 8MB. Unless temporary tables are heavily used, keep the default configuration.
See the documentation.
maintenance_work_mem - set to 64MB
The maintenance_work_mem parameter is a memory setting used for maintenance tasks. The default value is 64MB.
Setting a larger value helps maintenance tasks such as VACUUM, restoring from a dump, CREATE INDEX, ADD FOREIGN KEY, and ALTER TABLE. Therefore, if the database needs to vacuum tables often because rows are frequently updated or deleted, increasing the value can speed up those vacuum runs.
See the documentation.
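Instead of raising it globally, you can also increase it only for the session that runs the heavy operation. A sketch, where orders and idx_orders_created_at are hypothetical names:

SET maintenance_work_mem = '512MB';   -- applies to this session only
CREATE INDEX idx_orders_created_at ON orders (created_at);
RESET maintenance_work_mem;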
IO
effective_io_concurrency - set to 2 for HDD, 200 for SSD, 300 for SAN
Sets the number of concurrent disk I/O operations that PostgreSQL expects can be executed simultaneously. Raising this value will increase the number of I/O operations that any individual PostgreSQL session attempts to initiate in parallel.
As a starting point: with plain HDDs try setting it to 2, with SSDs go for 200, and if you have a powerful SAN you can start with 300.
See the documentation.
random_page_cost - set to 4 for HDD, 1.1 for SSD
This property instructs the query planner how expensive it is to access a random page. The default is 4.
For SSD drives, use a value of 1.1. This tells the planner that random access is nearly as cheap as sequential access, so it will favor index scans more often, and overall performance should be better on such hardware.
See the documentation.
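For a server running on SSDs, the two I/O settings in this section could be adjusted together, using the starting values suggested above:

ALTER SYSTEM SET effective_io_concurrency = 200;   -- SSD starting point
ALTER SYSTEM SET random_page_cost = 1.1;           -- tell the planner random reads are cheap
SELECT pg_reload_conf();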
Monitoring
This section covers details about monitoring settings.
Cloud SQL Insights
The main cloud vendors offer a database performance tuning and monitoring feature that helps you quickly assess the load on your database, and determine when and where to take action. It allows non-experts to detect performance problems with an easy-to-understand dashboard that visualizes database load. Enable these insights during the deployment.
- AWS - https://aws.amazon.com/rds/performance-insights/
- Azure - https://learn.microsoft.com/en-us/azure/postgresql/single-server/concepts-query-performance-insight
- GCP - https://cloud.google.com/sql/docs/postgres/using-query-insights
Slow Query Log
The PostgreSQL slow query log is a file that records all queries that take longer than a specified amount of time to execute. This can be helpful in identifying queries that are causing performance problems.
log_min_duration_statement - set to 5000
Causes the duration of each completed statement to be logged if the statement ran for at least the specified number of milliseconds. The default is -1, which disables the slow query log. It is recommended to start with a high threshold, such as 5000 (log queries that take longer than 5 seconds), and adjust it based on the statements that actually get logged.
See the documentation.
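A sketch of enabling the slow query log with the 5-second threshold suggested above:

ALTER SYSTEM SET log_min_duration_statement = 5000;   -- in milliseconds: log statements slower than 5s
SELECT pg_reload_conf();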
log_statement - set to mod
Controls which SQL statements are logged. The default is none. It is recommended to use mod to log all DDL statements plus data-modifying statements (INSERT, UPDATE, DELETE).
See the documentation.
application_name - set to some codename you recognize
The best practice is to configure the backend server to also send an application name (a string of up to 64 characters). That helps filter the logged queries so you can focus on the relevant ones.
See the documentation.
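application_name is usually set by the client rather than in postgresql.conf; most drivers accept it as a connection-string parameter, and it can also be set per session. A sketch, where billing-api is a hypothetical codename:

SET application_name = 'billing-api';   -- hypothetical service name
SELECT application_name FROM pg_stat_activity WHERE pid = pg_backend_pid();
-- The name appears in pg_stat_activity and, if log_line_prefix includes %a, in the log lines as well.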
log_autovacuum_min_duration - set to 10min
Causes each action executed by autovacuum to be logged if it ran for at least the specified amount of time. Setting this to zero logs all autovacuum actions; -1 disables autovacuum logging. The default is 10min. The best practice is to keep the default, unless you suspect autovacuum runs too often and you would like to investigate.
See the documentation.
How can Metis help
Metis can show you the running configuration of your database:
All you need to do is to configure the metadata collector to connect to your database and analyze the settings. Metis can keep track of that and let you know when the configuration is less than optimal.
Metis can also indicate issues around particular queries. For instance, working memory:
Metis continuously tracks settings and can suggest improvements whenever it finds them.
Summary
Keeping your database configuration in perfect shape takes time and a good understanding of the settings. Always mind your specific environment, especially if you run a custom configuration. Use the values described above as a starting point, and then tune them according to your needs.