Outgrowing Postgres: Handling increased user concurrency
When your application grows, so too do your database connections. Learn how to handle increased user concurrency on Postgres.
As your application grows, you'll face a new challenge: managing increased user concurrency.
Postgres is known for its strength and reliability, but it wasn't designed to handle thousands of simultaneous connections. And while vertical scaling helps with data volume, connection management needs a different approach.
Understanding the Postgres connection model and its limitations
Before I look at solutions, I want to examine how Postgres handles connections.
Postgres uses a process-based model - each client connection spawns a new server process. This keeps connections well-isolated but has some limitations:
- Resource Use: Each new connection needs memory and CPU. More connections mean more resource use.
- Connection Limit: The default max is typically 100 connections. You can increase this, but past a certain point, more connections hurt performance.
- Context Switching: Many active connections force the OS to spend more time switching between processes, slowing everything down.
You can check your current settings with:
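For example, from psql:

```sql
-- Maximum allowed connections
SHOW max_connections;

-- Connections currently in use
SELECT count(*) FROM pg_stat_activity;
```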
Before increasing max_connections, carefully evaluate your workload's actual per-connection memory usage. Each Postgres connection has some memory overhead, but the exact amount can vary significantly depending on your configuration and workload.
With the default settings, a new connection typically uses around 1-2 MB of memory for the backend process. Use tools like top to see the real memory used per backend PID. It's important to regularly monitor this number because it can quickly increase as each connection does work.
3 ways to handle more users in Postgres
There are 3 basic ways that I know to handle increased user concurrency with Postgres:
- Reducing the connection duration
- Increasing connection capacity
- Decreasing connection demand
These are the levers you have, but there are many ways to pull them. I'm going to go deep on each one.
Reduce connection duration
This first set of approaches aims to make queries faster so they hold connections for less time.
Query optimization, partitioning, and proper indexing
Optimizing queries, implementing table partitioning, and ensuring proper indexing are vital strategies to reduce query execution time, thereby minimizing the duration each connection is held open. I covered these techniques in more depth in my previous article, "Handling Growing Data Volumes in Postgres".
Caching within the database
Before you turn to app-level caching, you can consider some caching options within Postgres itself.
Materialized Views
Materialized views provide a way to cache query results within Postgres. They work by executing the query and storing the result set on disk. This pre-computation allows subsequent queries referencing the materialized view to retrieve the cached data directly, without re-executing the underlying query.
By caching the results, materialized views often reduce query times for read-heavy workloads and complex queries involving joins or aggregations. Postgres does the expensive work upfront and saves the results, enabling much faster lookups later on.
I covered how to create them in my "Handling Growing Data Volumes in Postgres" article.
Tuning configuration
The shared_buffers parameter determines how much memory Postgres dedicates to caching disk blocks. It's important to understand that shared_buffers caches raw disk blocks, not query results. The cached data still needs to be processed by the query executor. Proper tuning of this parameter improves query speed by reducing physical disk reads.
Contrary to popular advice, setting shared_buffers to 25% of total RAM is often excessive, especially on systems with a lot of memory. In most cases, a setting between 128MB and 4GB is sufficient. The Postgres documentation recommends 128MB for systems with 1GB or less RAM, and up to 4GB for systems with 64GB or more RAM.
To adjust shared_buffers, you can modify your postgresql.conf file:
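For example, to set it to 2GB (an illustrative value, not a recommendation):

```ini
# postgresql.conf
shared_buffers = 2GB
```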
Remember to restart your Postgres server after changing this setting.
Monitoring cache hit ratios can help you determine if your shared_buffers setting is effective:
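One way to check it is against the pg_stat_database statistics view (this measures hits in shared_buffers only, not the OS cache):

```sql
SELECT
  sum(blks_hit)::float / nullif(sum(blks_hit) + sum(blks_read), 0) AS cache_hit_ratio
FROM pg_stat_database;
```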
A ratio above 0.99 (99%) suggests your cache is working well. However, even if the ratio is lower, increasing shared_buffers might not be the best or only configuration option. Before increasing shared_buffers, consider:
- effective_cache_size: This parameter tells Postgres how much memory is available for disk caching, including the OS cache. Set this to an estimate of how much memory is available in the OS and Postgres buffer caches, typically 50-75% of total RAM for dedicated DB servers (sample values for these settings follow this list).
- work_mem: This determines how much memory is allocated for sort and hash operations before spilling to disk. Increasing work_mem can significantly speed up queries with large in-memory sorts or hashes. Be careful when setting it, though: the limit applies per sort or hash operation, so a single complex query running across many concurrent connections can multiply the memory actually used.
- OS Caches: Letting the OS cache Postgres data files can often be more effective than increasing shared_buffers. The OS cache is dynamic and adapts to the workload, caching frequently accessed data. By keeping more memory free for the OS cache instead of allocating it to shared_buffers, you can often achieve better performance. In other words, leaving enough RAM for the OS to manage its own filesystem cache effectively is highly recommended.
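As a rough illustration, here is what these settings might look like for a hypothetical dedicated server with 16 GB of RAM (the values are assumptions to benchmark against, not recommendations):

```ini
# postgresql.conf -- illustrative values for a dedicated 16 GB server
shared_buffers = 2GB           # raw disk-block cache inside Postgres
effective_cache_size = 10GB    # planner hint: Postgres cache + OS cache combined
work_mem = 32MB                # per sort/hash operation, so keep it modest
```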
Remember, every workload is unique. Always test configuration changes and monitor system performance to find the best settings for your specific use case. Don't just blindly follow rules of thumb!
While this approach looks like a silver bullet, you can very easily get the configuration wrong. Proper tuning requires understanding your data, queries, and access patterns. Experiment with different settings, monitor key metrics like cache hit ratio and query performance, and iterate until you find the optimal configuration for your Postgres database.
Increase connection capacity
The second set of approaches focuses on increasing how much each connection can handle.
Connection pooling with PgBouncer
Connection pooling is usually the first and most effective fix to increase connection capacity. PgBouncer addresses two key scalability constraints in Postgres: per-connection memory overhead and context switching costs. Instead of Postgres maintaining separate processes and memory allocations, PgBouncer manages a small pool of actual Postgres connections that serve a larger number of client connections by assigning requests to available connections from the pool. This eliminates the overhead of repeatedly setting up and closing connections.
2,000 direct Postgres connections would usually need around 4-8 GB of memory, depending on the workload and configuration settings. PgBouncer can handle that same load with a much smaller number of pooled connections. For example, 100 pooled connections may only use around 200-400 MB of memory in PgBouncer itself. The actual Postgres backends would still use some memory for each of those 100 connections, but the total memory footprint is at least an order of magnitude smaller in comparison to 2,000 direct connections.
As always, there’s no free lunch. PgBouncer has important limitations to consider:
- Single-threaded architecture: All traffic passes through a single thread, which typically leads to increased latency somewhere between 2,000-5,000 client connections on modern hardware. Multiple PgBouncer instances can help but add operational complexity.
- Queue management: When all pooled connections are occupied, new requests enter a wait queue. Long-running transactions can cause this queue to grow, potentially triggering client timeouts.
- Feature constraints: Transaction pooling mode, while most efficient, doesn't support certain Postgres features like prepared statements and LISTEN/NOTIFY. Session pooling offers broader compatibility but reduced efficiency.
Here's a basic PgBouncer setup:
- Install PgBouncer:
- Configure PgBouncer:
- Start PgBouncer:
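A minimal sketch of those three steps might look like this (Debian/Ubuntu packaging assumed; the database name, users, paths, and pool sizes are illustrative placeholders):

```bash
# 1. Install PgBouncer (Debian/Ubuntu shown; use your platform's package manager)
sudo apt-get install -y pgbouncer

# 2. Configure PgBouncer (illustrative values; adjust paths, users, and pool sizes)
sudo tee /etc/pgbouncer/pgbouncer.ini > /dev/null <<'EOF'
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
admin_users = app              ; lets the 'app' user run SHOW commands on the admin console
pool_mode = transaction
max_client_conn = 2000
default_pool_size = 100
EOF

# 3. Start PgBouncer
sudo systemctl enable --now pgbouncer
```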
Now, instead of connecting directly to Postgres, your application would connect to PgBouncer (in this example, on port 6432).
PgBouncer can effectively handle most connection scaling needs up to 10,000+ clients. You’ll need to closely monitor queue depths and latency beyond several thousand connections. When these metrics indicate saturation, you’ll want to consider some of the more involved suggestions in this article.
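PgBouncer's built-in admin console is one way to watch for that. Assuming the config sketched above (with app listed in admin_users), a command like this shows per-pool activity; a growing cl_waiting column means clients are queuing for a server connection:

```bash
psql -h 127.0.0.1 -p 6432 -U app pgbouncer -c 'SHOW POOLS;'
```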
Read replicas
Read replicas offload queries from your primary server by providing additional capacity for read operations through horizontal scaling. Each replica maintains an up-to-date copy of your data through streaming replication.
These are some of the things you’ll need to consider:
Capacity gains
Replica performance varies significantly based on workload complexity, hardware configuration, query patterns, and network and storage infrastructure.
- A good rule of thumb is that typical read capacity ranges from 50-75% of primary server performance.
- It’s recommended to have a maximum of 2-3 replicas before architectural redesign becomes necessary.
- Effective scaling depends more on query optimization and infrastructure design than raw replica count.
Common bottlenecks
- Replication lag: Writes on the primary must be replayed on every replica. Under heavy write loads, replicas can fall behind. The lag you might see is highly variable and depends on transaction volume, transaction complexity, network latency, and storage I/O performance, but the following are common:
  - 5-10 second lag under normal loads.
  - Lag can reach minutes or hours during peak writes or large batch operations.
  - Queries on lagged replicas will return stale data.
- Write amplification: Each replica repeats the primary's write operations. Write amplification will depend on write transaction size, replication method (streaming vs logical), network bandwidth, and storage capabilities. These are some of the issues you might run into:
  - 3 replicas = 4x total write load on your system.
  - It can overwhelm storage I/O, especially during peaks.
  - It often requires faster storage (SSDs) on replicas than on the primary.
- Connection management: Each replica needs its own connection pool.
  - This multiplies connection management complexity.
  - It requires application logic to handle replica failover.
  - You must handle scenarios where replicas become unavailable.
Below is a sample configuration for setting up your replicas:
- Configure streaming replication in postgresql.conf on the primary.
- Set up the replica to connect to the primary in recovery.conf.
- In your application, implement logic to distribute read queries across your replicas.
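A minimal sketch of the first two steps (hostnames and credentials are placeholders; note that recovery.conf only applies to Postgres 11 and earlier, while Postgres 12+ puts these settings in postgresql.conf next to an empty standby.signal file):

```ini
# Primary: postgresql.conf
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB            # Postgres 13+; older versions use wal_keep_segments

# Replica: recovery.conf (Postgres 11 and earlier) or postgresql.conf (12+)
primary_conninfo = 'host=primary.example.com port=5432 user=replicator password=secret'
hot_standby = on               # allow read-only queries on the replica
standby_mode = 'on'            # recovery.conf only; on 12+ create an empty standby.signal file instead
```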
Monitoring and management
As in all of the proposed approaches, monitoring is essential to understand the impact of your optimizations. Monitor the following very closely to understand if replication is working as intended:
- Use pg_stat_replication to track:
  - Replication lag
  - Sent/received WAL
  - Replay delay
- Implement alerting for:
  - Replication lag exceeding thresholds
  - Replica connection issues
  - WAL generation rate
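For example, running this on the primary (Postgres 10+ column names) shows each replica's state and how far behind it is:

```sql
SELECT client_addr,
       state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
       replay_lag
FROM pg_stat_replication;
```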
Read replicas do a great job of distributing load across multiple servers, handling up to 80-90% of read traffic per replica in typical deployments, but they're not a simple “add more servers” solution.
You'll need to monitor replication lag and I/O capacity closely as write volume grows. When these metrics show consistent degradation – lag exceeding seconds or I/O regularly at capacity – it might be time to consider some of the application-level approaches detailed further in this article.
Decrease connection demand
The third set of approaches focuses on needing fewer connections overall. These are much more invasive, as they require changes in your application logic and/or your system architecture, and thus a larger engineering effort.
Asynchronous processing
For operations that don't need immediate results, consider using asynchronous processing. By offloading time-consuming tasks to background workers, you can free up database connections more quickly.
This approach is particularly valuable when dealing with tasks such as processing large datasets, managing external API interactions, or performing complex database aggregations. By offloading these operations to background workers, applications can maintain performance, reduce system bottlenecks, and provide a smoother user experience, even with significant computational demands.
Additionally, this approach also provides other benefits: background workers can be scaled horizontally, independent of your main application while also giving you increased resilience against temporary system failures.
Asynchronous processing can be implemented using queueing technologies like RabbitMQ or Apache Kafka. For Python applications, a popular go-to is Celery, a distributed task queue that integrates well with various web frameworks.
Here's a simple example:
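A minimal sketch using Celery with a Redis broker (the broker URL and task body are illustrative, not tied to any particular framework):

```python
# tasks.py
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def generate_report(user_id):
    # Heavy queries and aggregations run here, in a background worker,
    # so the web request releases its database connection immediately.
    ...

# In a request handler, enqueue the work instead of running it inline:
# generate_report.delay(user_id=42)
```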
If you’re developing in Node, BullMQ has been rising in popularity as a go-to solution. For Rails applications, you can use ActiveJob with backends like Sidekiq for efficient background job processing.
Connection sharing
In multi-threaded applications, implement connection sharing across threads instead of each thread opening its own connection. While both thread-local session management and PgBouncer aim to optimize database connection handling, they address different challenges in connection management.
Thread-local sessions provide a lightweight, application-side approach to reducing connection overhead within a single application instance. By reusing connections across threads, this method minimizes the number of simultaneous database connections and reduces connection establishment overhead.
In most environments, these approaches are complementary; thread-local connection management can be used alongside PgBouncer to further optimize connection efficiency with PgBouncer handling connection pooling between the application and the database, and thread-local management reducing internal connection overhead.
Here's a sample TypeScript implementation using node-postgres, a popular collection of tools for working with Postgres in Node:
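This is a minimal sketch in which one pg Pool is shared across the whole process instead of each handler opening its own connection (the connection settings, pool size, and query are illustrative):

```typescript
// db.ts -- a shared connection pool with node-postgres
import { Pool } from "pg";

// One pool for the whole process: every handler borrows a client from it
// instead of opening its own connection.
export const pool = new Pool({
  host: "localhost",
  database: "mydb",
  user: "app",
  password: "secret",
  max: 10,                    // cap on concurrent Postgres connections from this instance
  idleTimeoutMillis: 30_000,  // close clients that sit idle in the pool
});

// pool.query() checks out a client, runs the query, and returns the client to the pool.
export async function getUserById(id: number) {
  const { rows } = await pool.query("SELECT * FROM users WHERE id = $1", [id]);
  return rows[0];
}
```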
This approach ensures that threads share a small set of connections instead of each opening its own, reducing the overall number of connections required.
Application-level caching
Another option to consider is application-level caching, leveraging logic within your application to serve duplicate requests within a small time frame from memory.
Server-side caching
A caching layer using popular in-memory databases like Redis or Memcached can go a long way in addressing Postgres connection overload issues by being able to handle a much larger concurrent request load. Adding a cache lets you serve frequent reads from memory instead, taking pressure off Postgres.
- Moving read-heavy operations to cache reduces connection pool usage from thousands of concurrent database calls down to a manageable level, with response times dropping from milliseconds to microseconds.
- For websites and apps with many users requesting the same data, a properly tuned cache can handle the bulk of read traffic while allowing your Postgres instance to focus on writes and uncached queries.
- You can start small by caching your highest-volume queries first. This makes it simple to validate the approach before expanding further.
- Furthermore, you can start even simpler by using in-memory caching approaches before swapping them out for a remote caching solution.
The following is a sample implementation in Go using Redis:
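Here's a minimal cache-aside sketch along those lines, using go-redis and database/sql (the key format, query, and TTL are illustrative):

```go
// Package cache shows a cache-aside read path: Redis first, Postgres on a miss.
package cache

import (
	"context"
	"database/sql"
	"time"

	"github.com/redis/go-redis/v9"
)

func GetUserName(ctx context.Context, db *sql.DB, rdb *redis.Client, id string) (string, error) {
	key := "user:name:" + id

	// 1. Try the cache first.
	if name, err := rdb.Get(ctx, key).Result(); err == nil {
		return name, nil
	}

	// 2. Cache miss: read from Postgres.
	var name string
	if err := db.QueryRowContext(ctx, "SELECT name FROM users WHERE id = $1", id).Scan(&name); err != nil {
		return "", err
	}

	// 3. Populate the cache with a short TTL so stale entries age out.
	rdb.Set(ctx, key, name, 5*time.Minute)
	return name, nil
}
```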
Caching in web applications
If you're developing a web application, many popular frameworks make it easy to implement request and router caching, reducing the number of requests that hit your backend and, subsequently, your Postgres instance(s). For Next.js developers, the Data Cache, Full Route Cache, and Router Cache are great options for caching across several layers of your application, abstracting away many of the underlying implementation details and allowing you to focus on your application logic.
While caching can significantly reduce database load, you need to be aware of potential drawbacks:
- Data staleness: Cached data may become outdated if not properly invalidated or refreshed.
- Cache consistency: Maintaining consistency between the cache and the database can be challenging, especially in distributed systems.
- Added complexity: Implementing and managing a caching layer adds complexity to your application architecture.
Batching queries
Batching queries can reduce the number of database connections your application uses. The approach calls for redesigning your application's querying pattern to perform more operations in a single database call. This combines API design considerations with batch processing techniques to reduce the overall number of connections needed.
Here's an example using asyncio and aiopg for asynchronous database operations in a data analysis script in Python:
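The following sketch batches reads over a small aiopg connection pool (the DSN, table, batch size, and pool size are illustrative):

```python
import asyncio
import aiopg

DSN = "dbname=analytics user=app password=secret host=localhost"

async def fetch_batch(pool, ids):
    # One connection serves a whole batch of ids instead of one connection per query.
    async with pool.acquire() as conn:
        async with conn.cursor() as cur:
            await cur.execute("SELECT id, total FROM orders WHERE id = ANY(%s)", (ids,))
            return await cur.fetchall()

async def main():
    # A small pool caps how many Postgres connections this script holds at once.
    async with aiopg.create_pool(DSN, maxsize=5) as pool:
        ids = list(range(1, 1001))
        batches = [ids[i:i + 100] for i in range(0, len(ids), 100)]
        results = await asyncio.gather(*(fetch_batch(pool, b) for b in batches))
        print(sum(len(r) for r in results), "rows fetched")

asyncio.run(main())
```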
With query batching, you have to be careful not to overdo it. If you make your batches too large, each connection will stay open longer to process all those queued up queries. The goal is to find a balance - batch just enough to keep the number of connections manageable, but not so much that individual connections start hogging resources for too long. It might take some experimentation and monitoring to find the sweet spot for your particular workload.
Consider the following when implementing batch queries:
- Batch size: Choose a batch size that reduces connections without making individual operations too long.
- Query complexity: Be cautious when batching complex queries, as they might lead to longer-running transactions.
- Consistency requirements: Ensure that batching doesn't compromise data consistency where individual, immediate updates are necessary.
When connection scaling isn't enough
So that's 2,614 words (not including code) on connection scaling. 😅
Despite all these optimizations, there will still come a point where Postgres struggles to keep up with your concurrency needs.
These are some signs that you're reaching the limits of what Postgres can handle:
- Connection timeouts: If you're seeing frequent connection timeouts, it might indicate that your database can't handle the connection load.
- High CPU usage: If your database server's CPU is consistently at or near 100% usage, it's a sign that it's struggling to keep up with the workload.
- Increased query latency: If query response times are consistently increasing, even for simple queries, it might indicate that your database is overloaded.
- Memory pressure: If your database server is frequently swapping or running out of memory, it's a sign that it can't handle the current load.
- Replication lag: In a setup with read replicas, if the replicas are consistently falling behind the primary, it indicates that the write load is too high for a single primary to handle.
When you start seeing these signs, it's time to start considering other approaches:
- Vertical Scaling: Upgrade to more powerful hardware. This can provide temporary relief but has limits (and costs).
- Sharding: Distribute your data across multiple Postgres instances. This can greatly increase your capacity but adds significant complexity to your application. I covered sharding in my previous article on "Handling Growing Data Volumes in Postgres."
- Tailored serverless Postgres solutions: If you're a B2B company with many customers of differing sizes, a virtual multi-tenant solution like Nile could be a great fit for you.
- Postgres-compatible DSQL solutions: Distributed SQL databases that are either Postgres or Postgres-compatible, like Citus, Aurora DSQL, and CockroachDB, might be the right approach for handling your scale without forcing you to change much of your code. They each have their strengths and weaknesses, and the right one for you depends on your use cases and needs.
- NoSQL: For certain types of data and access patterns, a NoSQL database might be more suitable for high-concurrency scenarios.
- Specialized databases: For analytical workloads, consider using a dedicated analytical database optimized for high-concurrency, low-latency queries on large datasets.
The Bottom Line
Scaling Postgres to handle more concurrent users doesn't have to mean rebuilding your entire stack. Start with connection pooling - it's the lowest-hanging fruit and often provides immediate relief. From there, optimize your queries and implement smart caching strategies. Only then should you look at more complex solutions like read replicas, configuration tuning, and sharding.
Here's a quick checklist to tackle concurrency issues:
- Set up PgBouncer for connection pooling.
- Optimize slow queries eating up connections.
- Add caching where it makes sense.
- Move background tasks to async processing.
- Consider read replicas for read-heavy workloads.
Don't forget to benchmark your performance as you go. Tools like pgbench can help measure the impact of your changes. You might be surprised how far these optimizations can take you - I've seen systems handle 10x more users just by implementing proper connection pooling and query optimization.
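For example (the scale factor, client count, and duration below are arbitrary starting points to adapt to your setup):

```bash
# Initialize a test database at scale factor 50 (~5 million rows in pgbench_accounts)
pgbench -i -s 50 mydb

# Simulate 200 concurrent clients over 8 worker threads for 60 seconds, read-only
pgbench -c 200 -j 8 -T 60 -S mydb
```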
And if you do hit the limits? That's okay. It probably means your application is growing successfully. Just make sure you spot the warning signs early enough to plan your next move, whether that's sharding your data or moving some workloads to specialized databases.
Next in this series, I’ll discuss how to run OLAP workloads on Postgres performantly, because sometimes the real challenge isn't just handling more users, but handling what those users want to do with your data.
Need to move your analytics off Postgres?
Tinybird is data infrastructure for software teams. You can query your Postgres tables with SQL and publish queries as dynamic, scalable REST API Endpoints to power in-product reporting, real-time dashboards, and more.
You can try Tinybird for free with no time limit. Sign up here.