Outgrowing Postgres: How to evaluate the right OLAP solution for analytics

Moving analytical workloads off Postgres? Learn how to evaluate real-time OLAP solutions based on what actually matters: performance, SQL compatibility, and developer experience.

I've been writing about the journey of outgrowing Postgres, and if you've followed along, you're probably at the point where you need to move your analytical workloads to a dedicated OLAP solution. Trust me, I've been there - staring at a sea of options, each claiming to be the perfect fit.

In my previous article on identifying when to move OLAP workloads off Postgres, I covered the breaking points: query latency creeping up, resource contention between OLTP and OLAP workloads, and the growing headache of maintaining performant analytical queries. Now it's time to evaluate your options.

I'm going to focus specifically on real-time OLAP databases. Why not include traditional data warehouses like Snowflake, BigQuery, or Redshift? Been there, tried that. They're great at batch analytics and BI workloads, but they fall apart when you need thousands of concurrent queries with sub-second response times. And if you try to scale them for these use cases? Your finance team might have a heart attack.

I'm also skipping streaming analytics solutions like Apache Flink or Kafka Streams. Don't get me wrong - they're brilliant at processing real-time data streams within short time windows. But they weren't built for querying months or years of historical data alongside your real-time analytics. You need a solution that can do both.

In this article, I'll help you evaluate several OLAP databases that could be the right fit for your analytics workloads. I'm particularly focused on real-world requirements I've seen teams struggle with:

  • Handling your scale and performance requirements
  • Getting your data out of Postgres efficiently
  • Minimizing query rewrites and migration effort
  • Maintaining high development velocity with a small team
  • Keeping operational overhead manageable

We’ve previously done a comparison of real-time analytics databases. This time, I'm focusing specifically on your migration path from Postgres and including analysis of some newer players making waves in the space: StarRocks, DuckDB, and Doris.

Understanding your evaluation criteria

When I help teams evaluate OLAP solutions for their Postgres migration, I start with four key areas. Of course, you could spend months analyzing every feature under the sun, but I've found these areas capture what actually matters for most teams.

Performance & Scalability

First up is the obvious one - performance. But not just raw query speed. Here's what you need to look at:

  • How does it handle billions or even trillions of records? One thing that’s burned me before is testing with small datasets only to find performance fell off a cliff at scale.
  • What's the query latency like? More importantly, is it predictable? Nothing worse than a 95th percentile that's 100x your median (see the sketch after this list).
  • Can it handle thousands of concurrent queries? If you're building user-facing analytics, this is non-negotiable.
  • How efficiently does it use resources? Some solutions will happily eat all your RAM and CPU - then ask for seconds, and thirds.
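
One way to put a number on that predictability during evaluation: load-test the candidate, record per-query latencies, and compare the median to the tail. A minimal sketch in Postgres-style SQL, assuming a hypothetical query_log table you populate from your load test:

```sql
-- Compare median and tail latency from a hypothetical load-test log.
-- A p95 that's orders of magnitude above the median is a red flag.
SELECT
  percentile_cont(0.50) WITHIN GROUP (ORDER BY duration_ms) AS p50_ms,
  percentile_cont(0.95) WITHIN GROUP (ORDER BY duration_ms) AS p95_ms,
  percentile_cont(0.99) WITHIN GROUP (ORDER BY duration_ms) AS p99_ms
FROM query_log
WHERE started_at > now() - interval '1 hour';
```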

SQL Compatibility

The next thing I look at is SQL compatibility - and unfortunately, lots of teams get this wrong. They pick a solution based on benchmarks alone, only to spend months rewriting their queries.

What you really need to know is:

  • How much of your existing Postgres SQL will run unmodified? Every query you have to rewrite is time you're not building new features.
  • What about those complex analytical queries? Some solutions advertise "SQL support" but choke on window functions or complex JOINs.
  • Will your data types map cleanly? I’ve seen teams spend weeks dealing with a solution that couldn't properly handle timestamps with timezone - not fun.
  • How's the query optimizer? Sometimes it's not the SQL support that's the problem, but getting the query to run efficiently.

I've found this is where many solutions fall short. They look great on paper, but the effort to migrate your existing SQL can be massive. The best solution is often the one that lets you copy-paste your Postgres queries with minimal changes.
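
As a quick litmus test, take one of your existing Postgres analytics queries and try it on each candidate as-is. Something like this hedged example, assuming a hypothetical orders table, exercises features that commonly trip up OLAP engines - timestamps with time zones, a JOIN, and a window function:

```sql
-- A typical Postgres analytical query: daily revenue per country
-- plus a 7-day moving average.
SELECT
  date_trunc('day', o.created_at AT TIME ZONE 'UTC') AS day,
  c.country,
  sum(o.amount) AS revenue,
  avg(sum(o.amount)) OVER (
    PARTITION BY c.country
    ORDER BY date_trunc('day', o.created_at AT TIME ZONE 'UTC')
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS revenue_7d_avg
FROM orders o
JOIN customers c ON c.id = o.customer_id
GROUP BY 1, 2
ORDER BY 1, 2;
```

The fewer changes a query like this needs, the smoother your migration will be.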

Developer Experience & Operations

This is the section too many teams overlook - and it's where I see the biggest differences between solutions. When you're a small team moving fast, the "time to value" and ongoing operational overhead can make or break your project, if not your company.

Let's break this one down:

Time to Value

How long does it take to go from zero to a working API? I'm not talking about "Hello World" - I mean ingesting real data and serving real queries. Some solutions can get you there in minutes. Others... well, I hope you like writing infrastructure code.

The real questions:

  • Can you start building immediately, or do you need to set up a bunch of infrastructure first?
  • What's the local development story? You need to be able to test and iterate quickly.
  • How do you go from SQL to a production API? If the answer involves multiple services and custom code, that's a red flag.

Development Velocity

This is the killer - how fast can you iterate? I look for:

  • How quickly can you tweak a schema or modify a query?
  • Can you test your changes end-to-end before deploying?
  • Does it integrate with your version control workflow?

Operational Simplicity

Here's where reality hits. Many solutions look great in demos but require a ton of infrastructure and moving parts:

  • Do you need separate streaming infrastructure?
  • What about layers of APIs?
  • Schema management tools?
  • Monitoring and observability?

The best solutions handle all of this out of the box. The worst? Well, hope you like maintaining multiple systems just to serve an analytics API.

Integration Capabilities

This is really about data movement and accessibility. Getting data in and out reliably - and exposing it usefully to your applications.

Let's start with the elephant in the room: Postgres integration. You're moving data from Postgres, so this matters a lot. You want options:

  • Can it act as a Postgres replica? This is often the cleanest way to get your data in.
  • What about CDC or WAL replication? Sometimes you need real-time changes (a minimal Postgres-side sketch follows this list).
  • Can it query Postgres directly? This can be handy for gradual migration or pulling in dimensional data.
  • Are there tools to help with the initial migration?
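
Whatever tool you choose for CDC, the Postgres side of the setup looks roughly the same. A minimal sketch, assuming hypothetical events and orders tables and a CDC tool that consumes the standard pgoutput plugin:

```sql
-- postgresql.conf: wal_level = logical (requires a restart)

-- Publish the tables you want replicated downstream
CREATE PUBLICATION analytics_pub FOR TABLE events, orders;

-- Most CDC tools create their own replication slot; doing it manually looks like this
SELECT * FROM pg_create_logical_replication_slot('analytics_slot', 'pgoutput');

-- Keep an eye on slots - an abandoned slot will retain WAL indefinitely
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
```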

But it's not just about Postgres. You'll also need to handle:

  • Other real-time data sources (Kafka, Kinesis, HTTP streams)
  • Semi-structured data (JSON, anyone?)
  • Batch data ingestion
  • Getting data out via APIs or Sinks

The API story is crucial. Some solutions provide a basic query interface and call it a day. Others give you full-featured, production-ready REST or GraphQL APIs out of the box. The difference in engineering time can be massive.

Tool ecosystem matters too. Can it integrate with your existing BI tools? Monitoring stack? Development workflows?

Comparing Real-Time OLAP Solutions

Now that we understand what to look for, let's evaluate each solution against these criteria. I'll start with the established players that have proven themselves at scale, then look at newer solutions that are pushing the boundaries in interesting ways.

ClickHouse®

ClickHouse has earned its reputation as a powerhouse for analytical query performance. It combines columnar storage, distributed processing, and a focus on raw speed that's made it increasingly popular for real-time analytics.

Performance & Scalability

  • Excels at large-scale analytics - handles billions of rows with ease
  • Query latency measured in milliseconds for most queries
  • Takes expertise to tune for high concurrency

SQL Compatibility

  • Good coverage of standard SQL, but the dialect and function names differ from Postgres, so expect to adjust many queries.
  • Data types don't always map cleanly - timestamp and time zone handling in particular needs attention.
  • Window functions are supported.
  • JOINs work, but large or complex JOINs often need rewriting or careful tuning to stay within memory limits.

Developer Experience & Operations

  • Significant operational overhead - managed solutions help in this regard.
  • Requires expertise to set up and maintain; again, managed solutions help here.
  • Needs additional tools for the API layer and schema management. The HTTP Interface is a potential starting point, and managed features like ClickHouse Cloud's Query API Endpoints and ClickPipes can reduce some of the burden, though, as with most OLAP solutions, schema iteration can still be challenging and require extensive data operations.
  • chDB comes in handy for unit testing because it runs in-process.
  • clickhouse-local can help with local development, but you still need additional migration management and testing tooling for validation and deployment.

Integration

  • Postgres CDC support (Private Preview) is only available in ClickHouse Cloud's ClickPipes.
  • Can directly query Postgres (see the sketch after this list).
  • Strong ecosystem of connectors.
  • The HTTP Interface lets you send raw queries over HTTP; ClickHouse Cloud's Query API Endpoints are a more complete API implementation.
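
For the "query Postgres directly" option, ClickHouse ships a postgresql table function. A hedged sketch, assuming hypothetical connection details and an orders table:

```sql
-- Query a live Postgres table from ClickHouse, e.g. for dimensional data
-- or to validate results during a gradual migration.
SELECT customer_id, count() AS orders
FROM postgresql('pg-host:5432', 'app_db', 'orders', 'readonly_user', 'secret')
GROUP BY customer_id
ORDER BY orders DESC
LIMIT 10;
```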

Apache Druid

Druid has been the go-to solution for real-time analytics at major tech companies for years. It's built specifically for high-speed ingestion and aggregation at scale.

Performance & Scalability

  • Built for high concurrency on time-series data.
  • Optimized for sub-second query latency on time-series aggregations, though performance can vary with query complexity and data volume.
  • Excellent at filtering and grouping.
  • Resource-intensive - needs careful capacity planning.

SQL Compatibility

  • SQL support has improved but is still limited.
  • Many Postgres analytics queries will need significant rewriting (see the sketch after this list).
  • JOIN support has improved, but JOINs can still require careful optimization.
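
To give a flavor of what that rewriting can look like, here's a hedged sketch of a simple hourly rollup, assuming a hypothetical events table/datasource; the exact changes depend on your schema and Druid version:

```sql
-- Postgres: hourly distinct users from an events table
SELECT date_trunc('hour', created_at) AS hour,
       count(DISTINCT user_id) AS users
FROM events
GROUP BY 1;

-- Druid SQL: the primary timestamp lives in the __time column, and
-- TIME_FLOOR is the idiomatic time-bucketing function; note that
-- COUNT(DISTINCT ...) is typically approximate by default in Druid.
SELECT TIME_FLOOR(__time, 'PT1H') AS hour,
       COUNT(DISTINCT user_id) AS users
FROM events
GROUP BY TIME_FLOOR(__time, 'PT1H');
```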

Developer Experience & Operations

  • Complex architecture with multiple moving parts. Managed solutions like Imply can make this easier.
  • Steep learning curve for operations, although improvements in documentation and tooling have eased this.
  • You'll need to build your own APIs to integrate it into your apps; the SQL API is a potential starting point.
  • Docker image and local web UI make it easier to develop against and monitor locally. Additional testing and migration tooling are still needed, however.
  • Schema changes require careful planning, especially for large datasets.

Integration

  • No native Postgres CDC/replication support.
  • Built-in real-time ingestion via Kafka and Kinesis.
  • Limited built-in API capabilities except for the SQL API.
  • Good integration with analytics and visualization tools.

Apache Pinot

Born at LinkedIn and now powering analytics at Uber and Stripe, Pinot was built from the ground up for user-facing analytics at scale.

Performance & Scalability

  • Optimized for high-concurrency scenarios.
  • Great query performance at scale.
  • Strong at time-series and last-value queries.
  • Built-in query optimization for common patterns.

SQL Compatibility

  • Standard SQL support is decent.
  • Limited support for complex JOINs, although it's improving.
  • Window functions are supported, but with limitations.
  • Will likely need to adapt many Postgres analytical queries.

Developer Experience & Operations

  • Complex multi-component architecture.
  • Requires ZooKeeper, controllers, brokers, and servers. Managed solutions like StarTree make this easier.
  • Setup and maintenance need dedicated expertise, although documentation and tooling improvements have made it more accessible.
  • Schema changes are non-trivial.
  • Docker image and local web UI make it easier to develop against it locally. Additional testing and migration tools are also needed.

Integration

  • No direct Postgres integration.
  • Solid Kafka integration.
  • Needs custom development for the API layer, although the Broker Query API is a potential option.
  • Good ecosystem for batch and streaming ingestion.

Tinybird

Tinybird takes a different approach to real-time analytics. Rather than just offering a managed database, it provides an end-to-end platform built specifically to accelerate small software teams building data-intensive analytics applications.

Performance & Scalability

  • Built for high-concurrency analytics at scale.
  • Sub-second queries over billions of rows.
  • Predictable performance under load.
  • Efficient resource utilization.
  • Handles thousands of concurrent API requests.

SQL Compatibility

  • Supports most common analytical SQL features from Postgres.
  • May require syntax adjustments for some queries, particularly around specific functions and data types.
  • Excellent at aggregations, but JOIN performance needs careful handling or clever query rewriting. Automated memory management reduces the likelihood of JOIN-related memory problems, however.
  • Supports advanced window functions.

Developer Experience & Operations

  • Fully managed - no clusters or infrastructure to provision and operate.
  • Local development environment and CLI for building and testing before you deploy.
  • Data projects live as plain-text files, so they slot into your version control workflow.
  • SQL queries publish directly as REST API endpoints - no separate API layer to build or maintain.
  • Built-in observability for endpoint usage and performance.

Integration

  • Postgres integration options:
    • No direct Postgres CDC integration but:
      • Streaming CDC possible through Kafka connectors or partner tools like Estuary Flow.
      • Can directly query Postgres.
  • Batch loading via S3/GCS.
  • High-frequency ingestion via Events API.
  • Real-time streaming that works locally and in production.
  • Instant REST APIs, S3/GCS Sinks, and Kafka Sinks from SQL queries.

Apache Doris

Originally developed at Baidu, Apache Doris has grown from an internal project to a serious OLAP contender. While it's seen widespread adoption in APAC, it's been gaining traction globally.

Performance & Scalability

  • Solid query performance at scale.
  • Good at star schema workloads.
  • Decent concurrency handling.
  • Resource utilization varies based on workload patterns and configuration.

SQL Compatibility

  • Solid ANSI SQL compatibility, with a largely MySQL-compatible dialect.
  • Most analytical queries work well.
  • Comprehensive JOIN support.
  • Robust support for window functions.
  • The query optimizer can be hit or miss, but it's been improving rapidly.

Developer Experience & Operations

  • Simpler than Druid or Pinot, but still requires cluster expertise.
  • No built-in API layer.
  • Needs external tools for production user-facing analytics.
  • Docker image makes it easier to develop against it locally. Additional testing and migration tooling are also needed.

Integration

  • Postgres CDC support via the Flink Doris Connector (requires Flink).
  • Streaming ingestion support via Stream Load and Kafka connectors.
  • No built-in API generation.
  • Ecosystem still developing.
  • Strong support for batch loading from Data Warehouses and cloud storage.

StarRocks

StarRocks has quickly gained traction in the data lakehouse space. While it started as a fork of Apache Doris, it's evolved into its own beast with a nearly complete codebase rewrite.

Performance & Scalability

  • Strong query performance that often outperforms other solutions in benchmarks, particularly for complex JOINs and analytical workloads.
  • Excellent JOIN performance out of the box, a key strength.
  • Can handle thousands of concurrent queries.
  • Good resource utilization with vectorized execution.

SQL Compatibility

  • Strong ANSI SQL support, particularly the MySQL dialect.
  • Most Postgres analytics queries will likely work with minimal changes.
  • Comprehensive JOIN support including complex queries.
  • Solid window function implementation.
  • Cost-based optimizer that actually works.

Developer Experience & Operations

  • Simpler architecture than most alternatives.
  • Requires cluster management expertise, although managed solutions like CelerData can ease this burden.
  • No built-in API generation.
  • Need separate tools for complete analytics pipeline.
  • Documentation and community still maturing and growing rapidly.
  • Docker image makes it easier to develop against it locally. Additional testing and migration tools are also needed.

Integration

  • Basic Postgres CDC support via their StarRocks Migration Tool and Flink.
  • Strong lakehouse integration (Iceberg, Hudi, Hive).
  • Needs external tools for an API layer, although their SQL API lets you send queries over HTTP.
  • Young but growing ecosystem.

DuckDB

DuckDB brings the simplicity of SQLite to analytics workloads. It's gaining rapid adoption, especially among data scientists and developers looking for a lightweight analytical engine.

Performance & Scalability

  • Solid query performance for local analytics.
  • Excellent for medium-sized datasets.
  • Can struggle with very large data volumes, though it's making strides.
  • Limited concurrent query capabilities.
  • Not really built for high-concurrency production workloads. MotherDuck’s managed solution is probably the way to go for production.

SQL Compatibility

  • Excellent Postgres compatibility.
  • Most analytical queries work unchanged.
  • Strong JOIN performance.
  • Great window function support.
  • Familiar for Postgres developers.

Developer Experience & Operations

  • Super simple to get started.
  • Embedded database - no cluster to manage, which also makes unit testing easier.
  • Great local development story with clients in several languages.
  • Limited production deployment and operations options, with MotherDuck being pretty much the only game in town.
  • Missing key features for production analytics APIs.

Integration

  • Can directly query Postgres (see the sketch after this list).
  • Good Parquet/CSV support.
  • No real-time streaming support.
  • No built-in API generation.
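
The direct Postgres querying is genuinely pleasant for exploration. A minimal sketch using DuckDB's postgres extension, assuming hypothetical connection details and an orders table:

```sql
-- Attach a live Postgres database and query it from DuckDB
INSTALL postgres;
LOAD postgres;
ATTACH 'dbname=app_db host=localhost user=readonly_user' AS pg (TYPE postgres, READ_ONLY);

SELECT date_trunc('day', created_at) AS day, count(*) AS orders
FROM pg.public.orders
GROUP BY 1
ORDER BY 1;
```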

DuckDB is amazing for analytics development and data exploration. MotherDuck makes it cloud-ready. But neither was really built for production analytics serving thousands of concurrent users.

Making the Decision

As you can see, each solution has its strengths and trade-offs. Some excel at raw performance, others at developer experience, and still others at operational simplicity. So how do you actually choose? Here are what I consider the most important considerations for making this decision:

Start with Your Non-Negotiables

First, identify your hard requirements:

  • Need sub-second query latency over billions of rows? That rules out some solutions.
  • Have to handle thousands of concurrent users? That eliminates others.
  • Working with a small team? Complex multi-component architectures might be dealbreakers.
  • Limited budget? Consider both initial and ongoing operational costs.

Any solution that can't meet these needs gets eliminated immediately. No point evaluating solutions that won't scale to your data volume, can't handle your concurrency needs, or that you can't afford.

Evaluate Your Data Freshness Requirements

This often gets overlooked until it's too late. Be clear about:

  • How stale can your analytics be?
  • What's your replication lag tolerance?
  • Do you need real-time updates?

Then look at each solution's Postgres integration options:

  • Can it handle CDC or WAL replication?
  • Does it support real-time streaming?
  • What's the latency from Postgres to query-ready data?
  • How complex is the data pipeline you'll need?

Some solutions might look great on paper, but if they can't get your Postgres data over with the freshness you need, they're non-starters. And if you need to build complex streaming pipelines just to get fresh data, that's a hidden cost you need to factor in.
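
On the Postgres side, you can at least quantify the replication piece of that latency. A minimal sketch, assuming logical replication slots are in use:

```sql
-- How far behind is each downstream consumer, in bytes of WAL?
SELECT
  slot_name,
  active,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS replication_lag
FROM pg_replication_slots;
```

End-to-end freshness - from Postgres commit to query-ready data on the OLAP side - is something you'll still need to measure with the candidate itself.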

Consider Your Development Velocity

For small teams, this is often the deciding factor:

  • How fast do you need to ship?
  • What's your operational capacity?
  • How many moving pieces can you realistically build and maintain?
  • How will changes and deployments work?

If you're a small team needing to ship quickly, solutions requiring multiple components and extensive infrastructure might be non-starters, regardless of their performance characteristics.

Evaluate Migration Effort

Look at your existing Postgres analytics:

  • How complex are your queries?
  • How often do they change?
  • What's your data ingestion pattern?

The easier it is to port your existing workflows, the sooner you'll see value from the migration.

Looking Ahead

Moving your analytics off Postgres is a journey, not a destination. In my next article, I'll walk you through how to actually build a high-performance OLTP + OLAP stack. We'll cover:

  • Keeping your systems in sync effectively
  • Managing schema changes across both databases
  • Monitoring and debugging a hybrid stack
  • Optimizing query patterns for each system
  • Building resilient data pipelines between them

The goal isn't just to move your analytics - it's to build a system that lets both your OLTP and OLAP workloads shine. When done right, you'll have Postgres doing what it does best (handling your transactions) while your OLAP solution delivers the real-time analytics your applications need.

Need to move your analytics off Postgres?

Tinybird is a real-time data platform optimized for low-latency, high-concurrency analytics. You can query your Postgres tables with SQL and publish queries as dynamic, scalable REST API Endpoints to power in-product reporting, real-time dashboards, and more.

You can try Tinybird for free with no time limit. Sign up here

ClickHouse® is a registered trademark of ClickHouse, Inc.