Query Booster: How Tinybird optimizes table schemas for you
Query Booster is an intelligent feature that automatically monitors your database usage patterns and optimizes performance by fine-tuning data source schemas.

A customer recently reached out to us with an interesting observation: their query latency had mysteriously dropped without any manual intervention. What they discovered was our Query Booster, quietly optimizing their workload behind the scenes. Query Booster automatically adapts to your database usage patterns, fine-tuning data source schemas for optimal performance.
Why we built the Query Booster
Slow database queries hurt your business in several ways. When queries are slow, you need more infrastructure and resources to run them, which costs more money. Your users have a worse experience because they have to wait longer for data. Engineers waste time fixing slow queries instead of building new features. And you might miss important insights because the data takes too long to arrive.
To understand how Query Booster solves these problems, we need to understand how Tinybird works. In columnar databases, the sorting key is the most critical setting for performance. This key determines how data is physically stored on disk, how efficiently the database can filter and process queries, and serves as the default primary key and index structure.
If you are filtering by a column that is not in the sorting key, the database must perform a full table scan. This means reading significantly more data than necessary, since the column values are scattered throughout the storage. As your data volume grows, this mismatch between sorting key and query patterns leads to increasingly degraded performance due to full table scans.
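As a minimal illustrative sketch (the table and column names are hypothetical, not from the original post), consider a ClickHouse-style table where filters on `id_1` can use the primary index, while filters on any other column force a full scan:

```sql
-- Sorting key (id_1, id_2): filters on id_1 (and id_1 + id_2) use the
-- primary index; filters on other columns must scan the whole table
CREATE TABLE events
(
    id_1    UInt32,
    id_2    UInt32,
    payload String
)
ENGINE = MergeTree
ORDER BY (id_1, id_2)
```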
In an ideal scenario, you'd know all your query patterns in advance and design the perfect sorting key. But reality is different. New use cases come up over time, query patterns evolve as your business grows, different teams access data in different ways, and performance requirements change constantly.
When this happens, you face a common issue. Your queries start performing full table scans, causing performance to degrade significantly. Engineers spend valuable time manually optimizing schemas, which requires careful planning and potential downtime. And just when you think you've solved the problem, new patterns emerge, and the cycle begins again.
This is where Query Booster comes in. Instead of manually tuning your database, Query Booster automatically monitors your query patterns, identifies optimization opportunities, and creates temporary optimized schemas. It continuously validates performance improvements and removes optimizations when they're no longer needed.
The result? Your database continuously adapts to your usage patterns, maintaining optimal performance without any manual intervention.
How Query Booster works
Query Booster is like having a database performance expert working 24/7 to optimize your data source schemas. Here's how it works:
- Continuous Monitoring: The system constantly watches your query patterns
- Query Plan Analysis: It identifies frequently run queries and analyzes their optimization potential
- Automatic Optimization: When beneficial, it creates temporary optimized data source schemas
- Performance Validation: The system monitors the optimization's effectiveness and removes it if it's not delivering results
Let's dive into some technical aspects that make Query Booster interesting:
Continuous Monitoring
Every query executed in Tinybird is automatically monitored and analyzed. We collect metrics that paint a complete picture of your query performance. This includes query execution time, the amount of data processed, and resource utilization. We also track query patterns, frequency, and the specific filter conditions being used. These metrics are stored in our monitoring system and made available to users through Service Data Sources, allowing you to track query performance over time.
Our solution periodically reviews these metrics to detect the queries that process the largest amounts of data and those executed most frequently, as these represent critical parts of your workload.
For queries considered optimizable, we analyze the query plan, which is a detailed representation of the actions the database will perform. It contains information about the data sources being accessed, the filters that are applied, transformations, etc. In our case, we extract and analyze this query plan by parsing the `EXPLAIN` output. We use this information to look at whether the database is using the current sorting key effectively, whether the query is reading more data than necessary, and whether the current schema aligns with the query's access patterns.
Query Plan Analysis
Once we have the list of queries considered optimizable, we analyze the query plan using the `EXPLAIN` output. To do so, we check the following things:
- Check if the query is using the primary key (usually the sorting key). If the primary key is not being used, we can directly consider the query as optimizable.
- Check if the query is reading enough rows. If the query reads very few rows, it's probably already efficient. No need to optimize further.
- Check if the filtering is inefficient (low selectivity). Specifically, we check if the query reads more than 25% of the table's available rows. If it does, that means the current sorting key is not helping much — it's not filtering efficiently. This is a good signal that a better sorting order could help.
Let's look at an example to better understand how we use the `EXPLAIN` output. Consider a table with sorting key `(id_1, id_2)` and a query like this:
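A query of that shape might look like the following (the `events` table name and the literal value are illustrative, not from the original post):

```sql
SELECT count()
FROM events
WHERE id_1 = 42
```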
The `EXPLAIN` output for this query would look like this:
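The original output isn't reproduced here, but a ClickHouse-style `EXPLAIN indexes = 1` result for a query filtering on the first sorting-key column would look roughly like this (the granule counts match the numbers quoted below; everything else is illustrative):

```
Expression ((Projection + Before ORDER BY))
  ReadFromMergeTree (default.events)
  Indexes:
    PrimaryKey
      Keys:
        id_1
      Condition: (id_1 in [42, 42])
      Parts: 1/1
      Granules: 5/1224
```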
You can see that the query is using the primary key, and it's reading 5/1224 (~0.4%) of the table. This means that the query is filtering efficiently. If we now change the query to:
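For instance, filtering on the second sorting-key column instead (again with illustrative names and values):

```sql
SELECT count()
FROM events
WHERE id_2 = 42
```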
The `EXPLAIN` output looks like this:
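Again, an illustrative ClickHouse-style `EXPLAIN indexes = 1` result (granule counts match the numbers quoted below):

```
Expression ((Projection + Before ORDER BY))
  ReadFromMergeTree (default.events)
  Indexes:
    PrimaryKey
      Keys:
        id_2
      Condition: (id_2 in [42, 42])
      Parts: 1/1
      Granules: 1224/1224
```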
You can see that the query is still using the primary key, but it's reading 1224/1224 (100%) of the table. This is an extreme case, but it shows that the query is not filtering efficiently. We could change the sorting key to `(id_2, id_1)` to make this query faster, but that would make the initial query slower.
Automatic Optimization
If the query is still considered optimizable, we suggest a better data source schema. Now that we've confirmed optimization might help, here's how we build the suggestion:
- Choose candidate columns using a combination of their cardinality and the type of filter they appear in. For example, columns (and their cardinality) are treated differently depending on whether they are used for point filtering or range filtering.
- Sort the filter columns by cardinality (ascending). Columns with lower cardinality (fewer distinct values) go first, since they partition the data into coarse groups. Higher-cardinality columns go later, refining the filtering within those groups.
- Suggest the first four columns from this sorted list as the sorting key of a new, optimized data source schema.
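As a sketch of how per-column cardinality could be estimated for the candidate filter columns (the `page_hits` table and its column names are hypothetical, and this is not necessarily how Query Booster computes it internally):

```sql
-- Estimate distinct values per candidate filter column;
-- lower-cardinality columns go first in the suggested sorting key
SELECT
    uniq(status)  AS status_cardinality,   -- e.g. low: goes first
    uniq(country) AS country_cardinality,
    uniq(user_id) AS user_id_cardinality   -- e.g. high: goes last
FROM page_hits
```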
If the resulting suggested sorting key is a prefix of the existing sorting key, we will not suggest an optimization. For example, if the sorting key is `[a, b, c, d]` and your query filters on `[a, b]`, this is not considered optimizable. Why? Because the query is already aligned with the sorting key, and we are already using the primary index efficiently. A new optimization wouldn't help.
Otherwise, we will proceed to create a new data source schema with the suggested sorting key.
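Such an optimized schema might look like the following minimal sketch (the `page_hits_optimized` table, its columns, and the suggested key are illustrative assumptions, not Query Booster's actual output):

```sql
-- Optimized copy of the data source with the suggested sorting key,
-- ordered from lowest to highest cardinality
CREATE TABLE page_hits_optimized
(
    status  UInt8,
    country LowCardinality(String),
    user_id UInt64,
    payload String
)
ENGINE = MergeTree
ORDER BY (status, country, user_id)
```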
The decision to rely solely on data source sorting keys to improve query performance comes from years of experience debugging customer query performance issues. As we explain in our optimization guides, most of the time, when a query is not performing as expected, the cause is a poorly chosen sorting key.
Performance Validation
But that's not all: we log and monitor optimizations to track the impact they have on query performance. We only keep optimizations that are still useful:
- Optimizations that are no longer used because clients stopped calling an endpoint are removed, but we allow them to be created again if the endpoint calls come back. If the query we tried to optimize is not executed within 24 hours, we discard the optimization.
- Optimizations that are not being used despite the endpoints being hit. This could be caused by wrong sorting keys or other unexpected issues. These optimizations are "banned" and won't be created again. There are two ways to determine that an optimization is not being used:
- Optimization not being used at all: If the query is being called, but the optimization is not being used.
- Optimization is being used but not filtering efficiently: If the optimization is being used, but the query is reading more data than when using the original sorting key.
Coming back to the surprised customer, the results were good:
- 2x reduction in query times for their most common workloads
- Decreased CPU usage for the same queries
- Zero configuration or maintenance required
And while this particular case showed a 2x improvement, we've seen cases where the optimization leads to 100x faster queries.
Intelligence matters
The beauty of Query Booster lies in its simplicity from a user's perspective. You don't need to be a database expert or spend time manually optimizing your queries. The system handles that for you, continuously learning and adapting to your specific usage patterns.
Remember: the best optimizations are the ones you don't have to think about. Query Booster embodies this principle by automatically improving your database performance while you focus on what matters most: your application and business logic.