
Building Real-Time Live Sports Viewer Analytics with Tinybird and AWS

Ariel Pérez

Dec 3, 2024 • 9 min read

Ever tried to show millions of viewers real-time stats about how many other people like them are watching the same event? It's a bit like trying to count grains of sand while they're being poured into your bucket. Fun times! Let's look at how to build this without breaking the bank (or your sanity).

The Challenge: Fan Engagement at Massive Scale

Imagine you're streaming a major live event and want to show each viewer some engaging stats:

How many people in their state are watching?
How many fans of their team are tuned in?
What's the total viewer count in their country?
What's the global audience size?

Sounds simple? Well, here's the catch - you need to handle:

3.3M concurrent viewers
350,000 events/second
17 GB/minute of incoming data
Globally distributed
And keep it all fresh within an average of 5 seconds (we're optimizing for cost here!)
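To get a feel for those numbers, here is a quick back-of-the-envelope sketch. It only uses the figures from the list above; treating a GB as 10**9 bytes is an assumption, as are the variable names.

```python
# Back-of-the-envelope numbers derived from the scale figures above.
CONCURRENT_VIEWERS = 3_300_000
EVENTS_PER_SECOND = 350_000
GB_PER_MINUTE = 17  # assumed to be decimal gigabytes (10**9 bytes)

# Incoming bytes per second, then the average payload size of one event.
bytes_per_second = GB_PER_MINUTE * 10**9 / 60
avg_event_bytes = bytes_per_second / EVENTS_PER_SECOND

# How often a single viewer emits an event, on average.
seconds_between_events = CONCURRENT_VIEWERS / EVENTS_PER_SECOND

print(f"average event size: {avg_event_bytes:.0f} bytes")
print(f"one event per viewer every {seconds_between_events:.1f} s")
```

Under those assumptions each event is a bit over 800 bytes, and every viewer sends one roughly every 9 to 10 seconds.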

Drawing inspiration from how FOX Sports built their near real-time internal analytics at massive scale, we're going to take it a step further. While their architecture excelled at delivering internal BI analytics, we want to extend it to power real-time viewer segmentation and engagement features for millions of concurrent viewers. We'll show you how to build upon their robust foundation to create engaging, personalized experiences.

The AWS Well-Architected Solution

First, let's look at how you might build this with AWS services. It follows AWS Well-Architected principles to ensure reliability and scale:

An architecture that separates concerns for reliability and scale while keeping costs in check

Component Design & Optimization

The architecture is carefully tuned to balance performance, reliability, and cost. Each choice introduces additional complexity but serves a specific performance or cost goal:

Infrastructure Choices

Kinesis Data Streams with on-demand capacity handles unpredictable traffic patterns with 1ms ingestion latency
Lightweight Lambda functions (128MB) provide 100ms processing time for JSON handling while keeping costs low
CloudFront Functions reduce latency to single-digit milliseconds for JWT validation compared to Lambda@Edge
HTTP API instead of REST API cuts request latency by 60% for simple authentication needs

Request Optimization

JWT validation at CloudFront enables request coalescing, critical for handling traffic spikes
Reduces backend load from millions to thousands of requests per second
Without this optimization, DynamoDB costs would be 3-4x higher with potential throughput issues
Each layer adds complexity but is necessary for cost management at scale
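To see why caching at the edge shrinks backend load so much, here is a minimal sketch. It is a toy in-memory cache, not CloudFront: the class, the key names, and the traffic numbers are all made up for illustration, and it only models the short-TTL effect (every viewer asking for the same key within the TTL shares one origin fetch), not the collapsing of concurrent in-flight requests.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Toy stand-in for an edge cache with a fixed TTL (hypothetical)."""

    def __init__(self, fetch: Callable[[str], int], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, int]] = {}
        self.origin_calls = 0

    def get(self, key: str) -> int:
        now = self._clock()
        entry = self._entries.get(key)
        # Serve from cache while the entry is younger than the TTL.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Otherwise go to the origin once and remember the answer.
        self.origin_calls += 1
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


def simulate() -> Tuple[int, int]:
    """60 simulated seconds, 3 stat keys, 100 viewer requests per key per second."""
    now = [0.0]
    cache = TtlCache(fetch=lambda key: 42, ttl_seconds=10.0, clock=lambda: now[0])
    keys = ["state=CA", "state=NY", "state=TX"]
    total_requests = 0
    for second in range(60):
        now[0] = float(second)
        for key in keys:
            for _ in range(100):
                cache.get(key)
                total_requests += 1
    return total_requests, cache.origin_calls


total, origin = simulate()
print(f"{total} viewer requests -> {origin} origin requests")
```

In this toy run, 18,000 viewer requests turn into 18 origin fetches. The number of origin calls depends on the number of distinct keys and the TTL, not on the number of viewers.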

Storage Strategy

Hot path: DynamoDB delivers consistent sub-10ms reads for real-time data
CloudFront provides low double-digit latencies via globally distributed PoPs
Cold path: S3 with Parquet format (7:1 compression ratio) optimizes for analytical queries
Geographic partitioning reduces Athena query costs by up to 90% for location-based analytics
Each storage tier requires different access patterns and maintenance strategies
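Geographic partitioning usually means laying objects out under Hive-style `key=value` prefixes, which Athena can prune when a query filters on those keys. Here is a minimal sketch of such a layout. The `events/` prefix, the partition names, and the function itself are hypothetical and not taken from the architecture above.

```python
from datetime import datetime, timezone


def partition_prefix(country: str, state: str, event_time: datetime) -> str:
    """Build a Hive-style S3 key prefix (hypothetical layout)."""
    ts = event_time.astimezone(timezone.utc)
    return (
        f"events/country={country}/state={state}/"
        f"dt={ts:%Y-%m-%d}/hour={ts:%H}/"
    )


example = partition_prefix(
    "US", "CA", datetime(2024, 12, 3, 18, 30, tzinfo=timezone.utc)
)
print(example)
```

A query that filters on `country` and `state` then only reads the objects under the matching prefixes instead of scanning the whole dataset, which is where the Athena savings come from.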

Operational Efficiency

4-hour runtime window for streaming components saves 80% on Kinesis costs
30-day data retention balances analysis needs with storage costs
10-second cache TTL cuts backend requests by 95% while maintaining reasonable freshness
Each optimization requires careful monitoring and adjustment

Data Flow

User actions hit Kinesis Data Streams (4 of them!)
Kinesis Analytics does the real-time number crunching
Then things get interesting:
- Need instant stats? That's the DynamoDB path
- Building dashboards? Off to RDS Postgres and JSON in S3 + Athena you go
- Want historical analysis? Parquet on S3 and Athena have your back

By the Numbers

Here's what you can expect performance-wise:

Near Real-time Stats (User App Path):

Freshness: 1-11 seconds
Query Response: 26-103 milliseconds (thank you, caching!)

Streaming Analytics (BI Tool Path):

Freshness: 1 minute
Query Response: 505ms

Batch Analytics Path (BI Tool Path):

Freshness: 1 minute
Query Response: 2.5 seconds

Ad-Hoc Analytics Path:

Freshness: ~6 minutes
Query Response: 35 seconds

And it'll cost you about $1,592.29 for a 4-hour live event (we'll break down those numbers in detail later).

The Tinybird Approach

Remember that complex AWS architecture we just looked at? Here's the same thing with Tinybird:

Not a typo - it's really that simple! And here's the kicker:

Performance Characteristics

Consistent performance whether you're querying last minute's or last month's data

Near Real-time Stats (User App Path):

Freshness: 2-3 seconds
Query Response: 10-25 milliseconds

Optimized Aggregate Analytics (BI Tool Path):

Freshness: 3 seconds
Query Response: 1 second

Ad-Hoc Analytics (BI Tool Path):

Freshness: 2 seconds
Query Response: 6 seconds

Operational Benefits and Cost Efficiency

Simplified Total Cost of Ownership

No separate paths needed for real-time vs historical data
Fewer moving parts means fewer things to break
Simpler architecture makes troubleshooting a breeze
When something goes wrong, you're not playing detective across multiple services

Rapid Development and Iteration

Need a new feature? Just write SQL
Want to transform data differently? SQL
Need to expose a new API endpoint? You guessed it - SQL

Streamlined Feature Development
Want to add new analytics? Let's see how it works in both approaches:

AWS Approach:

Modify Kinesis Analytics application
Update Lambda functions
Add new DynamoDB tables/indexes
Update the streaming aggregates in Postgres
Modify the batch processing pipeline
Test each component separately
Hope everything still works together

Tinybird Approach:

Write a SQL query, then publish it as an API endpoint. Done!

How much does it cost?

Total Live Event Cost: $1,270.91

That’s 20% less than the AWS Well-Architected Solution! To be fair, all the costs are estimates based on the expected workloads and the current published prices for both AWS and Tinybird, but even within the margin of error, the Tinybird implementation puts less strain on your budget. Run this for an entire month and the difference is even more dramatic.
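If you want to check the percentage yourself, here is the arithmetic as a small sketch, using the two "Total Live Event Cost" figures from the appendix tables. The variable names are just for illustration.

```python
# "Total Live Event Cost" figures from the appendix cost tables.
AWS_TOTAL = 1592.29
TINYBIRD_TOTAL = 1270.91

savings = AWS_TOTAL - TINYBIRD_TOTAL
savings_pct = savings / AWS_TOTAL * 100

print(f"savings: ${savings:.2f} ({savings_pct:.1f}% of the AWS estimate)")
```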

Implementation Considerations

Go With AWS If:	Pick Tinybird When:
You're already deep in the AWS ecosystem	You want sub-second query latency without the headache
Your team dreams in Lambda functions	Your team prefers writing SQL to managing infrastructure
You need very specific control over data storage locations	You need to iterate quickly on new analytics features
You enjoy building and maintaining complex pipelines (hey, some people do!)	You don't want to manage numerous moving parts

The Bottom Line

Both approaches can handle the scale - that's not the question. The real decision comes down to what you value more: a simpler architecture with fewer moving parts to operate, or uniformity with the AWS stack you already run. If you're building something new, Tinybird's approach lets you move faster and sleep easier. But if you're heavily invested in AWS services, the AWS solution, while more complex, might fit better into your existing workflow.

Appendix

Detailed Cost Estimates

The costs for both approaches are based on published on-demand pricing - both platforms offer discounts if you're willing to commit long-term.

AWS Approach

Path/Component	Cost	Notes
Total Live Event Cost	$1,592.29
Ingestion	$653.44
Kinesis Data Streams	$653.44
Kinesis Data Streams (Data In)	$326.40	17GB/min * 240 min * $0.08/GB
Kinesis Data Streams (Data Out)	$326.40	17GB/min * 240 min * $0.04/GB * 2 (1/4 * 4 Windowed KDAs + Raw Firehose)
Kinesis Data Streams (Stream Hours)	$0.64	4 streams * 4 hours * $0.04/stream-hour
Analytics	$712.57
Kinesis Data Analytics	$27.28	62 KPUs * 4 hours * $0.11/KPU-hour
Near Real-time to Mobile/Web Apps	$682.59
Lambda	$1.85
Lambda (Requests)	$1.00	87 requests/s per stream * 4 streams * 14400s * $0.2/M requests
Lambda (Duration)	$0.85	87 requests/s/stream * 4 streams * 14400s * 100ms/request * $0.0000000017/ms
CloudFront	$677.33
CloudFront (HTTPS Requests to Origin)	$0.13	87 stat requests/10s * 14400s * $0.01/10K requests
CloudFront (Functions)	$475.20	3.3M viewers * 1 request/10s/viewer * 14400s * $0.10/M invocations
CloudFront (Data Transfer)	$202.00	3.3M viewers * 1 request/10s/viewer * 14400s * 0.0000004657GB/request * $0.085/GB
API Gateway	$0.13	87 stat requests/10s * 14400s * $1.00/million HTTP API Requests
DynamoDB	$3.29
DynamoDB (Writes)	$3.13	87 writes/s per stream * 4 streams * 14400s * 1 WRU/write * $0.625/M WRUs
DynamoDB (Reads)	$0.16	87 reads/10s * 14400s * 1 RRU/read * $1.25/M RRUs
Streaming Aggregates (Postgres/BI)	$1.98
Lambda	$1.85
Lambda (Requests)	$1.00	87 requests/s per stream * 4 streams * 14400s * $0.2/M requests
Lambda (Duration)	$0.85	87 requests/s/stream * 4 streams * 14400s * 100ms/request * $0.0000000017/ms
RDS PostgreSQL	$0.13	4 hours * $0.032/hour
Batched Aggregates (Athena/BI)	$0.73
Kinesis Firehose	$0.01	87 objects/s/stream * 14400 s * 4 streams * 100 bytes/object * $0.029/GB
S3	$0.43	87 objects/s/stream * 14400 s * 4 streams * 100 bytes/object * $0.023/GB + 87 objects/s/stream * 14400 s * 4 streams * 1 POST/min * 1 min/60 s * $0.005/1000 POSTs
Athena	$0.29	1 query/min * 240 min * 0.000238 TB (avg)/query * $5.00/TB
Ad-hoc Raw Data Queries (Athena/BI)	$226.27
Firehose	$203.76
Firehose (Ingestion)	$118.32	17GB/min * 240 min * $0.029/GB
Firehose (Format Conversion)	$73.44	17GB/min * 240 min * $0.018/GB
Firehose (Dynamic Partitioning)	$12.00	17 GB/min * 240 min * $0.02/GB + 53 partitions/min * 1 object per partition * 240 min * $0.005/1000 objects + 4 hours * $0.07/hour
S3	$13.41	17 GB/min * 240 min * 1/7 parquet compression + 53 POSTs/min * 240 min * $0.005/1000 POSTs
Athena	$9.11	1 query/30 min * 240 min * 80% of records/query * 50% of columns/query * 0.0166TB/min * 1/7 parquet compression * $5.00/TB
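The biggest AWS line items are easy to reproduce from the formulas in the Notes column. Here is a minimal sketch that recomputes a few of them. The prices and quantities are copied from the table, and the variable names are illustrative.

```python
# Workload figures used throughout the table.
EVENT_MINUTES = 240
EVENT_SECONDS = 14_400
GB_PER_MINUTE = 17
total_gb = GB_PER_MINUTE * EVENT_MINUTES  # 4,080 GB over the event

# Kinesis Data Streams.
kinesis_data_in = total_gb * 0.08
kinesis_data_out = total_gb * 0.04 * 2
kinesis_stream_hours = 4 * 4 * 0.04
ingestion_total = kinesis_data_in + kinesis_data_out + kinesis_stream_hours

# Kinesis Data Analytics: 62 KPUs for 4 hours.
kinesis_analytics = 62 * 4 * 0.11

# CloudFront Functions: every viewer polls once every 10 seconds.
function_invocations = 3_300_000 * (EVENT_SECONDS // 10)
cloudfront_functions = function_invocations / 1_000_000 * 0.10

print(f"Kinesis Data Streams: ${ingestion_total:.2f}")
print(f"Kinesis Data Analytics: ${kinesis_analytics:.2f}")
print(f"CloudFront Functions: ${cloudfront_functions:.2f}")
```

Note how the CloudFront Functions cost scales with the number of viewers, while the origin-side costs (API Gateway, DynamoDB reads) stay tiny because the cache absorbs the traffic.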

Tinybird Approach

Cost Dimension	Cost	Notes
Total Live Event Cost	$1,270.91
Total Stored	$693.61
Ingestion	$693.60	17 GB/min * 240 min * 50% size reduction * $0.34/GB
Materialized Views	$0.01
States MV	$0.00	14 bytes/record * 50 records/s * 14400s * $0.34/GB
US/ex-US MV	$0.00	14 bytes/record * 2 records/s * 14400s * $0.34/GB
Teams MV	$0.00	22 bytes/record * 32 records/s * 14400s * $0.34/GB
Favored Winner MV	$0.00	18 bytes/record * 2 records/s * 14400s * $0.34/GB
Total Processed	$577.31
Ingestion	$285.60	17 GB/min * 240 min * $0.07/GB
Materialization	$34.18
States Materialization	$10.51	350,000 events/s * 14400s * 18 bytes/event * $0.07/GB + 350,000 events/s * 14400s * 14 bytes/event * $0.07/GB
US/ex-US Materialization	$10.51	350,000 events/s * 14400s * 18 bytes/event * $0.07/GB + 350,000 events/s * 14400s * 14 bytes/event * $0.07/GB
Teams Materialization	$13.14	350,000 events/s * 14400s * 26 bytes/event * $0.07/GB + 350,000 events/s * 14400s * 14 bytes/event * $0.07/GB
Favored Winner Materialization	$0.01	19 bytes/record * 1.42 (avg) decisions/viewer * 3.3M viewers * $0.07/GB + 15 bytes/record * 1.42 (avg) decisions/viewer * 3.3M viewers * $0.07/GB
API Endpoints	$0.49
States Endpoint	$0.00	14 bytes/read * 50 reads/s * 14400s * $0.07/GB
US/ex-US Endpoint	$0.00	14 bytes/read * 2 reads/s * 14400s * $0.07/GB
Teams Endpoint	$0.00	22 bytes/read * 32 reads/s * 14400s * $0.07/GB
Favored Teams Endpoint	$0.49	2 reads/s * 14400s * 0.000241 GB/read (avg) * $0.07/GB
Ad-Hoc BI Queries	$257.04	1 query/30 min * 240 min * 80% of records/query * 50% of columns/query * 17GB/min * 50% size reduction * 56.25% (avg data available to scan per query) * $0.07/GB
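The same exercise works for the Tinybird table. Here is a sketch that recomputes the ingestion and materialization rows from the Notes column. One caveat: the materialization figures only seem to line up if a GB is taken as 2**30 bytes. That is an inference from the arithmetic, not something the table states, and the helper name is made up.

```python
BYTES_PER_GB = 2**30  # assumption inferred from the table's results
TOTAL_GB = 17 * 240   # raw data ingested over the 4-hour event
EVENTS = 350_000 * 14_400


def processed_cost(events: int, bytes_per_event: int,
                   price_per_gb: float = 0.07) -> float:
    """Cost of processing `events` rows of `bytes_per_event` bytes each."""
    return events * bytes_per_event / BYTES_PER_GB * price_per_gb


# Storage is billed on the compressed size (50% size reduction).
stored_ingestion = TOTAL_GB * 0.5 * 0.34
# Processing is billed on the raw ingested volume.
processed_ingestion = TOTAL_GB * 0.07

# Each materialization reads the source columns and writes the aggregate rows.
states_mv = processed_cost(EVENTS, 18) + processed_cost(EVENTS, 14)
teams_mv = processed_cost(EVENTS, 26) + processed_cost(EVENTS, 14)

print(f"stored ingestion: ${stored_ingestion:.2f}")
print(f"processed ingestion: ${processed_ingestion:.2f}")
print(f"States materialization: ${states_mv:.2f}")
print(f"Teams materialization: ${teams_mv:.2f}")
```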

Detailed Performance Estimates

AWS Approach

Component	Activity	Latency/Freshness (s)
Near Real-time to Mobile/Web Apps		11.208
Write Path		1.105
Kinesis Data Streams	Write/Read	0.002
Kinesis Data Analytics	Window Processing	1.000
Lambda	Process Events	0.100
DynamoDB	Write	0.003
Read Path		10.103
CloudFront	Cache TTL	10.000
API Gateway	Auth/Route	0.100
DynamoDB	Read	0.003
Streaming Aggregates (Postgres/BI)		61.612
Write Path		1.107
Kinesis Data Streams	Write/Read	0.002
Kinesis Data Analytics	Window Processing	1.000
Lambda	Process Events	0.100
RDS Postgres	Write	0.005
Read Path		60.505
RDS Postgres	Read	0.005
BI Tools	Refresh Rate	60.000
BI Tools	Query Processing	0.500
Batched Aggregates (Athena/BI)		123.602
Write Path		61.102
Kinesis Data Streams	Write/Read	0.002
Kinesis Data Analytics	Window Processing	1.000
Kinesis Firehose	Buffer	60.000
S3	Write	0.100
Read Path		62.500
Athena	Query	2.000
BI Tools	Refresh Rate	60.000
BI Tools	Query Processing	0.500
Ad-hoc Raw Data Queries (Athena/BI)		371.502
Write Path		335.502
Kinesis Data Streams	Write/Read	0.002
Kinesis Firehose	Parquet Buffer	60.000
Kinesis Firehose	Parquet Conversion	0.500
S3	Write	275.000
Read Path		36.000
Athena	Complex Query	35.000
BI Tools	Query Processing	1.000
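Each path total in this table is just the sum of its components. Here is a small sketch for the near real-time path, which also shows where a "1-11 seconds" freshness range can come from. The best-case reading (a viewer whose request lands right as the cache entry is refreshed) is an assumption, not something the table spells out.

```python
# Component latencies (seconds) for the near real-time path, from the table.
WRITE_PATH = {
    "Kinesis Data Streams write/read": 0.002,
    "Kinesis Data Analytics window": 1.000,
    "Lambda process events": 0.100,
    "DynamoDB write": 0.003,
}
ORIGIN_READ = {
    "API Gateway auth/route": 0.100,
    "DynamoDB read": 0.003,
}
CACHE_TTL = 10.000

write_latency = sum(WRITE_PATH.values())
origin_read = sum(ORIGIN_READ.values())

# Best case: the cached value was fetched from the origin just now.
best_case = write_latency + origin_read
# Worst case: the cached value is about to expire.
worst_case = best_case + CACHE_TTL

print(f"write path: {write_latency:.3f} s")
print(f"freshness: {best_case:.3f} s to {worst_case:.3f} s")
```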

Tinybird Approach

Component	Activity	Latency/Freshness (s)
Near Real-time to Mobile/Web Apps		3.085
Write Path		2.050
Events API Ingestion	Write	2.000
Materialization	Window Processing	0.050
Read Path		1.035
Endpoints	Cache TTL	1.000
Cache Read	Cache Read	0.010
Database	Read	0.025
Optimized Aggregate Queries (BI)		3.075
Write Path		2.050
Events API Ingestion	Write	2.000
Materialization	Window Processing	0.050
Read Path		1.025
Database	Read	0.025
BI Tools	Query Processing	1.000
Ad-hoc Unoptimized Queries (BI)		8.050
Write Path		2.050
Events API Ingestion	Write	2.000
Materialization	Window Processing	0.050
Read Path		6.000
Database	Read	5.000
BI Tools	Query Processing	1.000
