Sub-50ms Analytics: How We Scaled Reporting for Billions of Records
Once your app hits a certain scale—hundreds of millions or even billions of records—handling analytics and reporting becomes a completely different beast.
We felt this acutely last year at SparkLoop. Our platform had grown to store hundreds of millions of events just as we introduced an "Advanced Reports" page. This feature allowed users to generate reports across arbitrary time ranges, filtered by six different dimensions.
Our initial approach was to use ClickHouse. It made sense, right? After all, ClickHouse is built specifically for this use case.
Wrong. It was a disaster.
The problem wasn't ClickHouse itself—it’s a fantastic tool—but rather the assumption it's built on: that data is immutable. That immutability is exactly what makes it so fast.
But at SparkLoop, our data is rarely static. Take referrals: they start as "pending," then move to "confirmed" or "rejected." To handle these updates in ClickHouse, you have to constantly deduplicate data (usually via complex ReplacingMergeTree logic). That process was cumbersome, fragile if column schemas changed, and incredibly resource-intensive.
We could only afford to run the deduplication every three hours, which meant the dashboard was always slightly out of sync. To top it off, it was expensive: our instance was costing us $1,000/month just to serve these reports.
The turning point came when I watched an interview Aaron Francis did with Jesse Hanley. Jesse mentioned using Elasticsearch to keep his site snappy. It stopped me in my tracks. I'd always thought of Elasticsearch as a tool strictly for search, and one that was a nightmare to maintain.
One day, out of desperation, I DMed Jesse for advice. He was incredibly helpful and opened up the world of Elasticsearch for me. I learned two things that changed my perspective:
- It’s an analytics powerhouse: While designed for search, ES handles "Aggregations" (their version of GROUP BY) incredibly well. Unlike ClickHouse, ES allows you to update a document by ID easily, making it perfect for our mutable data.
- The complexity is optional: Most of the maintenance headache is abstracted away if you use the searchkick gem (shoutout to the prolific Andrew Kane) and a dedicated host like Elastic.co.
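To make the "update a document by ID" point concrete, here is a minimal sketch of how this looks with searchkick. The `Referral` model and its fields are my assumptions for illustration, not SparkLoop's actual schema; in a Rails app the class would be `class Referral < ApplicationRecord` with `searchkick` declared, and it is shown here as plain Ruby so the document-building part stands alone:

```ruby
# Hypothetical Referral model (fields are assumptions). Searchkick indexes
# the hash returned by #search_data under the record's ID, so reindexing
# after a status change simply replaces the old document in place -- no
# ReplacingMergeTree-style deduplication pass required.
class Referral
  attr_accessor :id, :status, :campaign_id

  def initialize(id:, status:, campaign_id:)
    @id = id
    @status = status
    @campaign_id = campaign_id
  end

  # Searchkick calls this to build the indexed document for this record.
  def search_data
    { status: status, campaign_id: campaign_id }
  end
end

referral = Referral.new(id: 1, status: "pending", campaign_id: 42)
referral.search_data  # => { status: "pending", campaign_id: 42 }

# When the referral is confirmed, reindexing overwrites the same document:
referral.status = "confirmed"
referral.search_data  # => { status: "confirmed", campaign_id: 42 }
```

In a real Rails setup, searchkick's ActiveRecord callbacks trigger the reindex automatically on `update!`, which is what eliminates the lag between a status change and the dashboard.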
Armed with Searchkick and Jesse’s encouragement, I started hacking. It took a few days to wrap my head around the new architecture, but once I created my indexes and a thin wrapper to query them in Rails, the difference was night and day:
- Ridiculously fast: Most queries—even those calculating totals across hundreds of millions of records—return in under 50ms.
- No more deduplication: ES handles updates to existing records gracefully. No more background jobs, no more lag.
- Real-time data: The dashboard is always current.
- A healthier Postgres: Because we moved these heavy analytical queries over to ES, the load on our primary PostgreSQL database decreased by 25%, making the entire app feel more responsive.
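The "thin wrapper" mentioned above can be sketched in plain Ruby. Everything here is an assumption for illustration (the `ReportQuery` name, the field names, the filter shape); it builds an Elasticsearch-style request body with a `bool` filter and a `terms` aggregation, the kind of raw body searchkick lets you pass through to the cluster:

```ruby
# Hypothetical thin wrapper: turns report filters plus a group-by dimension
# into an Elasticsearch query body. Filters become `term` clauses inside a
# `bool` filter; the group-by becomes a `terms` aggregation (ES's GROUP BY).
class ReportQuery
  def initialize(filters = {}, group_by: nil)
    @filters = filters
    @group_by = group_by
  end

  def to_body
    body = {
      query: { bool: { filter: term_clauses } },
      size: 0  # we only want aggregation buckets, not matching documents
    }
    if @group_by
      body[:aggs] = { @group_by => { terms: { field: @group_by.to_s } } }
    end
    body
  end

  private

  # One exact-match `term` clause per filter dimension.
  def term_clauses
    @filters.map { |field, value| { term: { field => value } } }
  end
end

body = ReportQuery.new({ status: "confirmed" }, group_by: :source).to_body
# body[:aggs] => { source: { terms: { field: "source" } } }
```

Keeping the wrapper this thin meant the Rails side stayed readable while Elasticsearch did all the heavy aggregation work.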
The best part? We got all this extra performance for a tenth of the monthly spend of our struggling ClickHouse instance.
I’m incredibly grateful to Jesse for the pointer and the folks at Elastic for the tech. It’s a reminder that when you’re stuck in a technical hole, sometimes the best move isn’t to dig harder—it’s to reach out to the community and ask how they solved it.