My Quest for Speed: How a ClickHouse Type Improvement Led Me Down a Caching Rabbit Hole in Rust

It all started with a wonderful quality-of-life improvement in the clickhouse crate. The recent 0.14.0 release introduced support for the RowBinaryWithNamesAndTypes format. This was a game-changer!

Gone were the days of fetching data as untyped strings and manually converting them into Rust types like Decimal. Now, I could get strongly-typed rows directly, making my code safer, cleaner, and more pleasant to write.

But with great power comes great…greed.

Seeing my data pipeline become so elegant lit a fire in me. My Axum web application was now efficiently talking to ClickHouse, but I thought, “What if we could be even faster? Why hit the ClickHouse database for every single request for spot prices?” The answer seemed obvious: cache it.

The Naive Beginning: Infinite Growth

My first instinct was to reach for the trusty std::collections::HashMap.
The plan was simple: store the fetched ClickHouse data in an in-memory cache.

use std::collections::HashMap;
use std::sync::RwLock;

use indexmap::IndexMap;
use rust_decimal::Decimal;

struct ApplicationContext {
    // Cache mapping symbols to their historical spot prices
    // (timestamp -> price, kept in insertion order).
    spot_price_cache: RwLock<HashMap<String, IndexMap<String, Decimal>>>,
}

I used IndexMap specifically to preserve the timestamp-order of price data, which was crucial for time-series analysis. I wrapped it in an RwLock for thread-safety, and… immediately saw a problem. This cache had no memory. It would grow indefinitely, consuming more and more RAM as we tracked more symbols and time periods. A cache with no eviction policy is just a memory leak with extra steps.

The Crossroads: lru vs. moka
It was time for a proper caching solution. A quick search led me to two popular contenders in the Rust ecosystem:

lru: A classic, bare-bones LRU (Least Recently Used) cache implementation.

moka: A feature-rich caching library inspired by Caffeine from the Java world.

I was torn at first, but after some evaluation, the case for moka was compelling:

Simplicity: moka’s constructor-based configuration felt intuitive. I could set up a cache with a maximum capacity in one line.

Thread-Safety: moka handles concurrency internally. No need to mess with Arc<RwLock<…>> semantics myself. This is a huge win for reducing boilerplate and potential deadlocks.

Observability: This was the clincher. moka provides built-in metrics like entry_count, hit_count, and miss_count. Being able to see how my cache was performing was invaluable for debugging and tuning.

The choice was clear. I was going with moka.

The Implementation: A Seemingly Robust Solution

I refactored my cache to use a moka::sync::Cache.

use moka::sync::Cache;
use indexmap::IndexMap;
use rust_decimal::Decimal;

#[derive(Clone)]
pub struct ApplicationContext {
    pub clickhouse_client: clickhouse::Client,
    pub spot_price_cache: Cache<String, IndexMap<String, Decimal>>,
}

impl ApplicationContext {
    pub fn new() -> Self {
        Self {
            clickhouse_client: create_clickhouse_client(),
            spot_price_cache: Cache::builder()
                .max_capacity(1000) // Hold up to 1000 symbol histories
                .build(),
        }
    }
}

The integration was smooth. I would check spot_price_cache for a symbol; on a miss, I’d query ClickHouse (using the lovely new typed interface), populate the cache with the IndexMap of timestamp-price pairs, and return the data. The cache metrics showed a beautiful story: a high hit rate after the warm-up period. The cache was working! The data provider was robust, and the logic was sound.
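In code, that get-or-fetch flow looked roughly like the sketch below. The table name, column layout, and SpotPriceRow type are illustrative stand-ins rather than my actual schema, and the real column types and (de)serialization attributes depend on the table definition.

use clickhouse::Row;
use indexmap::IndexMap;
use rust_decimal::Decimal;
use serde::Deserialize;

#[derive(Row, Deserialize)]
struct SpotPriceRow {
    // Illustrative row: a timestamp label and a decimal price per interval.
    ts: String,
    price: Decimal,
}

impl ApplicationContext {
    pub async fn spot_prices(
        &self,
        symbol: &str,
    ) -> Result<IndexMap<String, Decimal>, clickhouse::error::Error> {
        // Fast path: a cache hit hands back the stored value.
        if let Some(prices) = self.spot_price_cache.get(symbol) {
            return Ok(prices);
        }

        // Miss: query ClickHouse through the typed interface...
        let rows: Vec<SpotPriceRow> = self
            .clickhouse_client
            .query("SELECT ts, price FROM spot_prices WHERE symbol = ? ORDER BY ts")
            .bind(symbol)
            .fetch_all()
            .await?;

        // ...then populate the cache with the timestamp-ordered IndexMap and return it.
        let prices: IndexMap<String, Decimal> =
            rows.into_iter().map(|r| (r.ts, r.price)).collect();
        self.spot_price_cache.insert(symbol.to_owned(), prices.clone());
        Ok(prices)
    }
}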

The Cold, Hard Truth: The Performance Letdown
But something was wrong. The overall response times for my spot price API endpoints hadn’t improved. In fact, they had slowed down slightly.

Confused, I broke out the profiler and generated a flamegraph. The culprit wasn’t the cache logic itself, nor the ClickHouse queries. It was the cloning.

Each time I served a day’s spot price data from the cache, I was returning a clone of the entire IndexMap of that day’s intervals. While IndexMap gives us the ordering guarantees crucial for time-series data, it is generally more expensive to clone than a regular HashMap because of the additional internal structures it keeps to maintain order. With 96 entries per IndexMap (one per 15-minute interval of the day), the allocation and data copying on every single request added up to significant overhead.

The cost of cloning the IndexMap was rivaling—and in some cases exceeding—the cost of a quick network round-trip to ClickHouse. The cache was “winning” logically with its high hit count, but losing the performance battle.
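To make that concrete, here is a toy illustration (placeholder symbol, dummy prices) of what every hit was paying for: moka’s sync Cache returns values by cloning them, so each get() deep-copies all 96 timestamp/price pairs.

use indexmap::IndexMap;
use moka::sync::Cache;
use rust_decimal::Decimal;

fn main() {
    let cache: Cache<String, IndexMap<String, Decimal>> = Cache::new(1_000);

    // One day's worth of 15-minute intervals: 96 entries of dummy prices.
    let day: IndexMap<String, Decimal> = (0i64..96)
        .map(|i| (format!("{:02}:{:02}", i / 4, (i % 4) * 15), Decimal::new(i, 2)))
        .collect();
    cache.insert("some-symbol".to_owned(), day);

    // get() hands back a clone of the whole IndexMap, not a reference into the
    // cache, so every hit allocates and copies all 96 entries.
    let _hit: Option<IndexMap<String, Decimal>> = cache.get("some-symbol");
}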

The Pragmatic Retreat: Feature-Flagging the Cache

I did try to optimize the cache first by storing the values as Arc<IndexMap<String, Decimal>> to avoid the deep clone, but that didn’t fit here: the IndexMaps are built separately and communicated across processes, and an Arc only shares memory within a single process.
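For reference, that Arc-wrapping attempt looked roughly like this sketch (function names are illustrative): the cache stores Arc<IndexMap<…>>, so a hit clones only the reference-counted pointer rather than the whole map.

use std::sync::Arc;

use indexmap::IndexMap;
use moka::sync::Cache;
use rust_decimal::Decimal;

type DayPrices = Arc<IndexMap<String, Decimal>>;

fn build_cache() -> Cache<String, DayPrices> {
    Cache::builder().max_capacity(1_000).build()
}

fn serve(cache: &Cache<String, DayPrices>, symbol: &str) -> Option<DayPrices> {
    // Cloning an Arc is just a reference-count bump; the 96-entry map is not copied.
    cache.get(symbol)
}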

Faced with this reality, I had to make a call. I could:

Consider smaller granularity: Cache individual price points instead of entire histories (a rough sketch follows this list)

Abandon the cache: Admit that for this specific use case, it wasn’t the right tool
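For completeness, option one might have looked something like this hypothetical sketch (not something I shipped): key the cache by (symbol, timestamp) so each entry is a single, cheap-to-clone Decimal.

use moka::sync::Cache;
use rust_decimal::Decimal;

fn build_point_cache() -> Cache<(String, String), Decimal> {
    Cache::builder()
        // Many small entries instead of ~1000 whole-day histories.
        .max_capacity(100_000)
        .build()
}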

For now, I chose option two. The simplicity of the direct ClickHouse call was hard to beat. To keep my new, clean code but disable the caching overhead, I did what any pragmatic Rust engineer would do: I feature-flagged it.
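A rough sketch of the flag, assuming a hypothetical spot-price-cache feature declared under [features] in Cargo.toml; the helper name is made up for illustration. With the feature disabled, callers always see a miss and go straight to ClickHouse.

use indexmap::IndexMap;
use rust_decimal::Decimal;

#[cfg(feature = "spot-price-cache")]
fn cached_prices(ctx: &ApplicationContext, symbol: &str) -> Option<IndexMap<String, Decimal>> {
    // Cached path: serve from moka when the feature is enabled.
    ctx.spot_price_cache.get(symbol)
}

#[cfg(not(feature = "spot-price-cache"))]
fn cached_prices(_ctx: &ApplicationContext, _symbol: &str) -> Option<IndexMap<String, Decimal>> {
    // Caching disabled: always report a miss so the caller queries ClickHouse directly.
    None
}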

Lessons Learned

Strong Typing is a Pure Win: The clickhouse crate’s new type support is fantastic and didn’t cause my problem. It improved my code quality.

Caching is a Trade-off: It’s not free. The overhead of storing and retrieving data from the cache must be significantly less than the cost of fetching the data fresh.

Data Structure Choice Matters: Using IndexMap over HashMap has real performance implications, especially when cloning entire structures. Preserving order has its cost!

Measure, Don’t Assume: The cache metrics told me one story (high hits = good), but the overall application profiling told the real story (cloning overhead = bad). Always profile with real-world data sizes.

moka is Excellent: My initial assessment stands. For my next caching need where the value type is cheap to clone or can be wrapped in Arc, moka will be my first choice. Its API and observability features are top-notch.

My quest for ultimate speed hit a snag, but it was a valuable learning experience. The journey to optimize is often iterative, and sometimes the bravest move is to know when to step back and try a different path.
