System Design Interview: Autocomplete / Type-ahead System (Final Part)

In Part 1, we covered requirements and capacity planning.
In Part 2, we explored tries, top-K optimization, and system architecture.

In this final part, we’ll focus on scaling the trie, efficient updates, deployment strategies, and partitioning trade-offs — the exact topics that distinguish a good system design answer from an excellent one.

📌 Table of Contents (Final Part)

  1. Splitting the Trie Across Servers
  2. Updating the Trie Efficiently
  3. Deployment Strategies
  4. Persisting the Trie
  5. Data Partitioning Strategies
  6. Trade-Off Summary
  7. Low-Level Design
  8. Final Thoughts

Splitting the Trie Across Servers

Once range-based partitioning is selected, the trie must be physically split across machines in a way that keeps:

  • Routing simple
  • Lookups fast
  • Latency predictable

Alphabet-Based Sharding

The most straightforward strategy is alphabet-based sharding.

Example

Shard 1 → a–k
Shard 2 → l–z

Each shard:

  • Stores its own trie fragment
  • Has multiple replicas
  • Independently serves autocomplete requests

Why This Works

  • Deterministic routing
  • Single-shard reads
  • No result aggregation
  • Stable latency

This is ideal for prefix-based systems like autocomplete.
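
A minimal routing sketch in Python (the shard names and ranges are illustrative, not part of the design above):

SHARD_RANGES = [
    ("a", "k", "shard-1"),
    ("l", "z", "shard-2"),
]

def route_to_shard(prefix: str) -> str:
    """Route a query prefix to the shard owning its first letter."""
    first = prefix[0].lower()
    for lo, hi, shard in SHARD_RANGES:
        if lo <= first <= hi:
            return shard
    return "shard-default"  # e.g., digits or punctuation

assert route_to_shard("apple") == "shard-1"
assert route_to_shard("query") == "shard-2"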

Finer-Grained Sharding

As traffic and data grow, shards can be incrementally split.

Examples

Shard 1 → a
Shard 2 → b
Shard 3 → c–d

Or even:

Shard A1 → ap
Shard A2 → am
Shard A3 → an

This approach allows scaling without redesigning the system.
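
With finer-grained shards, routing becomes a longest-prefix match over a shard map. A hypothetical sketch (shard names are assumptions):

SHARD_MAP = {
    "ap": "shard-A1",
    "am": "shard-A2",
    "an": "shard-A3",
    "a":  "shard-A0",   # fallback for the rest of 'a'
    "b":  "shard-B",
}

def route(prefix: str) -> str:
    """Return the shard owning the longest matching key prefix."""
    for length in range(len(prefix), 0, -1):
        shard = SHARD_MAP.get(prefix[:length])
        if shard:
            return shard
    raise KeyError(f"no shard owns prefix {prefix!r}")

assert route("apple") == "shard-A1"   # matches "ap"
assert route("art") == "shard-A0"     # falls back to "a"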

Why Prefix-Based Sharding Works So Well

  • Each request hits exactly one shard
  • No fan-out queries
  • No cross-shard aggregation
  • Latency remains low at global scale

Updating the Trie Efficiently

Updating the trie synchronously for every search is impractical.

Autocomplete systems must decouple reads from writes.

Logging and Aggregation Pipeline

  1. User searches are logged asynchronously
  2. Logs are written to a durable store (Kafka / Cassandra)
  3. Periodic batch jobs aggregate frequencies

This avoids blocking the serving path.
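
A toy version of the aggregation step, assuming plain query strings arrive from the log stream (real pipelines would read from Kafka and handle structured records):

from collections import Counter

def aggregate(log_lines):
    """Aggregate raw search-log lines into (query, frequency) counts."""
    counts = Counter()
    for line in log_lines:
        query = line.strip().lower()
        if query:
            counts[query] += 1
    return counts

log = ["how to cook rice", "how to cook rice", "how to make tea"]
print(aggregate(log))  # Counter({'how to cook rice': 2, 'how to make tea': 1})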

Pruning During Aggregation

To control growth:

  • Very old queries are discarded
  • Queries below a frequency threshold are removed
  • Only meaningful prefixes survive

This keeps the trie compact and relevant.
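
A sketch of that pruning filter; the thresholds are illustrative assumptions, not values from the design:

import time

MIN_FREQUENCY = 5
MAX_AGE_SECONDS = 90 * 24 * 3600  # assume ~90 days of relevance

def prune(records, now=None):
    """Keep only (query, freq, last_seen_ts) records worth storing in the trie."""
    now = now or time.time()
    return [
        (query, freq, ts)
        for query, freq, ts in records
        if freq >= MIN_FREQUENCY and (now - ts) <= MAX_AGE_SECONDS
    ]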

Applying Updates Safely

  • A new trie is built offline
  • Top-K lists are recomputed
  • Once validated, the new trie atomically replaces the old one

👉 Read traffic is never blocked during updates.
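
The swap itself can be a single reference update. A minimal Python sketch (rebinding one attribute is atomic in CPython; a real deployment would flip a pointer or version id instead):

class TrieServer:
    def __init__(self, trie):
        self._active_trie = trie          # currently serving version

    def lookup(self, prefix):
        return self._active_trie.top_k(prefix)  # hypothetical trie API

    def swap(self, new_trie):
        """Publish a fully built, validated trie in one step."""
        self._active_trie = new_trie      # old version is garbage-collected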

Deployment Strategies

Trie updates must be deployed without downtime.

Blue-Green Deployment

  • New trie version built in parallel
  • Validated with shadow traffic
  • Traffic switched atomically
  • Old version discarded

This is the safest approach.

Master-Slave Strategy

  • Master serves live traffic
  • Slave is rebuilt and updated
  • Roles are swapped

Both strategies ensure zero downtime and fast rollback.

Persisting the Trie

Rebuilding a trie from raw logs can take hours.

To enable fast recovery:

  • Trie snapshots are periodically persisted
  • Each node is serialized as its character plus its child count, in pre-order

Example Serialization

C2,A2,R1,T,P,O1,D

Each node is written as its character followed by its child count (leaves carry no count): C has two children (A, O), A has two (R, P), and R has one (T). This snapshot encodes the queries "cart", "cap", and "cod".

On restart:

  • Trie is reconstructed from snapshots
  • Top-K lists are recomputed
  • Service resumes quickly
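
A sketch of this snapshot format, assuming single-digit child counts for brevity; it round-trips the example above:

def serialize(node, out):
    """node: (char, [children]) — append 'char' or 'charN' tokens in pre-order."""
    char, children = node
    out.append(f"{char}{len(children)}" if children else char)
    for child in children:
        serialize(child, out)

def deserialize(tokens):
    """Rebuild the (char, [children]) tree from an iterator of tokens."""
    token = next(tokens)
    if token[-1].isdigit():                # single-digit counts only
        char, count = token[:-1], int(token[-1])
    else:
        char, count = token, 0             # leaf
    return (char, [deserialize(tokens) for _ in range(count)])

tree = ("C", [("A", [("R", [("T", [])]), ("P", [])]), ("O", [("D", [])])])
out = []
serialize(tree, out)
assert ",".join(out) == "C2,A2,R1,T,P,O1,D"
assert deserialize(iter(out)) == tree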

Data Partitioning Strategies

When the dataset grows beyond a single machine, partitioning strategy becomes critical.

Range-Based Partitioning

Data is split using natural prefix ranges.

Example

Shard 1 → a–c
Shard 2 → d–f
Shard 3 → g–z

Why It Fits Autocomplete

  • Prefix routing is trivial
  • Only one shard per request
  • Trie traversal stays localized

Limitation

  • Popular prefixes (a, s, th) can create hot shards
  • Requires finer splits over time

Capacity-Based Partitioning

Servers are filled until capacity limits are reached.

Example

Server 1 → a, apple, apply (full)
Server 2 → aws, banana, ball

Advantages

  • Efficient hardware utilization

Challenges

  • Dynamic prefix-to-server mapping
  • Routing depends on metadata
  • More operational complexity

Often requires ZooKeeper-like coordination.

Hash-Based Partitioning

Data is distributed using a hash function.

Example

hash("apple") → Server 2
hash("apply") → Server 7

Why It Fails for Autocomplete

  • Prefixes are scattered
  • One request hits multiple shards
  • Requires result aggregation
  • Increases latency

🚫 Hash-based partitioning is usually avoided for prefix search systems.
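
A quick illustration of the scattering effect (the shard count is an arbitrary assumption):

import hashlib

def shard_of(query, num_shards=8):
    digest = hashlib.md5(query.encode()).hexdigest()
    return int(digest, 16) % num_shards

for q in ["apple", "apply", "app store"]:
    print(q, "-> shard", shard_of(q))
# Completions of "app" land on unrelated shards, so one autocomplete
# request would need a fan-out query plus a merge step.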

Partitioning Strategy Summary

👉 Range-based partitioning aligns naturally with prefix queries and is the foundation for scalable autocomplete systems.

Distributed Trie Storage (Expanded)

At scale, the trie is:

  • Logically one structure
  • Physically distributed

Each machine owns a subset of prefixes but behaves as part of a unified system.

Trie Partitioning Example

Shard 1 → a–c
Shard 2 → d–f
Shard 3 → g–z

Hot prefixes are split further:

Shard A1 → a
Shard A2 → b
Shard A3 → c

This prevents hotspots and balances load.

Replication Strategy

Each shard is replicated:

Shard 1 → Replica 1, Replica 2, Replica 3

Benefits

  • High Availability
  • Fault Tolerance
  • Read Scalability

Storage Technologies

Cassandra (Durable Layer)

Best when:

  • Trie partitions are large
  • QPS is extremely high
  • Cross-region replication is required

Usage:

  • Trie stored as serialized blocks
  • Loaded into memory on startup
  • Cassandra acts as source of truth

ZooKeeper (Coordination)

ZooKeeper manages:

  • Prefix-to-shard mappings
  • Replica health
  • Leader election
  • Atomic trie version switching

It guarantees consistent routing decisions.

Trade-Off Summary

Aspect          | Decision              | Reasoning
----------------|-----------------------|-----------------------------
Data Structure  | Trie                  | Efficient prefix matching
Ranking         | Pre-computed Top-K    | Predictable low latency
Cache           | Redis                 | Absorbs hot prefixes
Updates         | Offline batching      | Protects the read path
Partitioning    | Range-based           | Single-shard prefix routing
Storage         | Distributed replicas  | Scalability & availability

Low-Level Design

LLD Goal

Explain how data is structured, how lookups work internally, and why latency is predictable.

🔹 LLD – Trie Node Structure

TrieNode {
  children: Map<Character, TrieNode>
  isTerminal: boolean
  frequencyScore: number
  topKSuggestions: MinHeap / Sorted List (size K)
}

What Each Field Means

  • children → next characters
  • isTerminal → end of a valid query
  • frequencyScore → popularity (freq + recency)
  • topKSuggestions → pre-computed best results for this prefix
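
A hedged, runnable Python version of this node, using heapq for the top-K list; K and the field names follow the pseudocode, everything else is an assumption:

import heapq

K = 3

class TrieNode:
    def __init__(self):
        self.children = {}          # char -> TrieNode
        self.is_terminal = False
        self.frequency_score = 0
        self.top_k = []             # min-heap of (score, query), size <= K

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query, score):
        """Insert a query and push its score into every prefix's top-K."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            heapq.heappush(node.top_k, (score, query))
            if len(node.top_k) > K:
                heapq.heappop(node.top_k)   # evict the lowest score
        node.is_terminal = True
        node.frequency_score = score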

🔹 LLD – Trie Example

Queries:

"app" (50)
"apple" (100)
"app store" (200)

Trie view:

a
└── p
    └── p*  "app" (50)                [top-K: app store, apple, app]
        ├── l
        │   └── e*  "apple" (100)
        └── (space)store*  "app store" (200)   (edge compressed for readability)

Each prefix node already knows its best suggestions.

🔹 LLD – Prefix Lookup Flow

Input Prefix: "app"

1. Traverse trie: a → p → p
2. Reach prefix node
3. Read pre-computed top-K
4. Return immediately

Time Complexity: O(length of prefix)
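
Continuing the Trie sketch above, the lookup is a pure walk plus one read:

def autocomplete(trie, prefix):
    node = trie.root
    for ch in prefix:                      # 1. traverse: a -> p -> p
        node = node.children.get(ch)
        if node is None:                   # unknown prefix
            return []
    return [q for _, q in sorted(node.top_k, reverse=True)]  # 2-4. read top-K

trie = Trie()
for query, score in [("app", 50), ("apple", 100), ("app store", 200)]:
    trie.insert(query, score)
print(autocomplete(trie, "app"))  # ['app store', 'apple', 'app']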

🔹 LLD – Cache Key Design

Key: autocomplete:{locale}:{prefix}

Example:
autocomplete:en-IN:how

Value:

[
  "how to cook rice",
  "how to lose weight",
  "how to make tea"
]

🔹 LLD – Cache + Trie Interaction

App Server
   |
   |---> Redis (Hit?) ----> Return result
   |
   |---> Trie Storage ----> Fetch top-K
                               |
                               +--> Store in Redis (TTL)
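
A cache-aside sketch of this flow using redis-py; trie_lookup is a hypothetical stand-in for the shard call:

import json
import redis

r = redis.Redis()
TTL_SECONDS = 300  # illustrative TTL

def suggestions(locale, prefix, trie_lookup):
    key = f"autocomplete:{locale}:{prefix}"
    cached = r.get(key)
    if cached is not None:                      # Redis hit: return immediately
        return json.loads(cached)
    result = trie_lookup(prefix)                # miss: ask the trie shard
    r.setex(key, TTL_SECONDS, json.dumps(result))  # populate cache with TTL
    return result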

🔹 LLD – Distributed Trie Partitioning

Shard 1 → prefixes a–c
Shard 2 → prefixes d–f
Shard 3 → prefixes g–z

For hot prefixes:

Shard A1 → ap
Shard A2 → am
Shard A3 → an

✔ Single-shard lookup
✔ No aggregation
✔ Predictable latency

🔹 LLD – Write / Update Path (Offline)

User Searches
     |
     v
+-------------+
| Log Stream  | (Kafka)
+-------------+
     |
     v
+-------------+
| Aggregation | (Batch Job)
+-------------+
     |
     v
+-------------+
| Build New   |
| Trie        |
+-------------+
     |
     v
+-------------+
| Atomic Swap |
+-------------+

Writes never block reads.

🔹 LLD – Failure Handling

Failure          | Behavior
-----------------|------------------------------
Redis down       | Read directly from the trie
Trie shard down  | Route to a replica
App server down  | Load balancer removes it
Region down      | Route to the nearest region

🎯 LLD Interview Explanation (1–2 lines)

“Each trie node stores pre-computed top-K suggestions, so autocomplete never traverses subtrees at runtime.
This guarantees predictable latency even at very high QPS.”

HLD vs LLD – Interview Cheat Sheet

Aspect    | HLD                 | LLD
----------|---------------------|------------------------------
Focus     | Components & flow   | Data structures & internals
Audience  | Product / system    | Engineers
Latency   | Cache → Trie        | O(prefix length)
Updates   | Offline             | Atomic swap
Scaling   | Shards + replicas   | Prefix routing

Final Thoughts

Autocomplete looks simple — but at scale, it’s a carefully engineered distributed system.

The winning design principles are:

  • Tries for structure
  • Pre-computation for speed
  • Caching for hot paths
  • Batch updates for safety
  • Replication for resilience

If you can explain this flow clearly in an interview, you demonstrate not just knowledge — but systems thinking.
