Claude Code Forgets Everything Between Sessions. I Tested 5 Fixes.

Claude Code starts every session from zero — no memory of past decisions, debugging breakthroughs, or project context. I tested five approaches to this problem: manual CLAUDE.md files, the built-in auto-memory system, claude-mem’s semantic vector search, topic-based shell hooks with Claude-Recap, and work tracking with Beads. Most developers need only the two built-in solutions layered together. For those who want structured conversation archives searchable by topic, per-topic tracking fills a gap nothing else covers.

A developer on GitHub described it perfectly: “It’s a goldfish.”

You close a Claude Code session after an hour of debugging auth. Open a new one. Claude has no idea what you were working on — the three approaches you tried, the edge case you found, the architecture decision you landed on. Gone.

Context compaction makes it worse. Mid-session, Claude silently truncates older messages to fit the context window. One developer described losing four hours of work: “Architectural reasoning evaporated. It never fully came back.” Another lost six active sessions to a two-minute power outage — not just the current topics, but weeks of accumulated understanding.

The community has built at least a dozen solutions. I tested five.

Disclosure: I built one of the tools discussed here (Claude-Recap, #4). I’ll be upfront about its limitations.

1. Manual CLAUDE.md — You Are the Memory

Claude Code loads CLAUDE.md from your project root at every session start. You write it, you maintain it, you control exactly what Claude remembers.

# CLAUDE.md
## Architecture
- Express + PostgreSQL, JWT auth
- API routes in src/api/, middleware in src/middleware/

## Decisions
- Chose bcrypt over argon2 (Node.js native support)
- Rate limiting via express-rate-limit, 100 req/min per IP

This works better than you’d expect. One power user restructured their CLAUDE.md from 700 lines of everything to 83 lines of essentials plus 18 on-demand topic files. Result: persistent token count dropped from ~9,500 to ~1,760 — an 81% reduction.
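The restructuring pattern behind that reduction is a short index in CLAUDE.md pointing at topic files Claude reads only when asked. A sketch of what such an index might look like (file names here are illustrative, not from that user's actual setup):

```markdown
# CLAUDE.md (abridged)
## Topic files (read on demand, not auto-loaded)
- docs/claude/auth.md: full auth flow and token handling
- docs/claude/deploy.md: deployment runbook
- docs/claude/testing.md: test conventions and fixtures
```

Only the index costs persistent tokens; the topic files are loaded only when a task actually touches them.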

The limitation is obvious: if you don’t write it down, it doesn’t exist. Nobody updates their CLAUDE.md at 11 PM after a debugging marathon. And the recommended cap is 200 lines — not much room for a project with years of context.

Best for: Stable project conventions, coding standards, architecture context that rarely changes.

2. Built-in Auto-Memory (MEMORY.md) — Claude Learns on Its Own

Since v2.1.59 (February 2026), Claude Code saves notes for itself — build commands, debugging patterns, code style preferences — to ~/.claude/projects/<project>/memory/MEMORY.md. Zero setup.

The catch: a hard 200-line cap silently truncates older content. Claude decides what’s worth remembering, and there’s no topic organization — it’s a flat notebook. It also can’t recover from compaction; by the time context gets compressed, whatever was in those older messages is already lost.

A subtle failure mode: Claude sometimes creates files like HANDOVER.md thinking they’ll be loaded, but only CLAUDE.md and MEMORY.md (first 200 lines) actually auto-load. The agent has false confidence about its own memory.

Best for: Organic “getting smarter over time” within a project. Let it run, don’t rely on it alone.
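To see how close a project's auto-memory is to that cap, a quick check like the following works. The 200-line figure and the file's behavior are from the description above; the function name is my own:

```shell
# Report how close a MEMORY.md file is to the 200-line auto-load cap.
# Only the first 200 lines load at session start, so anything past
# that is effectively invisible to Claude.
memory_cap_check() {
  local file="$1" cap=200 lines
  [ -f "$file" ] || { echo "no memory file at $file" >&2; return 1; }
  lines=$(wc -l < "$file" | tr -d ' ')
  if [ "$lines" -gt "$cap" ]; then
    echo "$file: $lines lines, content past line $cap will not auto-load"
  else
    echo "$file: $lines lines, $((cap - lines)) lines of headroom"
  fi
}
```

Point it at ~/.claude/projects/&lt;project&gt;/memory/MEMORY.md for the project you care about.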

3. claude-mem — Semantic Search Over Your Past

The most-starred Claude Code memory plugin (~32K stars). SQLite + ChromaDB for storage, vector embeddings for semantic retrieval. Ask “how did we handle authentication?” and relevant memories surface across sessions, even if you don’t remember exact terms.

The trade-offs are significant. ChromaDB has caused 250–380% CPU usage on Apple Silicon. One user switched to a cheaper LLM for observation extraction and ended up with 411 hallucinated memories — the model “confidently fabricated” files and code changes that never existed. Another discovered their monorepo memories were silently filed under the wrong project because the tool used basename(cwd) instead of the full path.

Dependencies: Node.js, Python (uv), ChromaDB, SQLite, plus a background Worker process that must stay running.

Best for: Developers who need semantic search across many sessions and can absorb the infrastructure.

4. Claude-Recap — Per-Topic Archives with Shell Hooks

I built this, so take my assessment accordingly.

Most memory tools treat a session as one unit. But a single session often covers five topics — auth setup, CSS debugging, API refactoring, test fixes, deployment config. When it all becomes one memory blob, finding “what did we decide about auth?” means re-reading everything.

Claude-Recap tracks topics within sessions. Every Claude response starts with a topic tag (› fix-login-bug). When the topic changes, the Stop hook detects it, and the old topic gets summarized and archived:

~/.memory/projects/-Users-you-my-app/
  {session-id}/
    01-setup-auth.md        # "Chose JWT, set up middleware..."
    02-fix-login-bug.md     # "Root cause: stale token cache..."

It’s two shell hooks implemented as bash scripts, writing plain Markdown files. No database, no vector embeddings, no background services beyond what Claude Code already runs.
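The tag-detection step can be sketched roughly like this. This is a simplified illustration, not the actual hook code; the function name, arguments, and state file are invented for the example:

```shell
# Illustrative sketch of a Stop hook's topic-change check.
# Convention from the article: each response starts with a "› topic" tag.
detect_topic_change() {
  local response_file="$1" state_file="$2" new_topic old_topic=""
  # Pull the topic tag from the first line, if present.
  new_topic=$(head -n 1 "$response_file" | sed -n 's/^› *//p')
  [ -n "$new_topic" ] || return 1          # no tag: nothing to do
  [ -f "$state_file" ] && old_topic=$(cat "$state_file")
  if [ "$new_topic" != "$old_topic" ]; then
    echo "topic changed: '$old_topic' -> '$new_topic'"
    # In the real tool, the old topic would be summarized and
    # archived here before the state is updated.
    printf '%s' "$new_topic" > "$state_file"
  fi
}
```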

The main weakness: it depends on the LLM outputting a topic tag at the start of each response. This works reliably but isn’t guaranteed — especially right after compaction. There’s no semantic search; you’re back to grep and file names. And the compaction recovery path (spawning a headless Claude process to cold-read the transcript) is slower than the in-session path.

Best for: Developers who work on multiple topics per session and want grep-able, per-topic archives.

Repo: github.com/hatawong/claude-recap

5. Beads — Track the Work, Not the Words

Beads (~18K stars) takes a different framing. Instead of remembering conversations, it tracks what was done — code changes, task status, dependencies, next steps. It uses a Git-backed graph issue tracker where closed tasks get auto-summarized and bd ready shows what’s unblocked.

It doesn’t help with “what did we decide about auth?” But it answers “what’s the next step?” and “what’s blocking this?” — questions that matter more as projects grow beyond what one person can hold in their head.
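The “what’s unblocked?” query is easy to picture: a task is ready when it is open and none of its dependencies are still open. A toy sketch over an invented pipe-delimited format — not Beads’s actual storage, which is a Git-backed graph:

```shell
# Conceptual "ready" query: print open tasks whose dependencies are
# all closed. Input format (invented for this sketch):
#   id|status|comma-separated-dep-ids
ready_tasks() {
  awk -F'|' '
    { status[$1] = $2; deps[$1] = $3 }
    END {
      for (id in status) {
        if (status[id] != "open") continue
        n = split(deps[id], d, ",")
        blocked = 0
        for (i = 1; i <= n; i++)
          if (d[i] != "" && status[d[i]] == "open") blocked = 1
        if (!blocked) print id
      }
    }' "$1"
}
```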

Best for: Complex projects where tracking deliverables matters more than recalling conversations.

Comparison

|                 | CLAUDE.md | Auto-Memory | claude-mem              | Claude-Recap     | Beads         |
|-----------------|-----------|-------------|-------------------------|------------------|---------------|
| Who writes      | You       | Claude      | Automated               | Automated        | Automated     |
| Granularity     | Project   | Notes       | Session                 | Topic            | Task          |
| Storage         | Markdown  | Markdown    | SQLite+ChromaDB         | Markdown         | Git-backed DB |
| Dependencies    | None      | None        | Node, Python, ChromaDB  | bash, Node.js    | Go            |
| Compaction-safe | Yes       | Partial     | No                      | Yes (cold-read)  | N/A           |
| Search          | grep      | grep        | Semantic                | grep + filenames | Structured    |

Three Things the Community Figured Out

Testing these tools and reading through issues across three ecosystems (claude-mem, OpenClaw, Cline) surfaced patterns that aren’t obvious from any single tool’s README.

Push beats pull. LLMs don’t self-invoke memory retrieval tools. Three communities discovered this independently. One user’s frustration: “I have to go ‘read your memory’ then stop it from doing whatever it THOUGHT that meant and go ‘USE THE MCP.'” Memory must be injected at session start, not left as an on-demand tool. Every reliable solution in this list pushes context into the session.

Files beat databases — for individuals. A developer running a 10-agent production team chose WORKING.md over vector databases, citing debuggability. Letta’s benchmark found filesystem-only memory at 74.0% accuracy, beating Mem0’s graph variant at 68.5%. For single developers, cat to verify beats a query language to debug.

The context author writes the best summary. This is the most underappreciated insight. The agent currently discussing your code has full context. Having it write the summary directly produces far better results than extracting from a transcript afterward. This is why claude-mem’s post-hoc compression created 411 fake memories while in-context summaries stay accurate — the summarizer was there when it happened. I call this “eyewitness vs. cold reader,” and it shapes every design decision in Claude-Recap: the Stop hook captures the summary while the agent still has context, and cold-reading from transcripts is only a fallback for compacted sessions.

What I Actually Use

Three layers, no conflict:

  1. CLAUDE.md — static project rules: “use bun, never auto-commit, project uses JWT auth”
  2. Auto-Memory — on by default; Claude learns build commands and patterns over time
  3. Claude-Recap — every topic gets archived with a summary I can find with grep

These aren’t competing approaches. Static rules, learned patterns, and conversation history are three distinct categories of knowledge. Most developers will be well-served by the first two alone.

FAQ

Does Claude Code have built-in memory?
Yes. Since v2.1.59 (February 2026), it writes notes to MEMORY.md automatically. The first 200 lines load at each session start. For many developers, this plus a handwritten CLAUDE.md is enough.

Do these tools work together?
Yes. CLAUDE.md, Auto-Memory, and Claude-Recap operate on different layers without conflict. Beads solves a different problem entirely (work tracking) and complements all three.

What about privacy?
All five approaches are 100% local. Nothing leaves your machine. Memory files are plain Markdown you can read, edit, or delete at any time.

Which should I start with?
CLAUDE.md. Five minutes documenting your project context saves hours of re-explanation. Auto-Memory is already running if you’re on v2.1.59+. Add other tools only when you feel the gap.

Does semantic search matter?
Depends on volume. For most individual developers, grep over well-named files is fast and predictable. If you have hundreds of past sessions and need fuzzy matching, claude-mem’s vector search helps — at the cost of significant infrastructure.
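For the grep workflow, the whole search amounts to one call. Wrapped as a helper for convenience (the function name is mine; it works on any directory of Markdown summaries, such as the per-topic archive layout shown earlier):

```shell
# List archived topic files whose contents match a pattern,
# case-insensitively, sorted by path.
search_topics() {  # usage: search_topics <archive-dir> <pattern>
  grep -rlis -- "$2" "$1" | sort
}
```

For example: search_topics ~/.memory/projects/-Users-you-my-app "auth".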

I built Claude-Recap to solve this for my own workflow. Issues and contributions welcome.
