This is a submission for the Gemma 4 Challenge: Build with Gemma 4
Historex – Repository Archaeology & Engineering Intelligence
What I Built
Historex is an AI-powered repository archaeology tool that analyzes Git history to reconstruct how a codebase evolved over time.
Most repository tools focus on the current state of the codebase – static analysis, dependency scanning, or commit browsing. Historex focuses on the engineering history hidden inside Git.
It identifies:
- architectural hotspots
- technical debt accumulation
- subsystem evolution
- contributor scaling patterns
- operational pressure zones
- risky coordination points
Instead of generating a commit summary, Historex builds an engineering narrative of the repository.
The system combines deterministic repository intelligence extraction with AI reasoning to generate:
- interactive archaeology reports
- engineering decision journals
- architectural evolution timelines
- evidence-backed repository summaries
Current features include:
- 🐉 Dragon Map risk analysis
- 📜 AI-generated engineering eras
- 🧱 Technical debt signal detection
- 🏢 Organizational and contributor analysis
- 📊 Interactive repository evolution dashboards
- 🌐 Local web interface for repository analysis and report browsing
The project is fully local-first and designed to work on private repositories without sending repository data to external APIs.
Demo
Demo Flow
The demo shows:
- Analyzing a GitHub repository from the web interface
- Repository intelligence extraction
- AI archaeology report generation
- Dragon Map hotspot analysis
- Engineering Decision Journal generation
- Technical debt and organizational signal analysis
- Interactive dashboard navigation
Code
GitHub Repository:
How I Used Gemma 4
Historex uses Gemma 4 as the repository interpretation layer.
I ran the Gemma 4 E4B model locally because it offered the best balance of:
- reasoning quality
- structured output generation
- hardware efficiency
- local inference performance
One of the main architectural decisions was separating:
- deterministic repository intelligence extraction
- AI-powered interpretation
Python handles:
- Git parsing
- churn analysis
- contributor analysis
- hotspot scoring
- technical debt extraction
- repository evolution detection
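The Python side of this split can start from plain `git log` output. Below is a minimal sketch, not necessarily how Historex implements it. It assumes churn is measured as lines added plus lines deleted per file and that contributors are identified by author email. The `COMMIT_MARKER`, `read_history`, and `parse_numstat` names are hypothetical.

```python
import subprocess
from collections import defaultdict

# Hypothetical marker that tags each commit header line in the log output.
COMMIT_MARKER = "@@commit@@"


def read_history(repo_path: str) -> str:
    """Return raw `git log --numstat` output for the repository at repo_path."""
    return subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--no-renames",
         f"--format={COMMIT_MARKER}%H|%ae"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_numstat(log_text: str) -> tuple[dict[str, int], dict[str, set[str]]]:
    """Return (churn per file, author emails per file).

    Churn is counted as lines added plus lines deleted across all commits.
    """
    churn: dict[str, int] = defaultdict(int)
    authors: dict[str, set[str]] = defaultdict(set)
    author = ""
    for line in log_text.splitlines():
        if line.startswith(COMMIT_MARKER):
            # Header line: "<marker><sha>|<author email>"
            _sha, author = line[len(COMMIT_MARKER):].split("|", 1)
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank separator lines
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        churn[path] += int(added) + int(deleted)
        if author:
            authors[path].add(author)
    return dict(churn), dict(authors)
```

Calling `parse_numstat(read_history("/path/to/repo"))` yields both maps. Sorting the churn map in descending order already gives a rough first list of hotspots. `--no-renames` keeps paths simple at the cost of splitting a renamed file's history across its old and new names.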
Gemma 4 receives structured repository intelligence and generates:
- engineering eras
- archaeological summaries
- architecture evolution interpretations
- evidence-backed repository narratives
The model is intentionally constrained: it interprets structured repository evidence instead of analyzing raw repositories directly. This grounding significantly reduced hallucinations and improved reliability.
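Grounding in practice usually means the prompt carries the extracted evidence and explicitly forbids going beyond it. The following is a hypothetical sketch of such a prompt builder. The `signals` dict layout, the `build_era_prompt` name, and the prompt wording are all assumptions, not Historex's actual prompt.

```python
import json


def build_era_prompt(signals: dict, max_files: int = 10) -> str:
    """Build a prompt that confines the model to pre-extracted evidence.

    `signals` is a hypothetical dict with keys "name", "commit_count",
    "churn" (path -> churn) and optionally "incident_commits".
    """
    # Pass only the highest-churn files so the prompt stays small.
    top = sorted(signals["churn"].items(), key=lambda kv: kv[1], reverse=True)[:max_files]
    evidence = {
        "repository": signals["name"],
        "commit_count": signals["commit_count"],
        "top_churn_files": [{"path": p, "churn": c} for p, c in top],
        "incident_commits": signals.get("incident_commits", []),
    }
    return (
        "You are analyzing the history of a Git repository.\n"
        "Use ONLY the evidence in the JSON below. If the evidence does not "
        "support a claim, say so instead of guessing.\n"
        "Cite the file paths or commit hashes each claim relies on.\n\n"
        "EVIDENCE:\n"
        + json.dumps(evidence, indent=2, sort_keys=True)
        + "\n\nDescribe the main engineering eras of this repository."
    )
```

Serializing the evidence as JSON keeps the model's input compact and makes it easy to check, after generation, that every cited path actually appears in the evidence.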
Gemma 4 was especially effective at:
- synthesizing long-term engineering patterns
- identifying historical transitions
- generating concise engineering narratives from structured repository signals
The entire system runs locally, making it suitable for analyzing private repositories securely.
Architecture
Git Repository
↓
Git History Ingestion
↓
Repository Intelligence Extraction
↓
Gemma 4 Interpretation Layer
↓
HTML / Markdown Archaeology Reports
The system currently supports:
- local repositories
- GitHub repository URLs
- interactive HTML archaeology reports
- local report storage and browsing
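Supporting both local paths and GitHub URLs mostly comes down to normalizing the input before ingestion. Here is a minimal sketch, assuming any `github.com` URL should be turned into a clone URL and everything else is treated as a local path. `RepoSource` and `resolve_source` are hypothetical names.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class RepoSource:
    kind: str      # "github" or "local"
    location: str  # clone URL for GitHub, filesystem path for local repos


def resolve_source(target: str) -> RepoSource:
    """Classify user input as a GitHub URL or a local repository path."""
    parsed = urlparse(target)
    if parsed.scheme in ("http", "https") and parsed.netloc.lower() in ("github.com", "www.github.com"):
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) < 2:
            raise ValueError(f"not a repository URL: {target}")
        # Ignore extra path segments such as /tree/main.
        owner, repo = parts[0], parts[1].removesuffix(".git")
        return RepoSource("github", f"https://github.com/{owner}/{repo}.git")
    return RepoSource("local", target)
```

A `"github"` source would then be cloned (for example with `git clone`) into a local working directory. Every later stage then only ever sees a local path, which fits the local-first design.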
Technical Details
Repository Intelligence Layer
Historex extracts repository signals such as:
- churn per file
- subsystem evolution
- contributor spread
- incident-related commits
- technical debt language
- ownership fragmentation
- maintenance patterns
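Two of these signals lend themselves to simple heuristics. The sketch below assumes that incident and technical debt language is detected by keyword matching on commit messages, and that ownership fragmentation is measured as one minus the top author's share of commits to a file. The keyword lists, `classify_message`, and `ownership_fragmentation` are hypothetical, not Historex's actual rules.

```python
import re

# Hypothetical keyword lists; real projects use their own vocabulary.
INCIDENT_RE = re.compile(
    r"\b(hotfix|revert|incident|outage|rollback|regression|crash)\b", re.IGNORECASE)
DEBT_RE = re.compile(
    r"\b(todo|fixme|hack|workaround|tech[- ]debt|cleanup|refactor)\b", re.IGNORECASE)


def classify_message(message: str) -> set[str]:
    """Label a commit message as incident-related, debt-related, both, or neither."""
    labels = set()
    if INCIDENT_RE.search(message):
        labels.add("incident")
    if DEBT_RE.search(message):
        labels.add("debt")
    return labels


def ownership_fragmentation(commits_by_author: dict[str, int]) -> float:
    """Return 1 minus the top author's share of commits to a file.

    0.0 means one person made every change; values near 1.0 mean no one
    clearly owns the file.
    """
    total = sum(commits_by_author.values())
    if total == 0:
        return 0.0
    return 1 - max(commits_by_author.values()) / total
```

For example, a file with 8 commits from one author and 2 from another scores about 0.2, meaning it is still clearly owned.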
Dragon Map
The Dragon Map identifies architectural hotspots using:
- churn
- contributor count
- incident frequency
- long-term instability
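One way to combine these four inputs into a single hotspot score is to normalize each one across files and take a weighted sum. This is a hypothetical sketch, not the actual Dragon Map formula. The weights are made up, and "long-term instability" is approximated here by the number of months in which a file changed.

```python
from dataclasses import dataclass


@dataclass
class FileSignals:
    churn: int          # lines added + deleted
    contributors: int   # distinct authors
    incidents: int      # incident-labelled commits touching the file
    active_months: int  # months with at least one change (stand-in for long-term instability)


# Hypothetical weights; they sum to 1.0, so every score lands in [0, 1].
WEIGHTS = {"churn": 0.35, "contributors": 0.20, "incidents": 0.30, "active_months": 0.15}


def dragon_scores(files: dict[str, FileSignals]) -> dict[str, float]:
    """Min-max normalise each signal across files, then combine with WEIGHTS."""
    if not files:
        return {}
    scores = {path: 0.0 for path in files}
    for field, weight in WEIGHTS.items():
        values = [getattr(s, field) for s in files.values()]
        low, high = min(values), max(values)
        for path, s in files.items():
            # If every file has the same value, the signal cannot separate them.
            norm = (getattr(s, field) - low) / (high - low) if high > low else 0.0
            scores[path] += weight * norm
    return scores
```

Min-max normalization stops raw line churn, which is usually the largest number, from drowning out the other signals. A log-scaled or rank-based variant would be less sensitive to a single outlier file.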
Decision Journal
The Decision Journal reconstructs engineering eras from repository evidence, helping explain:
- scaling periods
- stabilization phases
- architectural rewrites
- maintenance transitions
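Era boundaries could be pre-computed deterministically before Gemma 4 names and explains them. As a hypothetical sketch, the function below splits a monthly commit series wherever activity jumps or drops sharply relative to the current era's average. The `segment_eras` name and the 2× ratio threshold are assumptions, and this is a plain heuristic rather than a formal change-point method.

```python
def segment_eras(monthly_commits: list[tuple[str, int]], ratio: float = 2.0) -> list[list[str]]:
    """Split consecutive months into eras whenever activity shifts sharply.

    A new era starts when a month's commit count is at least `ratio` times
    higher or lower than the running average of the current era.
    """
    eras: list[list[str]] = []
    current: list[str] = []
    total = 0
    for month, count in monthly_commits:
        if current:
            avg = total / len(current)
            # max(..., 1) avoids division by zero for quiet months
            change = max(count, 1) / max(avg, 1)
            if change >= ratio or change <= 1 / ratio:
                eras.append(current)
                current, total = [], 0
        current.append(month)
        total += count
    if current:
        eras.append(current)
    return eras
```

With monthly counts of 5, 6, 4, 20, 22, 3 this produces three eras: a quiet start, a scaling burst, and a sharp drop-off. Each segment's evidence could then be passed to the model to label it, for example as a scaling period or a stabilization phase.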
Local-First Design
The project is intentionally local-first:
- repositories remain on the developer machine
- analysis runs locally
- Gemma 4 inference runs locally
- generated reports are stored locally
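The local runtime that serves the model isn't named above. As one possibility, assume an Ollama server on its default port (`localhost:11434`) with a Gemma 4 model already pulled. Its `/api/generate` endpoint can then be called with only the standard library. The model tag `gemma4:e4b` below is a placeholder, not a confirmed name, and `build_request` and `generate` are hypothetical helpers.

```python
import json
import urllib.request

# Assumption: an Ollama server on its default port, reachable only on localhost.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "gemma4:e4b"  # hypothetical tag; check `ollama list` for the real name


def build_request(prompt: str, model: str = MODEL_TAG, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for the local server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def generate(prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=600) as resp:
        return json.loads(resp.read())["response"]
```

Because the only network call goes to `localhost`, neither the prompt (which carries the repository evidence) nor the model's output leaves the machine.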
Why I Built It
While working with large existing codebases, I realized that understanding the current code is only part of the challenge.
The harder part is understanding:
- why architectural decisions happened
- where instability accumulated
- how ownership evolved
- which parts of the system became operational bottlenecks
Git history contains that information, but it is difficult to interpret manually.
Historex was built to make that engineering history visible.

