Historex – AI-Powered Repository Archaeology with Gemma 4

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

Historex – Repository Archaeology & Engineering Intelligence

What I Built

Historex is an AI-powered repository archaeology tool that analyzes Git history to reconstruct how a codebase evolved over time.

Most repository tools focus on the current state of the codebase – static analysis, dependency scanning, or commit browsing. Historex focuses on the engineering history hidden inside Git.

Figure: Historex architecture

It identifies:

  • architectural hotspots
  • technical debt accumulation
  • subsystem evolution
  • contributor scaling patterns
  • operational pressure zones
  • risky coordination points

Instead of generating a commit summary, Historex builds an engineering narrative of the repository.

The system combines deterministic repository intelligence extraction with AI reasoning to generate:

  • interactive archaeology reports
  • engineering decision journals
  • architectural evolution timelines
  • evidence-backed repository summaries

Current features include:

  • 🐉 Dragon Map risk analysis
  • 📜 AI-generated engineering eras
  • 🧱 Technical debt signal detection
  • 🏢 Organizational and contributor analysis
  • 📊 Interactive repository evolution dashboards
  • 🌐 Local web interface for repository analysis and report browsing

The project is fully local-first and designed to work on private repositories without sending repository data to external APIs.

Demo

Demo Flow

The demo shows:

  1. Analyzing a GitHub repository from the web interface
  2. Repository intelligence extraction
  3. AI-driven archaeology report generation
  4. Dragon Map hotspot analysis
  5. Engineering Decision Journal generation
  6. Technical debt and organizational signal analysis
  7. Interactive dashboard navigation

Code

GitHub Repository:

Historex Repository

How I Used Gemma 4

Historex uses Gemma 4 as the repository interpretation layer.

I ran the Gemma 4 E4B model locally because, for this project, it gave me the best balance of:

  • reasoning quality
  • structured output generation
  • hardware efficiency
  • local inference performance

One of the main architectural decisions was separating:

  1. deterministic repository intelligence extraction
  2. AI-powered interpretation

Python handles:

  • Git parsing
  • churn analysis
  • contributor analysis
  • hotspot scoring
  • technical debt extraction
  • repository evolution detection
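
As an illustration of the deterministic side, here is a minimal sketch of per-file churn and contributor extraction. It is not the Historex source: the function names `parse_numstat` and `collect_churn`, the `@@` marker, and the shape of the result are assumptions made for this example. It relies only on the standard `git log --numstat --format=...` output.

```python
import subprocess
from collections import defaultdict

# Hypothetical marker placed at the start of each commit header line so
# header lines can be told apart from numstat lines.
MARK = "@@"


def parse_numstat(log_text: str) -> dict[str, dict]:
    """Aggregate churn per file from `git log --numstat --format=@@%H|%an`."""
    stats = defaultdict(
        lambda: {"added": 0, "deleted": 0, "commits": 0, "authors": set()}
    )
    author = None
    for line in log_text.splitlines():
        if line.startswith(MARK):
            # Header line: "<hash>|<author name>"
            _, author = line[len(MARK):].split("|", 1)
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separator lines
        added, deleted, path = parts
        entry = stats[path]
        # Binary files report "-" instead of a line count.
        entry["added"] += int(added) if added.isdigit() else 0
        entry["deleted"] += int(deleted) if deleted.isdigit() else 0
        entry["commits"] += 1
        entry["authors"].add(author)
    return dict(stats)


def collect_churn(repo_path: str) -> dict[str, dict]:
    """Run git in `repo_path` and parse the result (requires git on PATH)."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", f"--format={MARK}%H|%an"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_numstat(out)
```

Renamed files appear in numstat output as `old => new` paths, so a real extractor would need to normalize those before aggregating; this sketch does not.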

Gemma 4 receives structured repository intelligence and generates:

  • engineering eras
  • archaeological summaries
  • architecture evolution interpretations
  • evidence-backed repository narratives

The model is intentionally constrained: it receives evidence extracted by the deterministic layer rather than raw repository contents. Grounding it this way reduced hallucinations and made the output more reliable.
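
One way to picture this grounding is a prompt builder that hands the model nothing but serialized evidence. The sketch below is an assumption about how such a step could look, not the prompt Historex actually uses: `SYSTEM_RULES`, `build_prompt`, and the evidence keys are hypothetical. The inference call itself is left out because the local runtime is not specified here.

```python
import json

# Hypothetical instruction block; the wording is illustrative only.
SYSTEM_RULES = (
    "You are analysing the history of a Git repository. "
    "Use ONLY the JSON evidence below. "
    "Name the evidence keys that support each claim. "
    "If the evidence is insufficient, say so instead of guessing."
)


def build_prompt(evidence: dict, task: str) -> str:
    """Combine fixed rules, a task, and structured evidence into one prompt."""
    # sort_keys keeps the prompt stable across runs for the same evidence.
    payload = json.dumps(evidence, indent=2, sort_keys=True)
    return f"{SYSTEM_RULES}\n\nTask: {task}\n\nEvidence (JSON):\n{payload}\n"


if __name__ == "__main__":
    evidence = {
        "hotspots": [{"path": "core/scheduler.py", "churn": 1240}],
        "activity": [{"period": "2023-Q1", "commits": 87}],
    }
    print(build_prompt(evidence, "Describe the engineering eras."))
```

Because the model only ever sees this payload, every statement in a generated report can in principle be traced back to a key in the evidence.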

Gemma 4 was especially effective at:

  • synthesizing long-term engineering patterns
  • identifying historical transitions
  • generating concise engineering narratives from structured repository signals

The entire system runs locally, making it suitable for analyzing private repositories securely.

Architecture

Git Repository
    ↓
Git History Ingestion
    ↓
Repository Intelligence Extraction
    ↓
Gemma 4 Interpretation Layer
    ↓
HTML / Markdown Archaeology Reports

The system currently supports:

  • local repositories
  • GitHub repository URLs
  • interactive HTML archaeology reports
  • local report storage and browsing

Technical Details

Repository Intelligence Layer

Historex extracts repository signals such as:

  • churn per file
  • subsystem evolution
  • contributor spread
  • incident-related commits
  • technical debt language
  • ownership fragmentation
  • maintenance patterns
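
Signals such as incident-related commits and technical debt language can be approximated with keyword matching on commit messages. The following is a rough sketch under that assumption; the keyword lists and the names `classify_message` and `count_signals` are invented for illustration and are not taken from Historex.

```python
import re

# Hypothetical keyword lists; a real tool would likely tune these per project.
DEBT_PATTERN = re.compile(
    r"\b(todo|fixme|hack|workaround|temporary|tech(nical)? debt)\b", re.I
)
INCIDENT_PATTERN = re.compile(
    r"\b(hotfix|revert|rollback|incident|outage|regression)\b", re.I
)


def classify_message(message: str) -> dict[str, bool]:
    """Flag a commit message as debt-related and/or incident-related."""
    return {
        "debt": bool(DEBT_PATTERN.search(message)),
        "incident": bool(INCIDENT_PATTERN.search(message)),
    }


def count_signals(messages: list[str]) -> dict[str, int]:
    """Count how many messages carry each signal."""
    totals = {"debt": 0, "incident": 0}
    for message in messages:
        flags = classify_message(message)
        for name in totals:
            totals[name] += int(flags[name])
    return totals
```

Keyword matching is noisy (it misses inflected forms such as "reverted" and flags unrelated uses of "temporary"), which is one reason to treat these counts as evidence for later interpretation rather than as conclusions.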

Dragon Map

The Dragon Map identifies architectural hotspots using:

  • churn
  • contributor count
  • incident frequency
  • long-term instability
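
A simple way to combine these four inputs is a weighted sum of normalized values. The sketch below is only one possible scoring scheme: the weights, the `FileSignals` fields, and the use of "months with changes" as a proxy for long-term instability are all assumptions, not the formula Historex uses.

```python
from dataclasses import dataclass


@dataclass
class FileSignals:
    path: str
    churn: int           # lines added + deleted over the file's history
    contributors: int    # distinct authors
    incidents: int       # commits flagged as incident-related
    active_months: int   # assumed proxy for long-term instability


# Hypothetical weights; they sum to 1 so scores fall in [0, 1].
WEIGHTS = {
    "churn": 0.4,
    "contributors": 0.2,
    "incidents": 0.3,
    "active_months": 0.1,
}


def score_hotspots(files: list[FileSignals]) -> list[tuple[str, float]]:
    """Rank files by a weighted sum of max-normalized signals."""
    # Fall back to 1 when a signal is zero everywhere to avoid dividing by 0.
    maxima = {
        name: max((getattr(f, name) for f in files), default=0) or 1
        for name in WEIGHTS
    }
    scored = []
    for f in files:
        score = sum(
            weight * getattr(f, name) / maxima[name]
            for name, weight in WEIGHTS.items()
        )
        scored.append((f.path, round(score, 3)))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Normalizing by the repository-wide maximum makes scores comparable within one repository but not across repositories, which fits a per-repository map of risky files.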

Decision Journal

The Decision Journal reconstructs engineering eras from repository evidence, helping explain:

  • scaling periods
  • stabilization phases
  • architectural rewrites
  • maintenance transitions
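
Before a model can name eras, it needs a compact timeline to reason over. One plausible form of that evidence is commit and contributor activity bucketed by quarter, sketched below. The quarterly granularity and the names `quarter_key` and `activity_by_quarter` are assumptions for this example; how Historex actually segments history is not described here.

```python
from collections import Counter
from datetime import datetime


def quarter_key(iso_date: str) -> str:
    """Map an ISO 8601 date such as '2023-04-02' to a label like '2023-Q2'."""
    d = datetime.fromisoformat(iso_date)
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def activity_by_quarter(commits: list[tuple[str, str]]) -> list[dict]:
    """Summarize (iso_date, author) pairs into per-quarter activity rows."""
    commit_counts = Counter()
    authors: dict[str, set] = {}
    for iso_date, author in commits:
        key = quarter_key(iso_date)
        commit_counts[key] += 1
        authors.setdefault(key, set()).add(author)
    # Labels like '2023-Q1' sort chronologically as plain strings.
    return [
        {
            "period": key,
            "commits": commit_counts[key],
            "contributors": len(authors[key]),
        }
        for key in sorted(commit_counts)
    ]
```

A jump in contributors per quarter would hint at a scaling period, while a sustained drop in commits could mark a maintenance transition; the interpretation itself is left to the model.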

Local-First Design

The project is intentionally local-first:

  • repositories remain on the developer's machine
  • analysis runs locally
  • Gemma 4 inference runs locally
  • generated reports are stored locally

Why I Built It

While working with large existing codebases, I realized that understanding the current code is only part of the challenge.

The harder part is understanding:

  • why architectural decisions happened
  • where instability accumulated
  • how ownership evolved
  • which parts of the system became operational bottlenecks

Git history contains that information, but it is difficult to interpret manually.

Historex was built to make that engineering history visible.
