Project: Unstructured to structured
What does this AI agent actually do?
This self-improving AI agent takes messy documents (invoices, contracts, medical reports, whatever) and turns them into clean, structured data and CSV tables. But here’s the kicker – it actually gets better at its job over time.
An example of the power of this AI agent:
Input:
What you get as output:
Structured data (CSV files), with high accuracy:
You also get a JSON file for each document – structured data in JSON format, following a general schema shared by all of them. The LLM decides which schema fits best, depending on the type of documents you uploaded, so you can manipulate your data in different ways:
{
  "header": {
    "document_title": {
      "value": "Purchase Order",
      "normalized_value": "Purchase Order",
      "reason": "Top-right prominent heading reads 'Purchase Order' (visual title header).",
      "confidence": 0.95
    },
    "purchase_order_number": {
      "value": "PO000495",
      "normalized_value": "PO000495",
      "reason": "Label 'PO No: PO000495' printed near the header on the right; matched to schema synonyms 'PO No' / 'Purchase Order #'.",
      "confidence": 0.95
    },
    "po_date": {
      "value": "04/26/2017",
      "normalized_value": "2017-04-26",
      "reason": "Date '04/26/2017' directly under PO number in header; normalized to ISO-8601.",
      "confidence": 0.95
    }
  },
  "parties": {
    "bill_to_name": {
      "value": "PLANERGY Boston Office",
      "normalized_value": "PLANERGY Boston Office",
      "reason": "Top-left block under the company logo lists 'PLANERGY' and 'Boston Office' — interpreted as the billing/requesting organization.",
      "confidence": 0.88
    },
    ...
  },
  "items": {
    "items": {
      "value": [
        {
          "item_name": "Nescafe Gold Blend Coffee 7oz",
          "item_description": null,
          "item_code": "QD2-00350",
          "sku": "QD2-00350",
          "quantity": 1.0,
          "unit": null,
          "unit_price": 34.99,
          "discount": 0.0,
          "line_total": 34.99,
          "currency": "USD"
        },
        {
          "item_name": "Tettley Tea Round Tea Bags 440/Pk",
          "item_description": null,
          "item_code": "QD2-TET440",
          "sku": "QD2-TET440",
          "quantity": 1.0,
          "unit": null,
          "unit_price": 20.49,
          "discount": 0.0,
          "line_total": 20.49,
          "currency": "USD"
        },
        ...
What my AI agent actually does (and why it’s pretty cool)
So I built this thing called “Unstructured to Structured”, and honestly, it’s doing some pretty wild stuff. Let me break down what’s actually happening under the hood.
The problem I was trying to solve
You know how most AI agents are pretty static? They follow some instructions, give you an output, and if an error occurs, nothing happens until an engineer modifies something manually. My agent has super powers – it actually improves itself. Literally: it ships autonomous fixes.
Like, imagine you have a bunch of invoices: you upload them and the AI processes them into structured data. Most AI tools would just give you the data, and if there’s an error in field extraction or schema inference, nothing happens. Mine actually analyzes the documents, infers the schema, extracts the data, and if the LLM drifts – say, a field mapping comes out wrong 😱 – it detects the issue and fixes itself so it never happens again.
Full code open source at: https://github.com/your-username/handit-examples
Let’s dive in!
Table of Contents
- What my AI agent actually does (and why it’s pretty cool)
- The problem I was trying to solve
- 1. Architecture Overview
- 2. Setting Up Your Environment
  - Backend
  - Frontend
- 3. The Core: LangGraph Workflow 🧠
- 4. Node Classes: Specialized Tools for Every Task 🎯
  - Inference Schema Node – the schema detective
  - Invoice Data Capture Node – the data extractor
  - Generate CSV Node – the table builder
- 5. The self-improvement (Best Part)
  - 1. Let’s set up Handit.ai observability
  - 2. Set up evaluations
  - 3. Set up self-improvement (very interesting part)
- 6. Results
- 7. Conclusions
1. Architecture Overview
Let’s understand the architecture of our Unstructured to Structured AI agent:
[Document Upload] → [Schema Inference] → [Data Extraction] → [CSV Generation]
        ↓                   ↓                    ↓                   ↓
  FastAPI Server       LangGraph Node      LangGraph Node      LangGraph Node
  (Rate Limited)        (AI Schema)        (AI Extraction)       (AI Tables)
        ↓                   ↓                    ↓                   ↓
    Handit.ai           Handit.ai            Handit.ai           Handit.ai
     Tracing             Tracing              Tracing             Tracing
This architecture separates concerns into distinct nodes:
- FastAPI Server: Handles document uploads with rate limiting and CORS protection
- Schema Inference Node: Uses AI to analyze all documents and create a unified JSON schema
- Data Extraction Node: Maps document content to the inferred schema using AI
- CSV Generation Node: Creates structured tables and CSV files from the extracted data
- Handit.ai Integration: Every step is traced, evaluated, and can be automatically improved
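For context, here is a minimal sketch of what that upload server could look like. The /upload route, the uploads/ directory, and the use of slowapi for rate limiting are illustrative assumptions on my part – check the repository for the real server code.

import os
from typing import List

from fastapi import FastAPI, File, Request, UploadFile
from fastapi.middleware.cors import CORSMiddleware
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

# Rate limiting keyed by client IP (mirrors RATE_LIMIT_REQUESTS / RATE_LIMIT_WINDOW)
limiter = Limiter(key_func=get_remote_address)

app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

# CORS protection: lock the allowed origins down for production
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["POST"],
    allow_headers=["*"],
)

@app.post("/upload")  # hypothetical route name
@limiter.limit("100/hour")
async def upload_documents(request: Request, files: List[UploadFile] = File(...)):
    # Save uploads to disk; the LangGraph workflow then picks up the saved paths
    os.makedirs("uploads", exist_ok=True)
    saved_paths = []
    for f in files:
        path = os.path.join("uploads", f.filename)
        with open(path, "wb") as out:
            out.write(await f.read())
        saved_paths.append(path)
    return {"received": saved_paths}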
2. Setting Up Your Environment
Backend
1. Clone the Repository
git clone https://github.com/your-username/handit-examples.git
cd handit-examples/examples/unstructured-to-structured
2. Create Virtual Environment
# Create virtual environment
python -m venv .venv
# Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate
3. Install Dependencies
# Install dependencies
pip install -r requirements.txt
4. Environment Configuration
# Copy environment example
cp .env.example .env
5. Configure API Keys
Edit the .env file and add your API keys:
# Required API Keys
# Get your API key from: https://platform.openai.com/api-keys
OPENAI_API_KEY=your_openai_api_key_here
# Get your API key from: https://www.handit.ai/
HANDIT_API_KEY=your_handit_api_key_here
# Optional Configuration
OPENAI_MODEL=gpt-4o-mini
RATE_LIMIT_REQUESTS=100
RATE_LIMIT_WINDOW=3600
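The app will typically read these values with python-dotenv at startup – a small sketch, assuming that’s how the repo loads its configuration:

import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment

OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4o-mini")
RATE_LIMIT_REQUESTS = int(os.getenv("RATE_LIMIT_REQUESTS", "100"))
RATE_LIMIT_WINDOW = int(os.getenv("RATE_LIMIT_WINDOW", "3600"))  # seconds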
6. Run the Application 🚀
Development Mode
# Make sure virtual environment is activated
source .venv/bin/activate # macOS/Linux
# or
.venv\Scripts\activate  # Windows
# Start the FastAPI server
python main.py
The server will start on http://localhost:8000
Frontend (Optional)
You can test the API directly using the FastAPI docs at http://localhost:8000/docs, or build a simple frontend to upload documents.
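If you’d rather script it, here’s a quick way to exercise the API from Python. The /upload route and the files field name are assumptions – confirm the real route on the /docs page:

import requests

# Hypothetical route and field name -- verify against http://localhost:8000/docs
url = "http://localhost:8000/upload"
with open("invoice_001.png", "rb") as f1, open("invoice_002.pdf", "rb") as f2:
    response = requests.post(url, files=[("files", f1), ("files", f2)])
print(response.json())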
3. The Core: LangGraph Workflow 🧠
Think of it as a smart pipeline that processes documents step by step. Here’s what happens:
- You upload documents – like invoices, contracts, medical reports (any format)
- The agent analyzes everything – it looks at all your documents and figures out the best structure
- It creates a unified schema – one JSON schema that can represent all your documents
- Then extracts the data – maps each document to the schema with AI
- Finally builds tables – creates CSV files and structured data you can actually use
Here’s the main workflow:
# The LangGraph workflow
from langgraph.graph import StateGraph

def create_workflow():
    workflow = StateGraph(GraphState)

    # Add nodes
    workflow.add_node("inference_schema", inference_schema)
    workflow.add_node("invoice_data_capture", invoice_data_capture)
    workflow.add_node("generate_csv", generate_csv)

    # Define the flow
    workflow.set_entry_point("inference_schema")
    workflow.add_edge("inference_schema", "invoice_data_capture")
    workflow.add_edge("invoice_data_capture", "generate_csv")

    return workflow.compile()
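Once compiled, the graph is just a callable that you invoke with an initial state. The exact GraphState keys are defined in the repo, so the ones below are illustrative:

# Illustrative invocation -- the real GraphState keys are defined in the repo
agent = create_workflow()
final_state = agent.invoke({
    "unstructured_paths": ["uploads/invoice_001.png", "uploads/invoice_002.pdf"]
})
print(final_state.get("generated_files"))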
4. Node Classes: Specialized Tools for Every Task 🎯
Inference Schema Node – the schema detective
This is where the magic starts. When you upload documents, this thing:
- Analyzes all your documents – it reads images, PDFs, text files
- Figures out the structure – it creates a unified JSON schema that fits everything
- Handles any document type – invoices, contracts, medical reports, whatever
- Creates a smart schema – with synonyms, field types, and reasoning
def inference_schema(state: GraphState) -> Dict[str, Any]:
    # Build multimodal message with all documents
    human_message = _build_multimodal_human_message(unstructured_paths)

    # Ask the LLM to infer the schema
    schema_result = schema_inferencer.invoke({"messages": [human_message]})

    # Track everything with Handit.ai
    tracker.track_node(
        input={"systemPrompt": get_system_prompt(), "userPrompt": user_prompt_summary, "images": image_attachments},
        output=inferred_schema,
        node_name="inference_schema",
        agent_name=agent_name,
        node_type="llm",
        execution_id=execution_id
    )
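The _build_multimodal_human_message helper is what lets a single prompt “see” every document at once. Here’s a minimal sketch of the idea, assuming images are embedded as base64 data URLs (the actual helper in the repo may differ):

import base64
import mimetypes

from langchain_core.messages import HumanMessage

def _build_multimodal_human_message(paths):
    # One text part with the instruction, plus one image part per document
    content = [{"type": "text", "text": "Infer a unified JSON schema for these documents."}]
    for path in paths:
        mime = mimetypes.guess_type(path)[0] or "image/png"
        with open(path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode("utf-8")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{b64}"},
        })
    return HumanMessage(content=content)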
Invoice Data Capture Node – the data extractor
For each document, it maps the content to your schema:
def invoice_data_capture(state: GraphState) -> Dict[str, Any]:
    # Load the inferred schema
    inferred_schema = state.get("inferred_schema")

    # Process each document
    for invoice_path in invoices_paths:
        # Create multimodal input (text + images)
        messages = [HumanMessage(content=[
            {"type": "text", "text": "Map the document to the provided schema..."},
            {"type": "image_url", "image_url": {"url": data_url}}
        ])]

        # Extract data using AI
        extraction_result = invoice_data_extractor.invoke({
            "messages": messages,
            "schema_json": schema_json_text
        })

        # Track with Handit.ai
        tracker.track_node(
            input={"systemPrompt": get_system_prompt(), "userPrompt": get_user_prompt(), "images": image_attachments},
            output=result_dict,
            node_name="invoice_data_capture",
            agent_name=agent_name,
            node_type="llm",
            execution_id=execution_id
        )
Generate CSV Node – the table builder
Finally, it creates structured tables from all your data:
def generate_csv(state: GraphState) -> Dict[str, Any]:
    # Load all the extracted JSON data
    all_json_data = []
    for json_path in structured_json_paths:
        filename = os.path.basename(json_path)  # derive the filename from the path
        with open(json_path, "r", encoding="utf-8") as f:
            json_data = json.load(f)
        all_json_data.append({"filename": filename, "data": json_data})

    # Ask the LLM to create tables
    llm_response = csv_generation_planner.invoke({
        "documents_inventory": all_json_data
    })

    # Generate CSV files
    generated_files = _save_tables_to_csv(tables, output_dir)

    # Track with Handit.ai
    tracker.track_node(
        input={"systemPrompt": get_system_prompt(), "userPrompt": get_user_prompt(), "documents_inventory": all_json_data},
        output={"tables": tables, "plan": plan, "generated_files": generated_files},
        node_name="generate_csv",
        agent_name=agent_name,
        node_type="llm",
        execution_id=execution_id
    )
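For reference, _save_tables_to_csv could be as simple as the sketch below. The table shape (name, columns, and rows keys) is an assumption for illustration; the repository defines the real format:

import csv
import os

def _save_tables_to_csv(tables, output_dir):
    # Assumed table shape: {"name": str, "columns": [...], "rows": [[...], ...]}
    os.makedirs(output_dir, exist_ok=True)
    generated = []
    for table in tables:
        path = os.path.join(output_dir, f"{table['name']}.csv")
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(table["columns"])
            writer.writerows(table["rows"])
        generated.append(path)
    return generated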
Want to dive deep into the nodes and prompts? Check out the full open-source code!
5. The self-improvement (Best Part)
Here’s the really cool thing – this AI agent actually gets better over time. The secret weapon is Handit.ai.
Every action, every response is fully observed and analyzed. The system can see:
- Which schema inferences worked well
- Which data extractions failed
- How long processing takes
- What document types cause issues
- When the LLM makes mistakes
- And more…
And yes sir! When this powerful tool detects any mistakes, it fixes them automatically.
This means the AI agent can actually improve itself. If the LLM extracts the wrong field or generates incorrect schemas, Handit.ai tracks that failure and automatically adjusts the AI agent to prevent the same mistake from happening again. It’s like having an AI engineer who is constantly monitoring, evaluating and improving your AI agent.
To get self-improvement, we need to complete these steps:
1. Let’s set up Handit.ai observability
This gives us full tracing, so we can see inside our LLMs and tools and understand what they’re doing.
Note that this project already comes configured with Handit.ai observability; you only need to get your own API token. Follow these steps:
1. Create an account here: Handit.ai
2. After creating your account, get your token here: Handit.ai token
3. Copy your token and add it to your .env file:
HANDIT_API_KEY=your_handit_token_here
Once you have completed this step, every time you upload and process documents you will get full observability in the Handit.ai Tracing Dashboard.
2. Set up evaluations
1. Add your AI provider (OpenAI, GoogleAI, etc.) token here: Token for evaluation
2. Assign evaluators to your LLM nodes here: Handit.ai Evaluation
For this project specifically, assign:
- Correctness Evaluation to invoice_data_capture – this will evaluate the accuracy of data extraction
- Schema Validation to inference_schema – this will evaluate the quality of schema inference
- Data Quality to generate_csv – this will evaluate the CSV generation quality
3. Set up self-improvement (very interesting part)
1. Run this on your terminal:
npm install -g @handit.ai/cli
2. Run this command and follow the terminal instructions – this connects your repository to Handit for automatic PR creation:
handit-cli github
✨ What happens next: every time Handit detects that your AI agent has failed, it will automatically open a PR on your repo with the fixes!
This is like having an AI engineer who never sleeps, constantly monitoring your agent and fixing issues before you even notice them! 🤖👨💻
6. Results
To test the project, first you need to upload some documents (jpg, jpeg, png, pdf) using the API endpoint.
First test: Upload 2-3 invoices and let the AI process them
Result: You’ll get:
- A unified JSON schema that fits all your documents
- Structured data extracted from each document
- CSV files with your data organized in tables
Second test: Check Handit.ai dashboard to see the full tracing
Result: You’ll see exactly how the AI processed each document, what prompts it used, and how it made decisions.
Third test: If there are any errors, Handit.ai will detect them and automatically create PRs to fix your agent!
7. Conclusions
Thanks for reading!
I hope this deep dive into building a self-improving AI agent for document processing has been useful for your own projects.
The project is fully open source – feel free to:
🔧 Modify it for your specific document types (receipts, forms, reports, etc.)
🏭 Adapt it to any industry (healthcare, finance, legal, retail, etc.)
🚀 Use it as a foundation for your own AI agents
🤝 Contribute improvements back to the community
Full code open source at: https://github.com/your-username/handit-examples
This project comes with Handit.ai configured. If you want to configure Handit.ai for your own projects, I suggest following the documentation: https://docs.handit.ai/quickstart
What new feature should this project have? Let me know in the comments! 💬
Key Features:
- 🧠 Smart Schema Inference: Automatically creates unified schemas for any document type
- 🔍 Multimodal Processing: Handles images, PDFs, and text files
- 📊 Structured Output: Generates clean JSON and CSV files
- 🚀 Self-Improving: Automatically fixes issues using Handit.ai
- 🛡️ Production Ready: Rate limiting, error handling, and comprehensive logging
- 🔄 LangGraph Workflow: Modern, scalable AI agent architecture
Perfect for:
- Document processing automation
- Data extraction from unstructured sources
- Invoice and receipt processing
- Contract analysis and data extraction
- Medical report processing
- Any document-to-data conversion task