Building my first AI Agent from Scratch

In my last post, I broke down the mental model behind AI Agents: what they are, how they differ from chatbots, and the agent lifecycle (Observe → Think → Plan → Act → Repeat).

That was Phase 1: understanding.

Phase 2 was about building.

I went from “I can explain agents” to “I can build one from scratch.” Here’s what that looked like.

What I Built

A Code Analyzer Agent: a simple tool-calling agent powered by Google Gemini.

You give it code. It runs a local tool that extracts function names, class names, import counts, nesting depth, complexity hints, and more; then Gemini summarizes the results in plain English.

It follows the basic ReAct pattern:

  1. User input → send to Gemini (with full conversation history)
  2. Gemini decides → should I use a tool, or just respond?
  3. If tool needed → execute locally, send result back to Gemini
  4. Gemini responds → writes a friendly summary using the tool result
  5. If no tool → respond directly
  6. Loop until the user exits

Here’s the GitHub repo if you want to look at the code.

The Architecture

The agent has three pieces:

1. The Reasoning Engine (Gemini)

Gemini acts as the brain. It reads the user’s message, looks at the available tools, and decides “Should I call a function or just answer this myself?”

This is function calling — Gemini doesn’t just generate text, it can generate structured function calls that your code executes.
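Conceptually, the model emits a structured payload instead of a sentence, and your own code is responsible for running it. The shape below is illustrative only (the real SDK wraps this in response objects), but it captures the division of labor:

```python
# Illustrative shape of a function call emitted by the model.
# The actual SDK objects differ; this is just the concept.
function_call = {
    "name": "analyze_code",
    "args": {"code": "def add(a, b):\n    return a + b"},
}

# Your program, not the model, runs the matching local function.
def analyze_code(code: str) -> dict:
    return {"total_lines": len(code.splitlines())}

tools = {"analyze_code": analyze_code}
result = tools[function_call["name"]](**function_call["args"])
print(result)  # {'total_lines': 2}
```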

2. The Tool (analyze_code)

A Python function that takes code as input and returns rich analysis:

{
  "total_lines": 26,
  "blank_lines": 6,
  "comment_lines": 1,
  "import_count": 3,
  "function_names": ["__init__", "load", "process", "helper"],
  "class_names": ["DataProcessor"],
  "has_main_guard": True,
  "longest_function": {"name": "load", "lines": 7},
  "complexity_hints": ["Deeply nested code detected (max depth: 5 levels)"]
}

The tool doesn’t think. It just executes. It uses Python’s re module (regex) to extract names and patterns.
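For a sense of what that looks like, here is a minimal, stdlib-only sketch of a regex-based analyzer. The real tool in the repo extracts more (nesting depth, longest function, complexity hints); this shows only the pattern-matching idea:

```python
import re

def analyze_code(code: str) -> dict:
    """Minimal regex-based analyzer sketch (assumes Python source)."""
    lines = code.splitlines()
    return {
        "total_lines": len(lines),
        "blank_lines": sum(1 for l in lines if not l.strip()),
        "comment_lines": sum(1 for l in lines if l.strip().startswith("#")),
        "import_count": len(re.findall(r"^\s*(?:import|from)\s+\w+", code, re.MULTILINE)),
        "function_names": re.findall(r"^\s*def\s+(\w+)\s*\(", code, re.MULTILINE),
        "class_names": re.findall(r"^\s*class\s+(\w+)", code, re.MULTILINE),
        "has_main_guard": "__main__" in code,
    }

sample = '''\
import os

class DataProcessor:
    # load data
    def load(self):
        pass
'''
print(analyze_code(sample))
```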

But to make Gemini aware of it, I had to define a function declaration: a schema that describes the tool’s name, what it does, and what parameters it expects:

analyze_code_declaration = types.FunctionDeclaration(
    name="analyze_code",
    description="Analyze a code snippet and return the number of lines, functions, and classes.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={
            "code": types.Schema(
                type=types.Type.STRING,
                description="The code snippet to analyze",
            ),
        },
        required=["code"],
    ),
)

3. The Orchestrator (main.py)

The loop that ties it all together. Notice how history accumulates every message and how tool results are fed back to Gemini:

history = []  # Conversation memory

while True:
    user_input = input("You: ")

    # Add user message to history
    history.append(types.Content(role="user", parts=[types.Part.from_text(text=user_input)]))

    # Send FULL history to Gemini (not just latest message)
    response = client.models.generate_content(
        model="gemini-3-flash-preview",
        contents=history,
        config=config,
    )

    part = response.candidates[0].content.parts[0]

    if part.function_call:
        name = part.function_call.name
        result = available_tools[name](**dict(part.function_call.args))

        # Add tool call + result to history
        history.append(response.candidates[0].content)
        history.append(types.Content(role="user", parts=[
            types.Part.from_function_response(name=name, response=result)
        ]))

        # Send result BACK to Gemini for a friendly summary
        follow_up = client.models.generate_content(
            model="gemini-3-flash-preview",
            contents=history,
            config=config,
        )
        print(f"Agent: {follow_up.text}")
        history.append(follow_up.candidates[0].content)
    else:
        print(f"Agent: {response.text}")
        history.append(response.candidates[0].content)

What I Actually Learned

1. The LLM is too smart for your tools

This was the biggest surprise. I gave Gemini an analyze_code tool, sent it the code, and it… just analyzed the code itself instead of calling my tool.

Why? Because Gemini looked at the tool (counts lines and functions) and decided it could give a better answer by responding directly.

The fix: System instructions.

system_instruction = (
    "You are a code analysis agent. "
    "When the user provides any code snippet, "
    "you MUST use the analyze_code tool."
)

2. Function declarations are the contract

The FunctionDeclaration isn’t just metadata; it’s the contract between your code and the LLM. The description and parameter names directly affect whether Gemini decides to call the tool.

A vague description = Gemini ignores your tool.
A clear description = Gemini uses it correctly.

3. Tool dispatch needs to be dynamic

In my first version, I hardcoded the tool call:

# Bad — doesn't scale
result = analyze_code(function_call.args["code"])

The fix is a simple dict that maps tool names to functions:

available_tools = {
    "analyze_code": analyze_code,
}

# Good — works for any number of tools
result = available_tools[name](**args)

This is exactly how you’d scale to a multi-tool agent.
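One optional refinement worth sketching (this goes beyond what the repo does, so treat it as an assumption): guard against the model requesting a tool you never registered, and return failures as data rather than raising, so the error can be fed back to the model as a function response instead of crashing the loop:

```python
def dispatch(available_tools: dict, name: str, args: dict) -> dict:
    """Look up a tool by name and run it; surface errors as data."""
    tool = available_tools.get(name)
    if tool is None:
        # Returned to the model as a function response, so it can recover.
        return {"error": f"Unknown tool: {name}"}
    try:
        return tool(**args)
    except TypeError as exc:  # e.g. bad or missing arguments from the model
        return {"error": str(exc)}

tools = {"analyze_code": lambda code: {"total_lines": len(code.splitlines())}}
print(dispatch(tools, "analyze_code", {"code": "x = 1"}))  # {'total_lines': 1}
print(dispatch(tools, "format_code", {}))  # {'error': 'Unknown tool: format_code'}
```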

4. Your agent needs memory

My first version sent only the latest user message to Gemini. It had zero memory. So when I said “What can I improve in the code above?”, Gemini replied “Please provide the code”; it genuinely didn’t remember “above.”

The fix: maintain a history list that accumulates every message (user + agent + tool calls). Send the full history to Gemini on every turn. Now follow-up questions just work.
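SDK types aside, the idea is just an append-only list that gets replayed in full on every turn. A stripped-down, stdlib-only sketch of the concept:

```python
history = []  # every turn appends; nothing is dropped mid-conversation

def remember(role: str, text: str) -> None:
    history.append({"role": role, "text": text})

def build_context() -> str:
    # The "memory" is literally re-sending everything so far.
    return "\n".join(f"{m['role']}: {m['text']}" for m in history)

remember("user", "Analyze this code: x = 1")
remember("agent", "It has 1 line.")
remember("user", "What can I improve in the code above?")
# The follow-up question now travels together with the code it refers to.
print(build_context())
```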

5. Tool results must go back to the LLM

This was the biggest “aha” moment. My first agent loop looked like:

User → Gemini → Tool Call → Print raw result → Done

The tool result was just printed as a raw dictionary. Gemini never saw it! So it could never summarize it or reason about it.

The correct pattern is:

User → Gemini → Tool Call → Run Tool → Send result BACK to Gemini → Gemini responds

This “return trip” is what makes the agent actually useful. Without it, you’re just printing JSON.

6. Yes, 2 API calls when using a tool — and that’s OK

When a tool is used, you make 2 Gemini calls:

  1. Call 1: User message → Gemini decides to call a tool
  2. Call 2: Tool result → Gemini writes a friendly response

This is the standard pattern across every LLM API (OpenAI, Anthropic, Google). The LLM can’t execute your code; it can only ask you to run it. So there’s always a round trip.

But when no tool is needed (follow-up questions), it’s only 1 call — Gemini already has the context in the conversation history.

Lesson: Be mindful of LLM round-trips, but don’t over-optimize. The 2-call pattern is the cost of giving your agent autonomy.

What’s Next

Phase 2 gave me a working agent with a single tool, conversation memory, and a complete tool-feedback loop. The foundation is solid.

Next up:

  • Multi-tool agent — adding more tools and letting the LLM choose between them
  • Persistent memory — saving context across sessions (not just within one conversation)
  • Self-evaluation — having the agent check its own output

Building this agent taught me that the core of agentic AI is not the model itself, but the control flow that connects the model, tools, and memory into a coherent reasoning cycle.

🔗 GitHub: DecodersLord/Agentic-AI-Journey

📖 Phase 1 Post: AI Agent VS Chatbot
