This is a submission for the Google AI Agents Writing Challenge: Learning Reflections
From Prompts to Action: My Journey Through the Google & Kaggle AI Agents Bootcamp
As someone who has watched AI evolve from “magic black box” to “everyday tool,” I often felt a barrier between using AI and building with it. Aside from powering a chatbot, what else could AI do, and how could I harness it? I thought building agents required a PhD in Machine Learning. This week, the 5-Day AI Agents Intensive Course by Google and Kaggle completely shattered that illusion.
It turns out, if you can write a Python function, you can build an agent. Here is my deep dive into the code, the concepts, and the tools that made this journey accessible, featuring my capstone project: Jarbest.
The Awakening: Hello, Agent
Coming from a non-developer background, I always imagined software as a “bricklayer”: rigidly following a blueprint. Day 1 introduced me to the Agent: a system that acts more like a film director. It doesn’t just predict text; it has a “Brain” (the model), “Hands” (tools), and a “Nervous System” (orchestration) to autonomously perceive, reason, and act. We learned that an agent operates in a continuous loop (Mission, Scan, Think, Act, Observe), constantly adapting its plan to solve problems. This framework demystified the magic: I wasn’t just coding a chatbot; I was building a system with the agency to execute multi-step missions.
The “Aha!” Moment: It’s Just Python & The “USB Port” for AI
Day 2 was a revelation. I learned that Models are just “Brains”—pattern predictors that cannot see or act. To be useful, they need Tools: the “Eyes” and “Hands” that let them fetch data or execute actions.
But connecting every tool to every model is a nightmare (the N x M problem). Enter the Model Context Protocol (MCP).
Think of MCP as the USB port for AI. Before USB, you needed a specific cable for every device. MCP lets you plug any tool into any agent using a standard connection.
The Code: Giving the Agent “Hands”
In my project, Jarbest (an accessible personal companion), I needed an agent that could check bank account balances. Instead of writing a custom connector, I used MCP to “plug in” a secure banking server.
# Imports (paths per the Google ADK; adjust to your installed version)
from google.adk.agents import Agent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPConnectionParams

# Finance Agent: manages banking transactions
finance_agent = Agent(
    name="finance_agent",
    description="An agent that can help with banking operations like checking balances...",
    tools=[
        # Assumption: this toolset connects to a secure, internal banking server
        MCPToolset(
            connection_params=StreamableHTTPConnectionParams(
                url=f"{BANK_MCP_URL.rstrip('/')}/mcp",
            )
        )
    ],
)
Why this matters (and the danger):
The agent reads these tool definitions and knows exactly when to use them. If a user asks “Can I afford this pizza?”, the agent inherently knows it must first call check_balance.
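To make that concrete, here is a sketch of what such a tool definition might look like if it were a local Python function rather than an MCP tool. The name check_balance comes from my project; the body is just a stub for illustration:

def check_balance(account_id: str) -> dict:
    """Return the current balance for the given account.

    The name, docstring, and type hints are what the agent actually reads
    when deciding whether this tool answers "Can I afford this pizza?".
    """
    # Stubbed value for illustration; the real logic lives behind the MCP server.
    return {"account_id": account_id, "balance": 5.00, "currency": "USD"}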
However, the notes warned us: using MCP is like plugging in a random USB drive found on the street. It could be a legitimate tool, or it could be a “Tool Shadow” (a malicious copy). That’s why in Jarbest I implemented a strict Application-Layer Gateway (via hardcoded allowlists), ensuring the agent can only connect to my specific, internal MCP banking server and never “plugs in” to an untrusted source.
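The gateway itself is not glamorous. Here is a minimal sketch of the idea; the hostname and helper name are hypothetical, and the real check sits wherever the MCPToolset is built:

from urllib.parse import urlparse

# Hardcoded allowlist: the only MCP hosts Jarbest is ever allowed to "plug into".
ALLOWED_MCP_HOSTS = {"bank.internal.example"}  # hypothetical internal hostname

def assert_trusted_mcp(url: str) -> str:
    """Raise if the MCP server is not on the allowlist; otherwise return the URL unchanged."""
    host = urlparse(url).hostname
    if host not in ALLOWED_MCP_HOSTS:
        raise ValueError(f"Refusing to connect to untrusted MCP server: {host!r}")
    return url

# Usage: wrap the URL before handing it to the connection params, e.g.
# url=assert_trusted_mcp(f"{BANK_MCP_URL.rstrip('/')}/mcp")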
Deep Dive: The Brain (Memory)
Day 3 was where things got sophisticated. A chatbot forgets you the moment you close the tab. An agent remembers.
For Jarbest, which is designed for elderly users who value consistency, memory is critical. If “Grandma Jane” asks for her “usual order,” the agent shouldn’t ask “What is that?”; it should know.
Here is how I implemented the “Brain” in my root agent:
from google.adk.tools import load_memory  # built-in ADK memory-search tool

root_agent = Agent(
    name="root_agent",
    instruction="""
    You are Jarbest...
    Memory: Use the load_memory tool to recall past conversations and preferences
    (e.g., "ordering the usual").
    """,
    tools=[load_memory],  # <--- This single line gives the agent a "brain"
    after_agent_callback=auto_save_to_memory,  # custom callback (defined elsewhere): auto-saves every interaction
)
The Non-Developer Perspective: Think of load_memory like giving the agent a filing cabinet. When Grandma Jane says “Order me some food,” the agent thinks: “I need to check if she has a preference,” opens the cabinet (load_memory), finds “Likes Large Pepperoni Pizza,” and acts on it. Watching this thought process in real-time was mind-blowing.
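One detail worth calling out: load_memory can only open a filing cabinet that actually exists. The agent’s runner has to be wired with a memory service. Here is a minimal sketch using the ADK’s in-memory implementations (module paths per my reading of the docs; a real deployment would swap in a persistent store):

from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.memory import InMemoryMemoryService

# Without a memory_service, load_memory has nothing to search
# and auto_save_to_memory has nowhere to write.
runner = Runner(
    app_name="jarbest",
    agent=root_agent,
    session_service=InMemorySessionService(),
    memory_service=InMemoryMemoryService(),
)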
The “Squeeze”: Debugging the Black Box
Day 4 taught us that “it works” isn’t enough. You need to know why it works. When building a safety-focused agent like Jarbest, I couldn’t afford “hallucinations.”
Exploring the Agent Observability labs, I learned to trace the agent’s reasoning steps. When my agent refused to order a pizza, I could look at the trace and see:
- User: “Order a pizza.”
- Tool Call: check_balance -> returned $5.00.
- Reasoning: “Pizza costs $20. User has $5. Result: Unsafe.”
- Response: “I cannot complete this order because your balance is too low.”
Seeing that raw reasoning log felt like looking into the matrix. It transformed the LLM from a mysterious oracle into a logical, debuggable software component. I realized I wasn’t just “prompting” anymore; I was engineering logic.
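For anyone curious what inspecting that looks like in code, here is a rough sketch of walking the ADK event stream during a run. The method names reflect my reading of the ADK Event API and may differ across versions:

from google.genai import types

async def trace_run(runner, user_id: str, session_id: str, user_text: str):
    """Print every tool call, tool result, and the final response for one turn."""
    message = types.Content(role="user", parts=[types.Part(text=user_text)])
    async for event in runner.run_async(
        user_id=user_id, session_id=session_id, new_message=message
    ):
        for call in event.get_function_calls():        # e.g. check_balance
            print(f"[tool call]   {call.name}({call.args})")
        for result in event.get_function_responses():  # e.g. {'balance': 5.0}
            print(f"[tool result] {result.name} -> {result.response}")
        if event.is_final_response() and event.content:
            print(f"[final]       {event.content.parts[0].text}")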
The Ecosystem: Agents Talking to Agents (A2A)
Day 5 introduced the Agent-to-Agent (A2A) Protocol. This is where I moved from building a single assistant to building a team.
My “Purchaser Agent” doesn’t know how to make pizza. Instead, it connects to a completely separate “Pizza Shop Agent” (simulating a 3rd party vendor).
# Imports (again, paths per the ADK; adjust to your version)
from google.adk.agents.remote_a2a_agent import RemoteA2aAgent
from google.adk.tools.agent_tool import AgentTool

# Creating a client-side proxy for a remote agent
pizza_agent_proxy = RemoteA2aAgent(
    name="pizza_agent",
    # The "Agent Card" acts like a business card for discovery
    agent_card="http://localhost:10000/.well-known/agent-card.json",
    description="Remote pizza agent from external vendor...",
)

purchaser_agent = Agent(
    name="purchaser_agent",
    instruction="Your goal is to help the user find and buy items.",
    tools=[AgentTool(pizza_agent_proxy)],  # <--- Treating another agent as a tool
)
The Cool Idea: The “Agent Card” isn’t just a technical manifest; it’s a completely new way for businesses to interact.
- For SMBs (Small to Medium Businesses): Instead of constantly maintaining and documenting complex APIs for developers to read, you simply publish an “Agent Card” (like a digital business card) that describes what your service does (e.g., “I sell pepperoni pizza”).
- For Developers: It saves massive amounts of time. My support agent just reads this card and instantly knows how to ask for a customized order.
- The Future: This allows agents to communicate autonomously, transacting on each individual’s behalf without human friction. It’s like an API that reads itself.
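To make the “business card” idea concrete, here is roughly what I would expect the pizza vendor’s card to contain, sketched as a Python dict. The field names follow my reading of the A2A spec, and the values are invented:

# Simplified, illustrative agent card, the kind served at /.well-known/agent-card.json
pizza_agent_card = {
    "name": "pizza_agent",
    "description": "Remote pizza agent from external vendor...",
    "url": "http://localhost:10000",
    "version": "1.0.0",
    "skills": [
        {
            "id": "order_pizza",
            "name": "Order a pizza",
            "description": "Takes a size and toppings, returns a price and an order id.",
        }
    ],
}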
Capstone Spotlight: Jarbest – Agents for Good
Applying these concepts, I built Jarbest for the “Agents for Good” track.
The Problem: The digital world is full of dark patterns and complex UIs that exploit vulnerable users, especially the elderly.
The Solution: A unified “Action Space” that replaces app sprawl.
Jarbest eliminates the friction of switching between banking apps, delivery apps, and websites. Instead of forcing Grandma Jane to install and navigate ten different confusing interfaces, Jarbest uses Tools, MCP, and A2A to communicate with these services directly on her behalf.
The Result: Simplicity and Safety. By centralizing these actions into one verified conversation, we inherently protect the user. They no longer need to open browsers or install random apps where they might fall victim to phishing sites or fake download buttons. Jarbest acts as the safe, validated operational layer for their digital life.
Jarbest uses a Hierarchical Architecture:
- Guardian (Root Agent): The “Thinking” layer. It never touches money directly. It validates safety.
- Auditor (Finance Agent): The only agent with access to the banking server (via MCP).
- Doer (Purchaser Agent): The logistics layer that talks to vendors (via A2A).
This separation of concerns ensures that even if the “Doer” gets confused, the “Guardian” prevents any financial mistakes.
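Wiring that hierarchy together is surprisingly short. Here is a condensed sketch that merges the snippets above into one place (the model choice is illustrative, and the real code splits this across modules):

# Condensed sketch of the Jarbest hierarchy, reusing the agents defined earlier.
root_agent = Agent(
    name="root_agent",                 # the Guardian: thinks, validates, never touches money
    model="gemini-2.5-flash",          # illustrative; see the routing note below on reserving stronger models for this role
    instruction="Validate safety and intent before delegating. Never call banking or vendor tools yourself.",
    tools=[load_memory],
    sub_agents=[finance_agent, purchaser_agent],  # the Auditor and the Doer
    after_agent_callback=auto_save_to_memory,
)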
Check out the Code Repository Here
Why Multi-Agent Architecture?
Working with a single “god agent” that handles everything creates a bottleneck. It forces one model to juggle complex reasoning (safety checks, intent parsing) with mundane execution (API calls, order formatting), leading to context overflow and hallucinations.
By breaking the system into specialized sub-agents, we achieve:
- Reduced Cognitive Load: The Root Agent focuses purely on orchestration and safety, while the Purchaser Agent focuses solely on logistics.
- Efficiency: We can route simple tasks to faster, cheaper models (Gemini 2.5 Flash) and reserve the powerful reasoning models (Gemini 3 Pro) for the Guardian role.
- Scalability: New vendors (e.g., a Pharmacy Agent) can be added as new tools for the Purchaser Agent without retraining or complicating the Root Agent’s logic.
My Secret Weapon: NotebookLM
The course came with dense whitepapers—goldmines of information on “Context Engineering” and “Agent Quality.” But digesting 20-page PDFs can be daunting.
My Workflow:
- Feed the Brain: I downloaded the Context Engineering whitepaper and uploaded it directly to NotebookLM.
- The Conversation: Instead of reading linearly, I interrogated the text.
- Me: “Explain the trade-offs between vector databases and keyword search for agent memory.”
- NotebookLM: It synthesized the answer specifically from the whitepaper, citing the exact page numbers.
- The Podcast: I used the “Audio Overview” feature to generate a podcast of the whitepaper. I listened to two AI hosts debate the merits of “Session” vs. “Memory” while I cooked dinner. It turned homework into entertainment.
What I’d Do Differently (The Roadmap)
Building Jarbest in just 5 days was a sprint, and I left plenty of ideas on the cutting room floor. If I had another week, here is what I would tackle:
- Dynamic Tool Loading: Instead of hardcoding tools, I’d want the agent to “discover” new MCP servers on the local network automatically.
- Voice Interface: Accessibility is key for my target audience (elderly users). Adding a voice layer on top of the text interface would be a game-changer.
- Proactive Alerts: Currently, the agent waits for input. I want to build a background loop where it can nudge the user: “Hey, you usually order groceries on Tuesday. Should I do that?”
Conclusion
This bootcamp didn’t just teach me syntax; it fundamentally shifted my mental model of software development. I went from viewing AI as a passive chatbot to seeing it as a dynamic, composable ecosystem of “Doers”.
The combination of accessible frameworks like the Google GenAI SDK, standardized protocols like MCP, and powerful reasoning models has truly democratized agency. You don’t need a research lab or a PhD to build systems that perceive, reason, and act—you just need a clear mission and the curiosity to prompt it.
If you’ve been on the fence about diving into AI Agents, now is the time to start. The tools are ready and the barrier to entry has never been lower. I can’t wait to see what you build.