Why Your MCP Server Needs Its Own Logging — Not Just Claude Desktop’s

Building a unified observability dashboard that tracks every AI agent action across cloud and local — with SQLite, FastAPI, and Streamlit

The Invisible Problem with AI Agents
When you ask an AI agent to “check my calendar and send an email,” it feels like a single action. Behind the scenes, it’s a chain of 5–10 tool calls: authenticate, fetch events, parse results, compose draft, send via SMTP. Each step can fail silently.

In my previous articles, I built a Hybrid MCP Agent that controls both cloud APIs (Gmail, Salesforce, Google Calendar) and local filesystem operations (scanning folders, moving files, generating reports). The architecture worked. But I had zero visibility into what the agent was actually doing.

When something broke, I had no idea where to look.

This is the observability gap in AI agent systems. Traditional application monitoring tools like Datadog or New Relic aren’t designed for MCP tool-call chains. And when your agent operates across two completely different environments — a GCP VM and a Windows desktop — the challenge multiplies.

Here’s how I solved it.

Architecture: Two Data Streams, One Database
The core design challenge was unifying logs from two fundamentally different environments:

Remote MCP Server (GCP VM): Handles Gmail, Salesforce, Calendar via cloud APIs
Local MCP Server (Windows PC): Handles filesystem operations via Desktop Commander
These operate on different machines, different operating systems, and different transport protocols. But I needed them in a single queryable store.

The Pipeline

Part 1: Remote Logging with FastMCP Middleware
“But why not just parse the Desktop logs?”
Fair question. Claude Desktop already logs every MCP interaction — including remote tool calls — to mcp-server-cloud-agent.log. In fact, that’s exactly how local logs get collected in Part 2. So why add server-side middleware at all?

Five reasons:

  1. The Desktop isn’t always the client. MCP servers can be called by any client — API integrations, other agents, scheduled jobs, or multiple users hitting the same endpoint. Desktop logs only capture what your Claude Desktop session did. The middleware captures everything that hits the server, regardless of the caller.

  2. Multi-user visibility requires a central log. In production, a single MCP server serves multiple users with different roles — admin, sales, finance — each routed via /mcp?user_id=role. Each user’s Claude Desktop only logs their own session. Without server-side logging, answering “which user triggered the Salesforce API rate limit at 2 PM?” requires collecting logs from every user’s laptop. The middleware records all users’ activity in one place, tagged with user_id, making cross-user analysis trivial.

  3. Enterprise operations demand a unified view. Scale this to a team or department: 10, 50 users sending emails, updating CRM records, and generating documents through the same MCP server. An ops manager needs to answer: “What’s our total API usage today?”, “Which tool fails most often?”, “Is any user making abnormally high call volumes?” Collecting Desktop logs from every individual machine doesn’t scale. Server-side middleware feeding a unified dashboard solves this enterprise operations requirement from day one.

  4. Latency accuracy. Desktop logs record timestamps when the client sends the request and receives the response. The duration includes network round-trip time. Server-side middleware measures actual tool execution time — the difference matters when diagnosing whether a slow call is a network issue or a tool performance issue.

  5. Operational independence. In production, your monitoring shouldn’t depend on a developer’s laptop being online and running the uploader script. The middleware writes directly to the database in real-time, with zero dependency on any external component. If the local uploader crashes, goes offline, or misses a cycle — remote logs are still complete.

In short: Desktop log parsing is a workaround for environments you don’t control (npm packages like Desktop Commander). Server-side middleware is the proper instrumentation for environments you do control.

For the cloud-side MCP server, I used FastMCP’s middleware system to intercept every tool call automatically. No changes to business logic required.

The Middleware
import time
from datetime import datetime

from fastmcp.server.middleware import Middleware, MiddlewareContext

class LoggingMiddleware(Middleware):
    async def on_call_tool(self, context: MiddlewareContext, call_next):
        tool_name = context.message.name
        start_time = time.time()

        log_data = {
            "timestamp": datetime.utcnow().isoformat() + "Z",
            "source": "remote",
            "tool_name": tool_name,
            "parameters": context.message.arguments or {},
            "success": True,
        }

        try:
            result = await call_next(context)
            log_data["duration_ms"] = (time.time() - start_time) * 1000
            log_data["result_summary"] = summarize_result(result)
            return result
        except Exception as e:
            log_data["success"] = False
            log_data["error_message"] = str(e)
            log_data["duration_ms"] = (time.time() - start_time) * 1000  # record duration for failures too
            raise
        finally:
            log_db.insert_log(log_data)  # log_db is the module-level SQLite helper

The key design decision: the middleware wraps call_next(), capturing both success and failure cases in a single finally block. This guarantees every tool call gets logged, even if it throws an exception.
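The summarize_result helper called by the middleware is not part of FastMCP; it's a small local utility. A minimal sketch of what it could look like (the 200-character cap is my assumption, not the article's exact implementation):

```python
def summarize_result(result, max_len: int = 200) -> str:
    """Collapse an arbitrary tool result into a short, log-friendly string.

    max_len is an illustrative default; tune it to your dashboard's detail view.
    """
    text = str(result)
    if len(text) <= max_len:
        return text
    return text[:max_len] + "... (truncated)"
```

Keeping summaries short matters: full tool results can be kilobytes of JSON, and the log table only needs enough to tell you what happened.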

SQLite Schema
cursor.execute("""
CREATE TABLE IF NOT EXISTS tool_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    source TEXT NOT NULL DEFAULT 'remote',  -- 'remote' or 'local'
    user_id TEXT,
    tool_name TEXT NOT NULL,
    parameters TEXT,
    success INTEGER NOT NULL,
    error_message TEXT,
    duration_ms REAL,
    result_summary TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
)
""")
Why SQLite over PostgreSQL or a managed database? For a single-VM MCP deployment, SQLite is the pragmatic choice: zero configuration, no separate process, and more than sufficient for the volume of tool calls an AI agent generates. The entire database is a single file — which, as you’ll see, turned out to be both a feature and a footgun.
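The insert_log call the middleware makes can be a thin wrapper over this schema. A sketch of one plausible implementation (my version, not the article's exact code): parameters serialized to a JSON string, success stored as 0/1.

```python
import json
import sqlite3

def insert_log(db_path: str, log_data: dict) -> None:
    """Insert one tool-call record into tool_logs.

    Dict parameters are serialized to JSON; booleans become 0/1 integers.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            """INSERT INTO tool_logs
               (timestamp, source, user_id, tool_name, parameters,
                success, error_message, duration_ms, result_summary)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
            (
                log_data["timestamp"],
                log_data.get("source", "remote"),
                log_data.get("user_id"),
                log_data["tool_name"],
                json.dumps(log_data.get("parameters", {})),
                1 if log_data.get("success", True) else 0,
                log_data.get("error_message"),
                log_data.get("duration_ms"),
                log_data.get("result_summary"),
            ),
        )
        conn.commit()
    finally:
        conn.close()
```

Opening a connection per insert is fine at agent-scale call volumes; a long-lived connection with WAL mode is the next step if write contention ever appears.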

Part 2: Local Log Collection Pipeline
The local side was trickier. Desktop Commander (the local MCP server) runs as an npm package — I can’t inject middleware into it. But Claude Desktop already logs every MCP interaction to disk.

Log Parser
Claude Desktop writes structured logs to %APPDATA%\Claude\logs\mcp-server-*.log. Each tool call produces a request-response pair:

2026-02-18T08:46:22Z [local-commander] Message from client:
{"method":"tools/call","params":{"name":"move_file","arguments":{…}}}

2026-02-18T08:46:22Z [local-commander] Message from server:
{"jsonrpc":"2.0","id":5,"result":{…}}
The parser matches request-response pairs by JSON-RPC id, calculates duration, and extracts tool metadata:

def parse_tool_call_request(line: str) -> Optional[Dict]:
    if '"method":"tools/call"' not in line:
        return None

    timestamp = extract_timestamp(line)
    json_match = re.search(
        r'Message from client: (\{.*?"id":\d+\})', line
    )
    if json_match is None:
        return None  # malformed or wrapped line — skip rather than crash
    msg = json.loads(json_match.group(1))
    params = msg.get('params', {})

    return {
        'timestamp': timestamp,
        'tool_name': params.get('name'),
        'arguments': params.get('arguments', {}),
        'request_id': msg.get('id'),
    }

Incremental Upload with Bookmarks
A critical design choice: the uploader tracks its last read position per file, so it only sends new logs on each run.

last_position.json

{
  "mcp-server-local-commander.log": 1724902,
  "mcp-server-local-agent.log": 2326881,
  "mcp-server-dev-agent.log": 1722778,
  "mcp-server-cloud-agent.log": 2057283
}
Each 5-minute cycle: read from bookmark → parse new lines → POST to GCP → update bookmark. No duplicates, minimal bandwidth.

with open(filepath, 'r', encoding='utf-8') as f:
    f.seek(last_position)  # Resume from last read
    for line in iter(f.readline, ''):  # readline keeps tell() usable afterwards
        ...  # parse request/response pairs
    new_position = f.tell()  # Save new bookmark
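The bookmark file itself needs only two small helpers. A sketch of the load/save half of the cycle (function names are mine):

```python
import json
from pathlib import Path

def load_bookmarks(bookmark_file: Path) -> dict:
    """Return {log filename: last byte offset}; empty dict on first run."""
    if bookmark_file.exists():
        return json.loads(bookmark_file.read_text(encoding="utf-8"))
    return {}

def save_bookmark(bookmark_file: Path, bookmarks: dict,
                  filename: str, position: int) -> None:
    """Persist the new offset so the next cycle resumes where this one stopped."""
    bookmarks[filename] = position
    bookmark_file.write_text(json.dumps(bookmarks, indent=2), encoding="utf-8")
```

Writing the whole file each cycle is deliberately simple; at four log files and one update per 5 minutes, atomicity tricks aren't worth the complexity.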
Receiver API
On the GCP side, a FastAPI endpoint receives batches of local logs:

@router.post("/logs/upload")
async def upload_logs(request: LogUploadRequest):
    logs_data = [{
        **log.dict(),
        "source": "local"  # Tag as local origin
    } for log in request.logs]

    count = log_db.insert_logs_bulk(logs_data)
    return {"status": "success", "uploaded_count": count}

The source field is the unifier — it’s what lets the dashboard filter and compare remote vs. local activity side by side.
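As a sketch of what that looks like on the query side (the function name and shape are mine, not lifted from the dashboard code), the comparison is a single GROUP BY over the source column:

```python
import sqlite3

def calls_by_source(db_path: str) -> dict:
    """Tool-call counts per source — the remote vs. local comparison."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT source, COUNT(*) FROM tool_logs GROUP BY source"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```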

Part 3: The Dashboard
With both streams flowing into a single SQLite database, the Streamlit dashboard provides real-time operational visibility:

Summary cards: Total calls, success rate, average response time, error count
Hourly call volume chart: Stacked by source (remote vs. local)
Tool statistics table: Call count, success rate, average duration per tool
Filterable log table: By time range, source, status, tool name, keyword search
Detail view: Full parameters and error messages for any individual log entry
The filtering is where it gets practical. When debugging, I typically start with: “Show me failed calls in the last hour” → drill into the specific tool → read the error message and parameters. What used to require SSH-ing into the VM and grep-ing through container logs now takes 3 seconds.
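That first query is one WHERE clause. A hedged sketch against the tool_logs schema from Part 1 (the function name is illustrative); because the timestamps are ISO-8601 strings, they compare correctly as plain text and no date parsing is needed:

```python
import sqlite3

def failed_calls_since(db_path: str, since_iso: str) -> list:
    """Failed tool calls after a cutoff, newest first — the usual first debugging query."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            """SELECT timestamp, source, tool_name, error_message
               FROM tool_logs
               WHERE success = 0 AND timestamp >= ?
               ORDER BY timestamp DESC""",
            (since_iso,),
        ).fetchall()
    finally:
        conn.close()
```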

Part 4: The Bug That Silently Deleted My Data
Here’s where it gets interesting — and where the real operational lesson lives.

After running this system for a week, I noticed something strange: the dashboard only showed recent logs. Anything older than a day or two was gone. Remote logs from yesterday? Vanished. Local logs I’d uploaded that morning? Still there, but yesterday’s batch had disappeared.

I initially suspected the retention cron job was too aggressive. But the cron only purges data older than 30 days. Then I checked the deploy pipeline.
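For reference, a 30-day retention purge like that cron's can be a single DELETE. A sketch assuming the tool_logs schema from Part 1 (this is not the article's actual cron script); ISO-8601 timestamps compare correctly as plain strings, so the cutoff needs no parsing on the SQL side:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def purge_old_logs(db_path: str, days: int = 30) -> int:
    """Delete log rows older than the retention window; returns rows removed."""
    cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    )
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute("DELETE FROM tool_logs WHERE timestamp < ?", (cutoff,))
        conn.commit()
        return cur.rowcount
    finally:
        conn.close()
```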

The Root Cause
My GitHub Actions deployment script had this sequence:

Step 5: Sync source code for dashboard

cd ~
sudo rm -rf ai_mcp_fastmcp_remote # 💀 THIS LINE
git clone https://…ai_mcp_fastmcp_remote.git
The SQLite database lived at ai_mcp_fastmcp_remote/logs/mcp_logs.db. Every deployment nuked the entire project directory — including the database — then cloned a fresh copy from Git. The logs folder would be recreated empty, and the system would start accumulating data from scratch… until the next deployment destroyed everything again.

This is a classic infrastructure anti-pattern: treating stateful data the same as stateless code. The deployment script assumed everything in the project directory was reproducible from Git. The database was not.

The Fix: Separate Data from Code
The solution was to move persistent data outside the blast radius of deployments:

Before: DB inside project (destroyed on deploy)

-v /home/user/ai_mcp_fastmcp_remote/logs:/app/logs

After: DB in dedicated persistent directory

-v /home/user/mcp_data/db:/app/data/db
-v /home/user/mcp_data/chromadb:/app/data/chromadb
I also discovered that our ChromaDB vector store (used for document Q&A) had the same vulnerability — it had no volume mount in the docker run command at all. Every deployment silently wiped the entire knowledge base.

The code change was minimal — just updating the path resolution:

DB path: prefer mounted volume, fallback to local

DB_DIR = (
    Path("/app/data/db")
    if Path("/app/data/db").exists()
    else Path(__file__).parent.parent / "data" / "db"
)
DB_PATH = DB_DIR / "mcp_logs.db"
Final directory structure on the VM:

~/mcp_data/ # Persistent (survives deployments)
├── db/mcp_logs.db # SQLite tool call logs
└── chromadb/ # Vector store for documents
~/ai_mcp_fastmcp_remote/ # Ephemeral (rebuilt each deploy)
├── mcp_server/ # Application code
├── dashboard.py # Streamlit app
└── logs/mcp_tools.jsonl # Append-only log file (regenerable)
Key Takeaways

  1. AI agents need observability, not just logging. Writing logs to a file isn’t enough. You need queryable, filterable, cross-environment visibility with sub-second drill-down. When an agent chains 8 tool calls and #6 fails, you need to see the full sequence instantly.

  2. Middleware is the right abstraction for MCP logging. FastMCP’s middleware pattern lets you capture every tool call without modifying any business logic. One class, registered once, covers all current and future tools.

  3. Separate stateful data from stateless code. This is DevOps 101, but it’s easy to forget when your “database” is just a SQLite file sitting in your project directory. If rm -rf can destroy it, it’s in the wrong place.

  4. The bookmark pattern prevents duplicate uploads. For any log forwarding pipeline, tracking file read positions is simple and effective. It handles the common case (incremental new data) cleanly, and file rotation only needs one extra guard: if the file is suddenly smaller than the saved position, reset the bookmark to zero.

  5. Start with SQLite. For single-node MCP deployments, SQLite is the right choice. No infrastructure overhead, ACID compliance, and you can always migrate to PostgreSQL later if you actually need it. In my case, the entire tool call history for months fits comfortably in a single file under 10MB.

What’s Next
The logging system revealed something I didn’t expect: the AI agent makes mistakes more often than I assumed. Not catastrophic failures — subtle ones. Wrong file paths, redundant API calls, inefficient tool selection.

In the next article, I’ll share what I learned from analyzing these logs about the real limits of AI agent autonomy — and why “human-in-the-loop” isn’t just a safety feature, it’s a performance optimization.

The full implementation is demonstrated in video on SunnyLab TV.

All code referenced in this article is from a production MCP system running on GCP. Find me on Medium for the previous articles in this series.

Tags: Artificial Intelligence, Software Engineering, DevOps, Python, Cloud Computing
