API Gateway vs Service Mesh: Beyond the North–South/East–West Myth

Please note that this page grew long because I had questions of my own, and covering them with less detail would have made the claims feel speculative. You can skip this article and go straight to the links added at the end of the page – they are very good.

My Experimental Code Link

As always, if you only read this and never write any code for it, it is almost as good as not reading it at all.

GitHub link: https://github.com/rajkundalia/api-gateway-service-mesh-sample

This took a long time. I tried implementing a full service mesh, but it went beyond the scope of the sample – so features like Intentions in Consul will not work.

Introduction: The Misconception That’s Costing Teams

If you’ve worked with microservices, you’ve probably heard this oversimplification: “API Gateways handle north–south traffic, while Service Meshes handle east–west traffic.”

This directional framing has become microservices folklore – repeated in architecture discussions and echoed in conference talks for years.

Here’s the issue: it’s fundamentally wrong.

This misconception leads to poor architectural decisions, unnecessary complexity, and recurring confusion about which technology solves which problem. Teams often reach for an API Gateway when a Service Mesh is what they truly need – or vice versa – because they focus on traffic direction rather than the underlying purpose.

The truth is more nuanced:

  • API Gateways can manage east–west traffic via internal gateways that govern inter-service communication, apply policies, and handle versioning.
  • Service Meshes can handle north–south traffic through mesh-aware ingress gateways (such as Istio’s Ingress Gateway or Linkerd’s ingress controller) that bring external traffic into the mesh.

So if traffic direction isn’t the real difference, what is?


Purpose and responsibility.

An API Gateway treats services as products – with user governance, access control, monetization, lifecycle management, and business context.

A Service Mesh, by contrast, provides infrastructure-level reliability for service-to-service communication – zero business logic, zero product thinking, purely connectivity.

In this article, we’ll cut through the confusion and give you a clear mental model for when to use each technology – or when using both together creates the strongest architecture.

You’ll learn:

  • What problems each technology actually solves (and why traffic direction doesn’t matter)
  • The architectural differences that lead to different use cases
  • How capabilities like mTLS, retries, and zero-trust security define service meshes
  • A practical decision framework for choosing the right tool
  • How API Gateways and Service Meshes complement each other in real-world systems

Let’s start by understanding the fundamental problems each technology was designed to solve.

Understanding the Real Problem Each Solves

API Gateway: APIs as a Product

An API Gateway’s primary purpose is to expose services as managed, consumable APIs – treating your services like products that internal or external consumers can discover, use, and rely on.

But an API Gateway is far more than a reverse proxy. It embeds business logic and enables API composition: aggregating data from multiple services into a single response, transforming payloads, standardizing errors, and presenting a unified interface that shields clients from backend complexity. This is effectively the Backend-for-Frontend (BFF) pattern.
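To make the composition idea concrete, here is a minimal sketch (not taken from the sample repository) of a gateway-side aggregation endpoint using Spring WebFlux's WebClient. The service names (order-service, customer-service) and paths are hypothetical placeholders:

```java
import java.util.Map;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.reactive.function.client.WebClient;

import reactor.core.publisher.Mono;

// Gateway-side composition: one client call fans out to two backend services
// and returns a single merged payload – the Backend-for-Frontend idea.
@RestController
public class OrderSummaryEndpoint {

    private final WebClient orders = WebClient.create("http://order-service");
    private final WebClient customers = WebClient.create("http://customer-service");

    @GetMapping("/bff/orders/{id}/summary")
    public Mono<Map<String, Object>> summary(@PathVariable String id) {
        Mono<Map> order = orders.get().uri("/orders/{id}", id)
                .retrieve().bodyToMono(Map.class);
        Mono<Map> customer = customers.get().uri("/customers/by-order/{id}", id)
                .retrieve().bodyToMono(Map.class);

        // Aggregate both responses into one client-facing shape,
        // shielding the mobile/web client from the backend layout.
        return Mono.zip(order, customer)
                .map(t -> Map.<String, Object>of("order", t.getT1(), "customer", t.getT2()));
    }
}
```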

And once you move past request/response mechanics, the real power emerges. API Gateways participate in the entire API lifecycle – the part most developers overlook:

  • Creation & design: specs, versioning, schema validation
  • Testing & documentation: interactive docs, automated tests, sandboxes
  • Publishing & onboarding: developer portals, marketplaces, self-service access
  • Monetization: usage metering, billing hooks, tiered plans
  • Analytics: usage patterns, behavior insights, performance dashboards

This is where the gateway gains business context. It knows concepts like customers, products, API keys, and rate-limit tiers. When a mobile client sends a request, the gateway understands: “This is Acme Corp, a premium tier subscriber, allowed 10,000 requests per hour on the /payments API.”
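A small sketch of what that business context looks like in code – the API keys, customer names, and tier limits below are hypothetical, and a real gateway would resolve them from its management plane rather than a hard-coded map:

```java
import java.util.Map;

// The gateway resolves an opaque API key into business context before routing.
public class ApiKeyContextResolver {

    record ClientContext(String customer, String tier, int requestsPerHour) {}

    // In a real gateway this lookup hits a key store / management plane;
    // a static map keeps the sketch self-contained.
    private static final Map<String, ClientContext> KEYS = Map.of(
            "key-acme-123", new ClientContext("Acme Corp", "premium", 10_000),
            "key-trial-456", new ClientContext("Trial Tenant", "free", 100));

    public ClientContext resolve(String apiKey) {
        ClientContext ctx = KEYS.get(apiKey);
        if (ctx == null) {
            throw new IllegalArgumentException("Unknown API key – reject at the edge");
        }
        return ctx;
    }

    public static void main(String[] args) {
        ClientContext ctx = new ApiKeyContextResolver().resolve("key-acme-123");
        // Prints: Acme Corp (premium) may make 10000 requests per hour
        System.out.printf("%s (%s) may make %d requests per hour%n",
                ctx.customer(), ctx.tier(), ctx.requestsPerHour());
    }
}
```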

Modern platforms such as Kong, AWS API Gateway, Azure API Management, Apigee, and Ambassador all embody this philosophy – combining policy enforcement with full lifecycle and product-style API management.

Service Mesh: Service Connectivity Infrastructure

A Service Mesh has a fundamentally different purpose: providing decoupled infrastructure for service-to-service communication without requiring changes to application code.

Service Meshes offload network functions from services into a dedicated infrastructure layer. They handle concerns like service discovery, load balancing, circuit breaking, retries, and timeouts – all the complexity that developers would otherwise implement (and often implement inconsistently) across services.

Critically, Service Meshes have no business logic. They’re purely connectivity and observability infrastructure. A service mesh doesn’t know or care whether it’s routing a payment transaction or a product catalog query. Every service is treated equally as a network endpoint with routing rules and policies.

This enables polyglot architectures. Your Python services, Go services, and Java services all get the same networking capabilities without embedding client libraries or writing language-specific code. The infrastructure handles it transparently.

The key insight: A Service Mesh is business-agnostic. It operates at the infrastructure layer, understanding concepts like “service instances,” “endpoints,” “failure rates,” and “latency percentiles” – but never “customers,” “API products,” or “billing tiers.”

Popular implementations include Istio, Linkerd, Consul Connect, and AWS App Mesh.

Quick Comparison

| Aspect | API Gateway | Service Mesh |
|---|---|---|
| Primary Purpose | Expose services as managed API products | Decouple service communication infrastructure |
| Context | Business-aware (users, products, billing) | Business-agnostic (endpoints, metrics) |
| Logic | Can contain transformation, aggregation logic | No business logic, pure infrastructure |
| Lifecycle Scope | Full API lifecycle (design → retirement) | Runtime connectivity only |
| Consumer Focus | External developers, partners, clients | Services communicating with each other |

Architecture Deep Dive

Deployment Models

The architectural differences between API Gateways and Service Meshes are stark, and understanding these differences clarifies why each excels at different problems.

API Gateway: Centralized Architecture

An API Gateway deploys as a standalone reverse proxy or clustered front-door, creating a single entry point (or small cluster) for API traffic. It lives in its own architectural layer, distinct from your services.

Here’s a simplified view:

External Clients (Mobile, Web, Partners)
              ↓
    ┌─────────────────┐
    │  API Gateway    │ ← Centralized, clustered for HA
    │   (Kong/AWS)    │
    └─────────────────┘
         ↓    ↓    ↓
    ┌────┐ ┌────┐ ┌────┐
    │Svc │ │Svc │ │Svc │
    │ A  │ │ B  │ │ C  │
    └────┘ └────┘ └────┘

Traffic flows through the gateway as a dedicated hop. The gateway terminates external connections, applies policies, performs routing decisions, and forwards requests to backend services. Deployment is relatively straightforward – you provision the gateway infrastructure separately from your services.
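As a rough illustration of this centralized model, here is how the routing table might look if the gateway were built with Spring Cloud Gateway (one possible implementation; the service hostnames, paths, and header are hypothetical):

```java
import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// One centralized entry point: every external path is mapped to a backend
// service, and edge policies (here, just a marker header) live in one place.
@Configuration
public class GatewayRoutes {

    @Bean
    public RouteLocator routes(RouteLocatorBuilder builder) {
        return builder.routes()
                .route("service-a", r -> r.path("/api/a/**")
                        .filters(f -> f.stripPrefix(2)
                                .addRequestHeader("X-Gateway", "edge"))
                        .uri("http://service-a:8080"))
                .route("service-b", r -> r.path("/api/b/**")
                        .filters(f -> f.stripPrefix(2))
                        .uri("http://service-b:8080"))
                .route("service-c", r -> r.path("/api/c/**")
                        .filters(f -> f.stripPrefix(2))
                        .uri("http://service-c:8080"))
                .build();
    }
}
```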

Service Mesh: Decentralized Architecture

A Service Mesh deploys in a fundamentally different way: a sidecar proxy alongside every service replica. This is a decentralized, peer-to-peer model.

Service A          Service B          Service C
┌─────────┐        ┌─────────┐        ┌─────────┐
│  App    │        │  App    │        │  App    │
│Container│        │Container│        │Container│
└────┬────┘        └────┬────┘        └────┬────┘
     │                  │                  │
┌────┴────┐        ┌────┴────┐        ┌────┴────┐
│ Envoy   │◄──────►│ Envoy   │◄──────►│ Envoy   │
│ Sidecar │        │ Sidecar │        │ Sidecar │
└─────────┘        └─────────┘        └─────────┘
       ▲                 ▲                 ▲
       └─────────────────┴─────────────────┘
              Control Plane (Istio/Linkerd)
              (Configuration, not traffic)

Each service instance gets its own proxy (typically Envoy). When Service A calls Service B, the request flows: App A → Sidecar A → Sidecar B → App B. The service code itself doesn’t know about the mesh – it makes standard HTTP or gRPC calls to localhost, and the sidecar handles everything else.
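For illustration, this is roughly what mesh-unaware application code looks like: a plain HTTP call addressed to the peer service's name (warehouse-service and the /stock path are made up), with the sidecar transparently intercepting the connection to add mTLS, retries, and load balancing:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// "Mesh-unaware" application code: a plain HTTP call to the target service.
// No TLS setup, no retries, no service discovery – the sidecar that
// intercepts this connection supplies all of that.
public class InventoryClient {

    private final HttpClient http = HttpClient.newHttpClient();

    public String stockLevel(String sku) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://warehouse-service/stock/" + sku))
                .GET()
                .build();
        HttpResponse<String> response =
                http.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new InventoryClient().stockLevel("sku-42"));
    }
}
```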

This deployment model is more invasive. It requires modifying your CI/CD pipelines to inject sidecars, updating Kubernetes manifests (or VM configurations), and managing the lifecycle of proxies alongside applications.

Key Insight: In an API Gateway, traffic converges at a central point. In a Service Mesh, traffic flows peer-to-peer between distributed proxies, with the control plane managing configuration but never touching actual requests.

Control Plane vs Data Plane Architecture

This separation of concerns is crucial for understanding Service Meshes, though it applies (less critically) to some API Gateway implementations.

Service Mesh: Deep Dive into Control and Data Planes

The control plane (examples: Istio’s Pilot, Linkerd’s Controller, Consul’s servers) is the brain of the mesh:

  • Configuration management: Distributes routing rules, traffic policies, and service configurations to all sidecars
  • Service discovery: Maintains a live registry of all service instances and their endpoints
  • Certificate authority: Generates and rotates mTLS certificates for service identity
  • Telemetry aggregation: Collects metrics and traces from data plane proxies
  • Policy enforcement setup: Configures access control rules and rate limits

Critically: the control plane is NOT on the request path. It handles configuration and management but never sees actual user requests. This is fundamental to mesh scalability.

The data plane (examples: Envoy sidecars in Istio, Linkerd2-proxy in Linkerd) does the heavy lifting:

  • Handles actual request traffic: Every request flows through data plane proxies
  • Enforces policies: Implements circuit breakers, retries, timeouts configured by control plane
  • L4/L7 routing and load balancing: Makes real-time routing decisions
  • Security enforcement: Performs mTLS handshakes, validates certificates
  • Telemetry generation: Reports metrics, logs, and traces for observability

Let’s make this concrete with service discovery as an example. When Service C scales from 3 to 5 replicas, here’s what happens:

  1. Kubernetes (or your orchestrator) starts two new pods with Service C containers and Envoy sidecars
  2. The Envoy sidecars register with the control plane upon startup
  3. The control plane updates its service registry with the two new endpoints
  4. The control plane pushes updated routing configurations to all Envoy sidecars in the mesh
  5. Within seconds, Service A and Service B know about the new Service C instances and start load balancing across all 5 replicas

No DNS propagation delays. No manual configuration updates. No service discovery libraries in application code. The control plane orchestrates everything, while sidecars handle the actual routing.

API Gateway: Simpler Control Plane Model

Some API Gateway implementations (like Kong with its declarative configuration) have control plane concepts, but the separation is less critical. Many gateways bundle control and data plane functions in the same process. Configuration changes might require gateway reloads, and the gateway itself is on the request path – serving as both traffic handler and configuration enforcer.

Organizational and Deployment Challenges

Service Meshes face unique adoption barriers that API Gateways largely avoid:

1. Universal Sidecar Deployment Requirement

To get value from a service mesh, you need sidecars deployed alongside all services you want to manage. This creates organizational friction: it’s not something a single team can adopt independently. You need buy-in from every service owner.

2. Shared Control Plane Access

All services must share access to the mesh control plane. This crosses security boundaries – teams that previously had isolated deployments now share infrastructure. Organizations with strict security postures find this challenging.

3. Cannot Control External Services

You can only mesh services you directly control. Third-party APIs, legacy systems outside your infrastructure, and managed services like external databases cannot participate in the mesh. This limits where resilience patterns apply.

4. Certificate Authority Coordination

Services in the same mesh must share a Certificate Authority (CA) for mTLS. This requires cross-team coordination on security policies and trust models. Different teams or products often want separate CAs for isolation – which means separate meshes.

Why This Matters: Service mesh adoption is often limited to team or product boundaries. An API Gateway, deployed as central infrastructure, can span the entire organization much more easily. It doesn’t require every team to change their deployment processes.

Now that we understand the architectural differences and deployment realities, let’s examine specific capabilities side-by-side.

Capabilities Comparison

Both technologies offer overlapping capabilities, but with different implementations and tradeoffs. Understanding these differences guides architectural decisions.

Service Discovery

  • API Gateway: Uses external service registries (Consul, Eureka, DNS, Kubernetes Services). The gateway queries the registry to find service endpoints, then routes traffic accordingly.
  • Service Mesh: Built-in service discovery via the control plane. The control plane automatically tracks all sidecar-enabled services, maintaining a live registry without external dependencies. When a service scales or moves, the mesh knows immediately.

Authentication and Authorization ⭐

This is perhaps the most important architectural differentiator between the two patterns.

  • API Gateway: Focuses on user and client identity. Validates API keys, OAuth2 tokens, JWT claims. Answers questions like: “Is this mobile app authorized to call the /payments endpoint?” or “Has this partner exceeded their rate limit?” Security is about edge protection – who gets into your system and what they can access.

  • Service Mesh: Focuses on service identity via mTLS certificates. Every service gets a cryptographic identity. Answers questions like: “Is this really the Payment service calling Fraud Detection?” or “Should Order Service be allowed to communicate with User Profile Service?” Security is about Zero-Trust architecture – no service implicitly trusts another.

Load Balancing

  • API Gateway: Server-side load balancing at the gateway layer. The gateway distributes requests across service instances based on configured algorithms (round-robin, least connections, weighted).
  • Service Mesh: Client-side load balancing distributed via sidecars. Each sidecar makes load balancing decisions locally, using health status and latency information from the control plane. This enables more sophisticated strategies like locality-aware routing (prefer same-zone instances). (See the sketch below.)
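As a conceptual sketch (not any particular proxy's implementation), client-side load balancing boils down to each caller picking an instance locally from the endpoint set the control plane has pushed to it; the endpoint addresses below are made up:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Conceptual view of client-side load balancing as a sidecar performs it:
// the caller holds the current endpoint set (pushed by the control plane)
// and picks an instance locally, instead of forwarding to a central balancer.
public class RoundRobinPicker {

    private final List<String> endpoints;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinPicker(List<String> endpoints) {
        this.endpoints = List.copyOf(endpoints);
    }

    public String pick() {
        int i = Math.floorMod(next.getAndIncrement(), endpoints.size());
        return endpoints.get(i);
    }

    public static void main(String[] args) {
        RoundRobinPicker picker = new RoundRobinPicker(
                List.of("10.0.1.11:8080", "10.0.1.12:8080", "10.0.2.7:8080"));
        for (int i = 0; i < 5; i++) {
            System.out.println("route request to " + picker.pick());
        }
    }
}
```

A real sidecar layers health status, latency, and locality on top of this, but the key point is the same: the decision is made next to the caller, not at a central hop.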

Rate Limiting

  • API Gateway: Edge-focused, per-client or per-API-key. Limits like “1000 requests per hour for this developer” or “premium tier customers get 10x capacity.” Centralized enforcement at the gateway.
  • Service Mesh: Can implement distributed rate limiting to prevent service overload. For example, preventing the Notification Service from overwhelming Email Service with requests, regardless of which client triggered the flow. Enforcement happens at sidecars across the mesh.

Circuit Breakers and Retries

  • API Gateway: Configured at the gateway level to protect against downstream service failures. If Payment Service is down, the gateway can circuit break to avoid cascading failures.
  • Service Mesh: Configured at the control plane, enforced at every sidecar. Each service gets automatic circuit breakers and retries without code changes. When Inventory Service calls Warehouse Service and detects failures, the sidecar automatically circuit breaks – no retry logic in Inventory Service code. (The sketch below shows the in-code alternative this replaces.)
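To appreciate what the sidecar removes, here is a hedged sketch of the in-process alternative – using Resilience4j as one example library – that every calling service would otherwise have to embed and keep consistent across teams and languages:

```java
import java.time.Duration;
import java.util.function.Supplier;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;

// The in-process resilience code a mesh makes unnecessary: every service that
// calls Warehouse Service would need something like this, per language.
public class WarehouseCallWithResilience {

    static String callWarehouse() {
        // placeholder for the real HTTP call
        return "stock: 7";
    }

    public static void main(String[] args) {
        Retry retry = Retry.of("warehouse", RetryConfig.custom()
                .maxAttempts(3)
                .waitDuration(Duration.ofMillis(200))
                .build());

        CircuitBreaker breaker = CircuitBreaker.of("warehouse", CircuitBreakerConfig.custom()
                .failureRateThreshold(50)
                .waitDurationInOpenState(Duration.ofSeconds(30))
                .build());

        Supplier<String> call = () -> callWarehouse();
        Supplier<String> resilient =
                Retry.decorateSupplier(retry, CircuitBreaker.decorateSupplier(breaker, call));

        System.out.println(resilient.get());
    }
}
```

With a mesh, the equivalent policy lives in sidecar configuration, so retry budgets and breaker thresholds stay uniform across all services regardless of language.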

Health Checks

  • API Gateway: Gateway actively probes downstream services for health, removing unhealthy instances from its routing pool.
  • Service Mesh: Sidecars monitor local service health and report to the control plane. Passive health checks based on actual request success rates. Faster reaction to failures because the sidecar sits adjacent to the service.

Observability

  • API Gateway: Edge metrics and API-level analytics. Tracks which APIs are called, by whom, how often, and with what latency. Great for understanding API usage patterns and client behavior.
  • Service Mesh: Deep service-to-service metrics and distributed tracing. Tracks every internal call with detailed latency breakdowns, success rates, and request volumes. Enables debugging complex distributed transactions by tracing requests as they flow through multiple services.

Example: When a user checkout fails, the API Gateway shows the client request hit the /checkout endpoint with a 500 error. The service mesh traces reveal that Order Service → Inventory Service succeeded, but Inventory Service → Warehouse Service timed out after 3 retries – pinpointing the exact failure point.

Protocol Support

  • API Gateway: Primarily HTTP/HTTPS, with increasing support for gRPC, WebSockets, and GraphQL. Focused on application-layer protocols.
  • Service Mesh: Supports both L4 (TCP) and L7 (HTTP, gRPC) protocols. Can handle raw TLS connections, TCP traffic, and any IP-based protocol. Broader protocol range because it operates at the network infrastructure layer.

Chaos Engineering and Defect Simulation

  • API Gateway: Limited capabilities – some gateways allow injecting delays or errors, but it’s not a primary feature.
  • Service Mesh: Built-in chaos engineering support. Can inject faults (return 500 errors), add delays (simulate network latency), or abort connections to specific services. Enables testing resilience in production-like conditions. For example, “Make 10% of calls from Order Service to Inventory Service return 503 errors to verify circuit breakers work.”


Summary Table

| Capability | API Gateway | Service Mesh |
|---|---|---|
| Service Discovery | External registry (Consul, DNS) | Built-in via control plane |
| Authentication/Authorization | User/client identity (OAuth, API keys) | Service identity (mTLS certificates) |
| Load Balancing | Server-side, centralized | Client-side, distributed |
| Rate Limiting | Per-client/API key at edge | Per-service, distributed |
| Circuit Breakers | At gateway | Distributed, no code changes |
| Health Checks | Gateway probes services | Sidecars monitor local health |
| Observability | Edge metrics, API analytics | Service-to-service tracing |
| Protocols | HTTP/HTTPS, gRPC, WebSockets | L4 + L7 (TCP, HTTP, gRPC, TLS) |
| Chaos Engineering | Limited | Built-in fault injection |

Among these capabilities, mutual TLS deserves special attention because it fundamentally changes how services authenticate and trust each other.

Mutual TLS (mTLS) in Service Mesh

How mTLS Works and Why It Matters

The Mechanism:

When a service mesh is deployed, the control plane includes a Certificate Authority (CA). This CA generates unique, short-lived certificates for every service replica. When Service A’s sidecar calls Service B’s sidecar, both sides present certificates during the TLS handshake, cryptographically proving their identities.

Here’s the flow:

  1. Order Service sidecar initiates connection to Payment Service
  2. Payment sidecar presents certificate: “I am payment.production.svc.cluster”
  3. Order sidecar verifies certificate against the mesh CA
  4. Order sidecar presents its own certificate: “I am order.production.svc.cluster”
  5. Payment sidecar verifies Order’s certificate
  6. Encrypted, authenticated connection established

Crucially, sidecars automatically handle certificate rotation. Certificates might rotate every few hours, and services never see this complexity – it’s entirely transparent.

The Value:

This eliminates the need for service-level authentication code. Previously, Payment Service might check an API key or JWT token to verify the caller. With mTLS, the infrastructure proves identity cryptographically. Your service code doesn’t need to know about authentication – it receives requests that have already been authenticated at the network layer.
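As a sketch of what this looks like from the service's perspective: with Istio, the Envoy sidecar typically forwards the verified peer identity in the X-Forwarded-Client-Cert header, so a service that wants the caller's identity for logging or authorization can simply read it. The endpoint and the SPIFFE value in the comment are illustrative only:

```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RestController;

// With mesh mTLS, the Payment service contains no TLS or authentication
// plumbing. If it needs the caller's verified identity, it can read what
// the sidecar forwards.
@RestController
public class PaymentController {

    @PostMapping("/charge")
    public ResponseEntity<String> charge(
            @RequestHeader(value = "X-Forwarded-Client-Cert", required = false) String xfcc) {
        // In Istio, Envoy populates this header with the peer's certificate
        // details, including a SPIFFE identity such as
        // spiffe://cluster.local/ns/prod/sa/order-service.
        // The TLS handshake and certificate validation already happened
        // between the sidecars; this code only observes the result.
        if (xfcc != null) {
            System.out.println("verified caller: " + xfcc);
        }
        return ResponseEntity.ok("charged");
    }
}
```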

Additionally:

  • Encryption by default: All east-west traffic is encrypted, protecting against network sniffing
  • Audit trail: The mesh knows exactly which services communicated with which other services
  • Compliance: Meets requirements for data-in-transit encryption (SOC2, PCI-DSS, HIPAA)

Certificate Authority Boundaries

Services in the same mesh must share a Certificate Authority. This has organizational implications.

Consider a large company with two product teams: Banking and Trading. For security isolation, they want separate Certificate Authorities – Banking services shouldn’t trust certificates from Trading services. This means they need two separate service meshes (Mesh A and Mesh B).

But what if Banking needs to expose APIs to Trading? This is where API Gateways complement service meshes. An API Gateway can sit at the boundary between meshes, terminating mTLS from one mesh and re-establishing it in another mesh (or using traditional API authentication). The gateway bridges different trust domains.

mTLS and Zero-Trust Networking

mTLS enables Zero-Trust architecture for internal service communication.

Traditional security followed the “castle and moat” model: strong perimeter defenses, but once inside the network, services implicitly trusted each other. An attacker who breached the perimeter had free access to internal systems.

Zero-Trust rejects this model: never trust, always verify. Every request, even between internal services, requires authentication. No service is trusted by default, regardless of network location.

Service meshes with mTLS implement Zero-Trust for east-west traffic. Even if an attacker deploys a rogue container inside your cluster, it cannot communicate with legitimate services because it lacks valid certificates signed by the mesh CA. Every service must cryptographically prove its identity on every request.

With these capabilities and security models in mind, let’s turn to practical decision-making: when should you use each technology?

When to Use Each

There’s no one-size-fits-all answer. Choosing between API Gateways and Service Meshes depends on your primary challenge, team maturity, and architectural scale. Let’s build a decision framework.

Decision Framework: Use API Gateway When…

Primary Challenge: External Access & Client Management

If you need to expose services to external consumers – developers, partners, customers, mobile apps – choose an API Gateway. It excels at edge security, client authentication (API keys, OAuth2), and managing the full API product lifecycle.

Concrete scenario: You’re building a SaaS platform where third-party developers integrate with your product catalog API. You need developer onboarding, API key provisioning, documentation portals, usage analytics, and tiered rate limiting. An API Gateway provides all of this out-of-the-box.

Primary Challenge: Service Abstraction & Evolution

If different products or teams need to communicate with governance, versioning, and backward compatibility, choose an API Gateway. It provides abstraction as underlying services evolve.

Concrete scenario: Your mobile team needs stable APIs while your backend undergoes frequent changes. The API Gateway maintains version 1 and version 2 of the /orders endpoint, routing v1 clients to legacy services and v2 clients to the new architecture. Backend teams can refactor without breaking mobile apps.
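A hedged sketch of such version routing, again using Spring Cloud Gateway syntax with hypothetical backend hostnames:

```java
import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// v1 clients keep hitting the legacy backend while v2 traffic goes to the
// new architecture; mobile apps never see the refactor behind the gateway.
@Configuration
public class OrderVersionRoutes {

    @Bean
    public RouteLocator orderRoutes(RouteLocatorBuilder builder) {
        return builder.routes()
                .route("orders-v1", r -> r.path("/v1/orders/**")
                        .uri("http://legacy-order-service:8080"))
                .route("orders-v2", r -> r.path("/v2/orders/**")
                        .uri("http://order-service-v2:8080"))
                .build();
    }
}
```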

Primary Challenge: Centralized Control & Simplicity

If you’re starting your microservices journey and need immediate value with lower operational complexity, choose an API Gateway. Simpler deployment, easier to understand, lower barrier to entry.

Concrete scenario: You’re migrating from a monolith to 5–10 microservices. You need request routing, basic rate limiting, and API documentation. A service mesh would be overkill – too much infrastructure overhead for your scale. An API Gateway solves your immediate needs without the operational burden.

Primary Challenge: Edge Security & Rate Limiting

If your main concern is protecting services from external threats and managing API quotas per customer, choose an API Gateway.

Concrete scenario: Your public APIs face potential DDoS attacks, credential stuffing, and abusive clients. The API Gateway implements rate limiting, IP blocking, JWT validation, and anomaly detection at the edge, before traffic reaches your services.

Decision Framework: Use Service Mesh When…

Primary Challenge: Internal Service Reliability

If you have large-scale internal architecture (dozens to hundreds of services) with complex communication patterns, and services need automatic retries, circuit breakers, and timeouts without code changes, choose a Service Mesh.

Concrete scenario: You have 80 microservices across 12 teams. Services frequently fail partially – timeouts, transient errors, network blips. Rather than each team implementing retry logic differently (or not at all), the service mesh provides consistent resilience patterns across all services. When Recommendation Service calls User Profile Service and gets a timeout, the sidecar automatically retries with exponential backoff – no code change needed.

Primary Challenge: Polyglot Environments & Code Elimination

If you want to eliminate networking code from services and need uniform connectivity across services written in different languages, choose a Service Mesh.

Concrete scenario: Your platform includes Python ML services, Go APIs, Java batch processors, and Node.js real-time services. Rather than maintaining four different HTTP client libraries with circuit breakers, retries, and observability, the service mesh provides identical capabilities to all services regardless of language. Developers focus on business logic, not networking infrastructure.

Primary Challenge: Security Compliance & Zero-Trust

If security compliance requires mTLS encryption for all internal communication, or you need Zero-Trust architecture with cryptographic service identity, choose a Service Mesh.

Concrete scenario: Rather than configuring TLS in every service’s application code, the service mesh provides automatic mTLS between all services. Auditors see consistent encryption policies enforced at the infrastructure layer, dramatically simplifying compliance evidence.

Primary Challenge: Deep Observability & Traffic Control

If you require deep east-west observability and distributed tracing across all services, or need advanced traffic management (canary deployments, traffic splitting, A/B testing) for internal services, choose a Service Mesh.

Concrete scenario: You’re rolling out a major refactor of Order Service. You want to send 5% of traffic to the new version, monitor error rates and latency, gradually increase to 50%, then 100%. The service mesh enables this with configuration changes – no deployment changes, no feature flags in code. If error rates spike, you roll back instantly by updating traffic weights.

When NOT to Use Service Mesh

Avoiding Unnecessary Complexity:

Service meshes are powerful but operationally complex. Don’t use them if:

  • Small architectures (< 10–15 services): Operational overhead outweighs benefits. You’ll spend more time managing the mesh than you save from its features.
  • Team lacks infrastructure expertise: Service meshes have a steep learning curve. If your team struggles with Kubernetes basics, adding a service mesh will slow you down.
  • Cannot deploy sidecars: If you depend on external services, legacy systems you don’t control, or third-party SaaS APIs, a service mesh can’t manage those connections.
  • Organizational resistance: Service meshes require cross-team adoption. If teams resist sidecar injection or control plane dependencies, forced adoption fails.
  • Ultra-sensitive performance requirements: Sidecars add latency (typically 1–5ms per hop). For ultra-low-latency scenarios where even milliseconds matter, this overhead is unacceptable.
  • Limited operational resources: Service meshes require dedicated platform engineering resources. If you lack staff to manage mesh infrastructure, troubleshoot sidecar issues, and handle certificate rotation problems, don’t adopt a mesh.

Decision Matrix: Use Both When…

The Comprehensive Approach:

Many mature architectures use both technologies together, leveraging each for its strengths.

Use both when:

  • You need edge control for external clients (API Gateway) AND in-mesh reliability for internal services (Service Mesh)
  • You want API-as-a-product capabilities (documentation, monetization, developer portals) AND Zero-Trust security internally (mTLS between services)
  • You have a mature platform engineering team capable of managing layered infrastructure

Example decision: “We expose our Payment API to mobile apps and partners via API Gateway – handling JWT validation, per-customer rate limiting, and maintaining a developer portal. Internal communication between Payment Service, Fraud Detection Service, and Notification Service uses a service mesh – providing mTLS encryption, circuit breakers, and distributed tracing. The API Gateway itself runs as a service within the mesh, getting the same resilience and observability benefits.”

Real-World Architecture Example

Let’s walk through a financial institution scenario that illustrates how both technologies complement each other.

Scenario: Multi-Product Financial Platform

A financial institution has two major products:

  • Banking Platform (account management, transfers, statements)
  • Trading Platform (stock trading, portfolio management, market data)

Each product has its own engineering team, separate deployments, and independent release cycles. Here’s how they use both technologies:

Service Mesh Deployment (Two Separate Meshes)

  • Banking Mesh: Covers 25 microservices (Account Service, Transaction Service, Statement Generator, etc.) with its own Certificate Authority for security isolation
  • Trading Mesh: Covers 18 microservices (Order Execution, Portfolio Service, Market Data, etc.) with a separate Certificate Authority

Each mesh provides:

  • mTLS encryption for all internal communication within that product
  • Circuit breakers and retries for resilience
  • Distributed tracing to debug complex transactions
  • Zero-Trust security – no service trusts another by default

API Gateway Deployment (Multiple Gateways)

  • Internal API Gateway: Banking Platform exposes select APIs to Trading Platform (e.g., “Get Account Balance” for margin trading). This gateway sits at the boundary between Banking Mesh and Trading Mesh, bridging different trust domains.
  • Edge API Gateway: Both products expose APIs to mobile applications. This gateway handles:
    • JWT validation for user authentication
    • Rate limiting per user tier (retail vs institutional)
    • API versioning (mobile app v1.2 uses older endpoint, v2.0 uses new schema)
    • Developer portal for partner integrations
    • Analytics on API usage patterns

Multi-Datacenter Deployment

The architecture spans two datacenters (DC1 and DC2) for high availability:

  • Each datacenter has full mesh deployment (Banking Mesh and Trading Mesh)
  • API Gateways in each datacenter for local request handling
  • Cross-datacenter mesh communication uses mTLS across the WAN
  • API Gateway load balancers route users to nearest datacenter

Key Architectural Insights:

This architecture demonstrates several principles:

  • Isolation through separate meshes: Banking and Trading use different CAs, preventing accidental trust relationships
  • API Gateways bridge trust domains: Internal gateway mediates between meshes when cross-product communication is needed
  • Layered security: Edge gateway handles user authentication, mesh handles service authentication
  • Different lifecycle management: API versions can change without mesh reconfiguration; mesh policies can change without API versioning

When a mobile user checks their trading portfolio’s buying power, here’s the flow:

  1. Mobile app → Edge API Gateway (JWT validation, rate limiting)
  2. Edge API Gateway → Trading Platform’s Portfolio Service (via Trading Mesh, with mTLS)
  3. Portfolio Service → Internal API Gateway (requesting account balance from Banking)
  4. Internal API Gateway → Banking Platform’s Account Service (via Banking Mesh, with mTLS)
  5. Response flows back through each layer

Each technology layer adds value: the edge gateway protects against external threats and manages API products, while the meshes ensure reliable, secure service-to-service communication.

Pros and Cons Summary

Understanding the tradeoffs helps set realistic expectations and plan for operational challenges.

API Gateway

Pros:

  • Standardizes API delivery: Consistent authentication, rate limiting, and versioning across all APIs
  • Simplifies client integration: Single entry point with unified documentation reduces client complexity
  • High flexibility: Can transform requests, aggregate responses, implement complex routing logic
  • Easier adoption: Centralized deployment model requires less organizational coordination
  • Centralized analytics: Single place to monitor API usage, client behavior, and performance trends
  • Legacy integration: Can front legacy systems, providing modern API interfaces to old infrastructure

Cons:

  • Single point of failure risk: Though clustering mitigates this, the gateway remains a critical chokepoint
  • Centralization complexity at scale: As more APIs are added, gateway configuration grows complex
  • Latency introduction: Extra hop adds latency (typically 5–20ms depending on gateway processing)
  • Limited internal visibility: Only sees edge traffic, not service-to-service communication patterns
  • Scaling challenges: While horizontal scaling is possible, it’s more complex than distributed architectures

Service Mesh

Pros:

  • Built-in observability: Comprehensive metrics, distributed tracing, and logging without code instrumentation
  • Enhanced security: Automatic mTLS, Zero-Trust architecture, cryptographic service identity
  • Resilience without code: Circuit breakers, retries, timeouts configured centrally, enforced everywhere
  • Fine-grained traffic control: Canary deployments, traffic splitting, A/B testing at infrastructure level
  • Chaos engineering capabilities: Inject faults and delays to test system resilience
  • Abstracts networking from code: Developers focus on business logic, not HTTP clients and retry libraries
  • Language agnostic: Same capabilities for Go, Python, Java, Node.js services

Cons:

  • Steep learning curve: Complex architecture requires dedicated platform engineering expertise
  • Operational complexity: Managing control plane, certificate rotation, sidecar upgrades adds operational burden
  • Latency overhead: Each sidecar hop adds latency; multiple hops compound this
  • Resource overhead: Memory and CPU per sidecar
  • Requires infrastructure maturity: Best suited for Kubernetes environments with GitOps practices
  • Organizational challenges: Requires cross-team adoption and coordination – can’t be implemented in isolation
  • Deployment complexity: Sidecar injection, control plane dependencies increase deployment complexity

Conclusion

Let’s return to where we started: the pervasive north-south/east-west myth that frames API Gateways and Service Meshes as mutually exclusive technologies defined by traffic direction.

This framing is fundamentally flawed. Both technologies can handle both traffic types. API Gateways can manage internal service-to-service communication through private gateways. Service Meshes can expose external traffic through ingress gateways. The real distinction has nothing to do with where traffic flows.

What actually matters is purpose:

  • API Gateways treat services as products with business context – managing full API lifecycles, understanding users and customers, handling monetization and developer onboarding. They operate at the application edge with business awareness.
  • Service Meshes provide business-agnostic infrastructure for service connectivity – offloading networking concerns from application code, enabling Zero-Trust security through mTLS, and providing deep observability without instrumentation. They operate at the infrastructure layer with no business logic.

Looking forward, both patterns continue to evolve. Service Meshes are simplifying operationally (Linkerd’s focus on simplicity, Istio’s ambient mesh reducing sidecar overhead). API Gateways are adding mesh-like features (Kong Mesh, Ambassador’s service mesh integration). The boundaries blur, but the fundamental purposes remain distinct.

Choose your tools based on the problems they solve, not the traffic patterns they handle. Your architecture – and your team’s sanity – will thank you.

Note

Obviously, this content has been generated with an LLM, but my approach to writing has been the following:

  1. I read about the topic from various pages out there.
  2. I note the questions/sub-topics that I want to cover.
  3. I add these questions/sub-topics and then generate content using the LLM.
  4. I read the LLM-generated content and keep only what I find necessary.

Links
