Why We Stopped Recommending ZITADEL for Self-Hosting: A Developer’s Field Report

TL;DR

We chose ZITADEL for our multi-tenant ERP because it promised native multi-tenancy, an event-sourced architecture, and an API-first design in Go. What we got instead: deprecated APIs that return 404 without warning, a v2 user API that silently skips email verification, an Actions system that swapped its ES5.1-era inline JavaScript for an entirely different webhook model, a login UI that was ripped out into a separate container between versions, and documentation that reads like it was written for the managed cloud product – not self-hosters. After weeks of debugging, we document every issue, evaluate the alternatives, and explain why ZITADEL’s self-hosted developer experience is not production-grade.

Who We Are and Why We Chose ZITADEL

We are building AmanERP – a multi-tenant, web-based Cloud ERP system designed for complex authorization patterns: RBAC, ABAC, and ReBAC via OpenFGA (modeled on Google Zanzibar). Our authentication requirements:

  • Multi-tenancy: Organization-level isolation with per-tenant branding
  • OIDC/OAuth2: Standard token flows for API gateway integration
  • Social Login: Google OAuth with per-tenant configuration
  • Email Verification: Transactional emails for user onboarding
  • MFA: Time-based OTP and WebAuthn
  • Self-hosted: Full data sovereignty, no SaaS dependency
  • API-first: Programmatic management of all identity operations

ZITADEL checked every box on paper. Written in Go (matching our backend), native multi-tenancy via Organizations, event-sourced (audit-friendly), and a comprehensive Admin/Management API.

We were wrong to trust the marketing.

ZITADEL Integration Friction

The Issues: A Chronological Disaster Log

What follows is not speculation. These are real issues encountered during production hardening of ZITADEL v4.10.1, documented with timestamps, API calls, and root causes.

Issue 1: The Login UI That Vanished (Severity: Critical)

What happened: After upgrading to ZITADEL v4, the login page returned 404. Users could not authenticate.

Root cause: ZITADEL v4 fundamentally restructured its architecture. The login UI is no longer embedded in the main container. It is now a separate Next.js application (zitadel-login) that must be deployed as its own service.

What this means operationally:

  • Two containers instead of one (zitadel:v4.10.1 + zitadel-login:v4.10.1)
  • Shared volume (/zitadel-data) requiring user: "0" (root) on both containers – a security trade-off forced by distroless images
  • PAT (Personal Access Token) exchange between containers via file-based handshake
  • Login runs on port 3000 while the API runs on port 8080 – two separate services to configure, proxy, and monitor

This was not a minor upgrade. This was an architectural rewrite that broke every existing deployment.

Issue 2: The BASEURI Double-Path Bug (Severity: High)

What happened: Email verification links sent to users contained doubled paths: /ui/v2/login/ui/v2/login/verify.

Root cause: ZITADEL’s defaultBaseURL() function unconditionally appends /ui/v2/login to whatever BASEURI you configure. If you include that path in your configuration (which the documentation examples suggest), you get double paths.

The fix: Set BASEURI to the bare host only (http://localhost:3000), not the full login path. But we only discovered this by reading ZITADEL’s Go source code – the documentation does not mention this behavior.
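The double path is easy to reproduce. Here is a minimal sketch, assuming the behavior described above: the suffix `/ui/v2/login` is appended unconditionally. `naiveJoin` only imitates that behavior and is not ZITADEL's code. `normalizeBaseURI` is a hypothetical guard a setup script could apply before writing the configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// loginV2Path is the suffix we observed being appended to BASEURI.
const loginV2Path = "/ui/v2/login"

// naiveJoin imitates the unconditional append (illustration only).
func naiveJoin(baseURI string) string {
	return strings.TrimRight(baseURI, "/") + loginV2Path
}

// normalizeBaseURI strips a trailing login path (and trailing slashes) so the
// configured value is the bare host that the unconditional append expects.
func normalizeBaseURI(baseURI string) string {
	trimmed := strings.TrimRight(baseURI, "/")
	return strings.TrimSuffix(trimmed, loginV2Path)
}

func main() {
	configured := "http://localhost:3000/ui/v2/login"
	fmt.Println(naiveJoin(configured) + "/verify")                   // doubled path
	fmt.Println(naiveJoin(normalizeBaseURI(configured)) + "/verify") // single path
}
```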

Compounding problem: The ZITADEL_DEFAULTINSTANCE_FEATURES_LOGINV2_BASEURI environment variable only works at first initialization. Post-initialization changes require calling the Features V2 API (PUT /v2/features/instance). This is not documented anywhere in the self-hosting guide.

Issue 3: The Deprecated API That Returns 404 (Severity: High)

What happened: Our setup script called GET /admin/v1/email/smtp to check if an SMTP provider existed. It returned 404. We concluded no provider existed and created one. On next run, it returned 404 again. We created another. After five runs, we had five duplicate SMTP providers.

Root cause: GET /admin/v1/email/smtp is silently deprecated in ZITADEL v4.10.1. It returns 404 not because the provider does not exist, but because the endpoint itself no longer functions. There is no deprecation header, no warning in the response, no mention in the changelog. The endpoint simply breaks.

The correct v4 endpoints are:

  • GET /admin/v1/email (active provider)
  • POST /admin/v1/email/_search (list all providers)
  • PUT /admin/v1/email/smtp/{id} (update)

We discovered this through trial and error, database inspection, and eventually finding an obscure GitHub issue. The official API documentation still lists the deprecated endpoint.
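The general fix is to make the setup script decide from the list endpoint, not from a single-resource GET. That way a 404 can never be read as "absent". The sketch below assumes you have already parsed the `POST /admin/v1/email/_search` response into the hypothetical `smtpProvider` type, so map the real response fields onto it.

```go
package main

import "fmt"

// smtpProvider is a hypothetical, trimmed-down view of one search result.
type smtpProvider struct {
	ID   string
	Host string
	User string
}

// planSMTP decides what an idempotent setup script should do, based only on
// the full provider list. If several providers share a host (for example,
// duplicates left by the 404 bug), it picks the first; clean up the rest manually.
func planSMTP(existing []smtpProvider, want smtpProvider) (action, id string) {
	for _, p := range existing {
		if p.Host == want.Host && p.User == want.User {
			return "noop", p.ID
		}
	}
	for _, p := range existing {
		if p.Host == want.Host {
			return "update", p.ID // then call PUT /admin/v1/email/smtp/{id}
		}
	}
	return "create", ""
}

func main() {
	existing := []smtpProvider{{ID: "1", Host: "smtp.example.com:587", User: "old-user"}}
	action, id := planSMTP(existing, smtpProvider{Host: "smtp.example.com:587", User: "mailer"})
	fmt.Println(action, id)
}
```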

Issue 4: The Email That Never Sent (Severity: High)

What happened: We created a user via the v2 API with email.isVerified: false, expecting ZITADEL to send a verification email. No email was sent. The SMTP provider was correctly configured and active.

Root cause: In ZITADEL’s v2 User API, setting isVerified: false does not trigger a verification email. It simply marks the email as unverified and does nothing else. To actually send a verification email, you must explicitly include email.sendCode: {} in the creation request.
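Here is a minimal Go sketch of the request body, assuming the `POST /v2/users/human` endpoint of the v2 User API. Only `email.sendCode` comes from our investigation. The other field names (`username`, `profile.givenName`, `profile.familyName`) and the optional `urlTemplate` are our assumptions about a minimal body, so check them against the API reference for your version.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type sendCode struct {
	URLTemplate string `json:"urlTemplate,omitempty"` // assumed optional field
}

type email struct {
	Email string `json:"email"`
	// SendCode is what actually triggers the verification email; a bare
	// unverified flag only marks the address as unverified.
	SendCode *sendCode `json:"sendCode,omitempty"`
}

type profile struct {
	GivenName  string `json:"givenName"`
	FamilyName string `json:"familyName"`
}

type addHumanUser struct {
	Username string  `json:"username"`
	Profile  profile `json:"profile"`
	Email    email   `json:"email"`
}

// newUserBody builds a creation request that explicitly asks for a code.
func newUserBody(username, given, family, addr string) ([]byte, error) {
	return json.Marshal(addHumanUser{
		Username: username,
		Profile:  profile{GivenName: given, FamilyName: family},
		Email:    email{Email: addr, SendCode: &sendCode{}}, // serializes as "sendCode":{}
	})
}

func main() {
	b, err := newUserBody("jdoe", "Jane", "Doe", "jane@example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```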

The debugging journey:

  1. Checked SMTP provider – active and correctly configured
  2. Queried the event store – found user.human.added events but no email.code.added events
  3. Inspected the notification handler projection – position appeared “stuck” (it was not stuck; it only advances on notification-triggering events, of which there were none)
  4. Read ZITADEL source code to discover the sendCode requirement
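Step 3 is easy to misread. The toy model below is not ZITADEL's code. It shows why a projection that records only the position of events it acts on looks frozen when no notification-triggering events arrive. The event type strings come from the list above; the types and the `reduce` method are hypothetical.

```go
package main

import "fmt"

type event struct {
	Position int
	Type     string
}

// notificationProjection is a toy model: it only records the position of
// events it acts on, so unrelated events never move it forward.
type notificationProjection struct {
	handles  map[string]bool
	position int
	sent     []string
}

func (p *notificationProjection) reduce(events []event) {
	for _, e := range events {
		if !p.handles[e.Type] {
			continue // not a notification-triggering event: nothing recorded
		}
		p.sent = append(p.sent, e.Type)
		p.position = e.Position
	}
}

func main() {
	p := &notificationProjection{handles: map[string]bool{"email.code.added": true}}

	// Users created without sendCode produce only user.human.added events.
	p.reduce([]event{{1, "user.human.added"}, {2, "user.human.added"}})
	fmt.Println(p.position, len(p.sent)) // 0 0: looks "stuck"

	// A code event is something the projection handles, so it advances.
	p.reduce([]event{{3, "user.human.added"}, {4, "email.code.added"}})
	fmt.Println(p.position, len(p.sent)) // 4 1
}
```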

This cost us 4+ hours of debugging. The v2 API documentation does not clearly state that isVerified: false is a passive flag, not an action trigger.

Issue 5: The Actions Runtime That Changed Shape (Severity: High)

What happened: We wrote ZITADEL Actions (custom logic hooks) for token enrichment and user lifecycle events. Actions v1 ran inline JavaScript inside ZITADEL’s Goja runtime. Actions v2 replaced inline execution entirely with an external webhook model – your custom logic now runs as a separate HTTP service that ZITADEL calls out to.

Root cause: The migration from Actions v1 to v2 is a fundamental model change, not an incremental upgrade. In v1, your JavaScript ran inside ZITADEL’s process with access to a limited set of built-in modules. In v2, there is no inline JavaScript at all – you deploy a webhook endpoint and ZITADEL sends HTTP requests to it. This means rewriting all custom logic as external services, adding network round-trips to every login flow, and accepting a new failure mode (your webhook endpoint being unavailable). Community feedback on this transition is tracked in GitHub issue #10316.

The insidious part: In Actions v1, ZITADEL did not validate JavaScript at creation time. Actions appeared “ACTIVE” and healthy until a user actually triggered them. Runtime error messages (“Errors.Internal”) gave no indication of what went wrong – no line numbers, no syntax details, no actionable debugging information.

Issue 6: The Login Callback Format That Changed (Severity: Medium)

What happened: Google OAuth social login failed. Users were redirected to a callback URL that returned 404.

Root cause: ZITADEL Login V2 changed the social IdP callback URI format:

  • Login V1 (old): /ui/login/login/externalidp/callback
  • Login V2 (current): /idps/callback

Google’s OAuth configuration requires exact redirect URI matching. The Login V1 callback that worked in ZITADEL v1 through v3 silently breaks in v4.
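Because Google matches redirect URIs exactly, one way to survive the switch is to register both callbacks in the Google OAuth client until every tenant is on Login V2. The sketch below assumes the two paths listed above. The `redirectURIs` helper and the example origins are hypothetical. Whether `/idps/callback` sits under an additional base path depends on how you serve the login container.

```go
package main

import (
	"fmt"
	"strings"
)

// Callback paths for the two login UIs, as listed above.
const (
	loginV1Callback = "/ui/login/login/externalidp/callback"
	loginV2Callback = "/idps/callback"
)

// redirectURIs returns both callback URIs to register during a migration
// window. v1Base and v2Base are the public origins (plus any base path)
// under which each login UI is served.
func redirectURIs(v1Base, v2Base string) []string {
	return []string{
		strings.TrimRight(v1Base, "/") + loginV1Callback,
		strings.TrimRight(v2Base, "/") + loginV2Callback,
	}
}

func main() {
	for _, uri := range redirectURIs("https://auth.example.com", "https://login.example.com/") {
		fmt.Println(uri)
	}
}
```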

Issue 7: The Post-Verification Dead End (Severity: Medium)

What happened: After a user set their initial password through the verification flow, they landed on a success page with no navigation. No redirect to the application. No “continue” button. Users were stranded.

Root cause: ZITADEL’s defaultRedirectUri in the login policy is not set by default. After password initialization, ZITADEL shows a static success page. The fix requires calling PUT /admin/v1/policies/login with a defaultRedirectUri – but this option is not mentioned in the default setup guides.

The Pattern: Why ZITADEL Is Brittle

These are not isolated bugs. They reveal systemic problems:

1. API Versioning Without a Contract

ZITADEL has had three major API generations (v1, v2, v3-alpha-then-reverted) in six years. The project plans to deprecate all V1 APIs in V5. Developers have reported that they cannot determine which APIs are GA, Beta, or Deprecated in which version. In one documented case, deprecated API docs pointed to another API that was also deprecated – circular migration confusion.

2. Silent Failures Over Loud Errors

Deprecated endpoints return 404 instead of 410 (Gone) with a migration pointer. Actions with invalid syntax show “ACTIVE” instead of “SYNTAX ERROR.” Email verification flags are passive instead of active. The system consistently chooses silence over communication.

3. Self-Hosted Is a Second-Class Citizen

The documentation reads like it was written for ZITADEL Cloud users. Self-hosting guides assume you will use their managed offering for email, branding, and social login configuration. The operational complexity of the two-container architecture, PAT exchange, volume permissions, and init-time-only environment variables suggests that self-hosting is tested as an afterthought.

4. License Risk: Apache 2.0 to AGPL 3.0

In May 2025, ZITADEL switched from Apache 2.0 to AGPL 3.0 (effective v3). The AGPL copyleft provisions mean that if ZITADEL is integrated into your application (not just used as an external service), you may need to open-source your entire application. For an ERP system, this introduces legal review requirements at every integration point.

5. Security Track Record Raises Questions

Recent CVEs in ZITADEL include:

  • Account takeover via federation – inactive IdPs could still link external identities
  • DOM-Based XSS in the v2 logout endpoint (v4.0.0 through v4.7.0)
  • Full-read SSRF via x-zitadel-forward-host header (v4.7.0 and below)

These are not edge cases. These are authorization bypass, account takeover, and data exfiltration vulnerabilities in the most security-critical component of any stack. For an identity provider, this frequency of critical CVEs is concerning.

Path Forward

What We Should Have Considered: The Alternatives

After documenting our experience, we evaluated every viable self-hosted identity solution against our requirements: multi-tenancy, OIDC/OAuth2, social login, email verification, MFA, custom branding, API-first management, and self-hosting.

Disqualified Immediately

| Solution | Reason |
|---|---|
| Ory Kratos (OSS) | Open-source version is single-tenant only. Multi-tenancy requires an enterprise license or an instance-per-tenant deployment. |
| Authelia | No multi-tenancy. No user management API. Reverse-proxy SSO only. |
| Casdoor | Critical security history: SQL injection, CSRF, cross-org admin bypass, arbitrary file deletion. Disqualifying for an identity provider. |
| Custom Build | Industry estimates: $250K-$500K initial + $100K+/year maintenance, and 15-20% ongoing engineering time. Every OIDC edge case is your problem forever. |

Tier 1: Keycloak – The Battle-Tested Standard

Why it leads: 13 years of production use. 32,000+ GitHub stars. 1,400+ contributors. CNCF incubating project. Red Hat enterprise backing. Used by governments, banks, and Fortune 500 companies.

Multi-tenancy: Two mature approaches – realm-per-tenant (complete isolation) and the new Organizations feature (Keycloak 25+) for single-realm multi-tenancy.

API stability: The Admin REST API has remained largely stable across major versions. Red Hat provides long-term support builds. Breaking changes happen at major version boundaries with documented migration guides.

Trade-offs:

  • Java/JVM runtime: minimum ~1.25 GB RAM per pod, GC tuning required
  • Memory leak reports in versions 24+
  • Admin UI feels dated compared to ZITADEL
  • Theme customization requires Freemarker knowledge
  • No event-sourced architecture (traditional CRUD)

Verdict: If we were starting over, Keycloak would be our first choice. The resource overhead is a real cost, but stability and community support are worth the RAM.

Tier 2: Logto – The Modern Challenger

Why it is interesting: Native organization support with per-tenant MFA, JIT provisioning, and tailored sign-in experiences. Clean admin console. 11,500+ GitHub stars and growing fast.

Trade-offs:

  • Node.js/TypeScript runtime (operational mismatch with Go backend)
  • Younger project (launched ~2022), less battle-tested
  • Self-hosted version less proven at scale than cloud offering
  • Fewer enterprise reference customers

Verdict: Worth serious evaluation if your team is comfortable operating a Node.js service. The multi-tenancy model is purpose-built for SaaS.

Tier 2.5: FusionAuth – The Pragmatic Middle Ground

Why it is worth evaluating: Explicit tenant-scoped model baked into the product from the start. Multi-tenancy is not bolted on – tenants, applications, and user pools are first-class API objects. Operational model is straightforward: single deployment, deterministic configuration, no multi-container choreography.

Trade-offs:

  • Community Edition is source-available but not open-source (commercial license)
  • Advanced features (SCIM, entity management, advanced threat detection) require paid tiers
  • Smaller community than Keycloak (closed-source core; ~2K stars across GitHub repos) with less operator knowledge in the ecosystem
  • Java runtime with similar memory characteristics to Keycloak

Verdict: If licensing economics work for your team, FusionAuth offers the most predictable self-hosted multi-tenant experience short of Keycloak. Evaluate the Community vs. Essentials vs. Enterprise tiers carefully – the line between them matters for long-term cost.

Tier 3: SuperTokens – The Auth Toolkit

Why it is worth considering: Excellent multi-tenancy with per-tenant login methods, separate user pools, and dynamic tenant creation. 14,500+ GitHub stars.

Trade-offs:

  • More of an auth toolkit than a full IAM platform
  • Less mature OIDC/OAuth2 server compared to Keycloak
  • SAML support via BoxyHQ bridge only (not native)
  • Java core + Node.js SDK layer

Verdict: Strong for authentication flows but may lack the comprehensive identity management features an ERP needs at scale.

Watching: Authentik

Multi-tenancy is currently in alpha (v2024.2). If it reaches GA in 2026, Authentik becomes a strong contender – the visual flow editor for auth journeys is compelling, and the community (20,000+ stars) is growing rapidly. However, no professional security audits have been published, which is concerning for an identity provider.

Comparison Matrix

| Criteria | ZITADEL | Keycloak | FusionAuth | Logto | SuperTokens |
|---|---|---|---|---|---|
| Multi-tenancy | Native | Native (2 approaches) | Native (first-class) | Native | Native |
| OIDC Certified | Yes | Yes | Yes | No | Partial |
| API Stability | Low (3 generations in 6 years) | High (stable REST API) | High | Medium (young) | Medium-High |
| Self-hosting DX | Poor | Moderate | Good | Moderate | Good |
| Community | ~13K stars | ~32K stars | ~2K stars | ~11.5K stars | ~14.5K stars |
| License | AGPL 3.0 | Apache 2.0 | Source-available (commercial) | MPL 2.0 | Apache 2.0 |
| Language | Go | Java | Java | Node/TS | Java+Node |
| Security Record | Concerning | Good (Red Hat) | Good | Good | Good |
| Documentation | Gaps for self-hosters | Extensive (13 years) | Good (tenant-focused) | Good (modern) | Good (recipe-based) |
| Resource Usage | ~512 MB (x2 containers) | ~1.25 GB+ | ~1 GB+ | ~512 MB | ~512 MB |
| Maturity | 6 years | 13 years | 8 years | 5 years | 5 years |

Our Recommendation

If you are evaluating ZITADEL for self-hosting: proceed with extreme caution.

The technology is architecturally interesting (event sourcing, Go, native multi-tenancy), but the developer experience for self-hosters is not production-grade. You will spend significant engineering time working around deprecated APIs, undocumented behaviors, and silent failures. The AGPL license change and API instability introduce long-term risk that is difficult to quantify.

For most teams building multi-tenant applications, we recommend Keycloak. It is not as modern or elegant, but it works. The APIs are stable. The documentation is comprehensive. The community is massive. Red Hat provides enterprise support. When you hit a problem at 2 AM, the answer is on Stack Overflow.

If Keycloak’s Java overhead is unacceptable, evaluate Logto or SuperTokens as modern alternatives with native multi-tenancy.

If you are already invested in ZITADEL:

  1. Pin to a specific version and do not upgrade without testing in staging
  2. Avoid all beta/alpha APIs – only use endpoints that are explicitly marked GA
  3. Document every workaround (we have 730 lines of operational runbook)
  4. Have a migration plan to Keycloak ready. Define hard exit criteria before you start: deterministic local bootstrap under 30 minutes, stable callback and login flows across restart and upgrade, scripted tenant provisioning and user lifecycle with no manual steps, and no undocumented workarounds for critical paths. If your current setup cannot meet these bars, you already have your answer.

The Deeper Lesson

Choosing an identity provider is not like choosing a database or a message queue. Authentication is the front door to your application. When it breaks, nothing else matters – users cannot log in, onboarding stops, and your support queue explodes.

We chose ZITADEL because it was shiny. We should have chosen boring.

In infrastructure, boring is a feature.

What Comes Next: Building What Should Already Exist

We are not just migrating away from ZITADEL. We are building the replacement.

After 23+ hours of debugging silent 404s, after 730 lines of operational runbook to paper over undocumented behaviors, after watching a PAT handshake between two containers that used to be one – we stopped asking “which identity provider should we use?” and started asking “why does every option require this much suffering?”

The answer is that the identity space has optimized for feature lists, not developer experience. Every provider markets multi-tenancy, OIDC, social login, MFA. None of them market “it just works when you self-host.” None of them market “you will not lose a weekend to a silent deprecation.”

So we are building an open-source identity provider modeled on what Resend did for email: customizable for teams that need it, with defaults so good that most teams never need to customize at all.

The design pillars:

  1. Sensible defaults over configuration gymnastics. A fresh install sends verification emails, redirects users after password setup, and exposes a working login page – without reading 40 pages of documentation or setting 12 environment variables. You should be able to docker compose up and have a working identity provider in under five minutes.

  2. Loud errors over silent failures. Deprecated endpoints return 410 Gone with a Sunset header and a migration URL – not a 404 that looks like your resource does not exist. Actions with invalid syntax fail at creation time with line numbers – not at runtime with “Errors.Internal.” If something breaks, the system tells you exactly what broke and exactly how to fix it.

  3. Self-hosters are first-class citizens. Documentation assumes you are running your own infrastructure. Defaults are tuned for Docker Compose and Kubernetes, not for a managed cloud dashboard. Upgrade guides test the self-hosted path first, not last. If a feature works in the cloud offering but not in self-hosted, it is not shipped.

  4. API stability is a feature, not a constraint. Semantic versioning with published deprecation timelines. Deprecation headers on every sunset endpoint, twelve months before removal. A single canonical API surface – not three overlapping generations where you have to guess which one still works.

We lost three full working days to identity plumbing instead of building our ERP. That is the tax we refuse to pass on to the next team.

More details soon. If this resonates, watch this space.

This article documents our experience with ZITADEL v4.10.1 as of February 2026. Issues are reported with specific API calls, error messages, and root causes. We respect the ZITADEL team’s engineering effort and hope this feedback contributes to improving the self-hosted experience.

All referenced issues, RCAs, and code are from the AmanERP authorization POC repository.

Tags: #ZITADEL #Authentication #SelfHosted #Keycloak #Identity #DevExperience #OpenSource #MultiTenancy

Appendix: Our Issue Log (Summary)

| # | Issue | Severity | Hours Lost | Root Cause |
|---|---|---|---|---|
| 1 | Login UI vanished after v4 upgrade | Critical | 6+ | Architectural rewrite, separate container required |
| 2 | Email verification links had doubled paths | High | 3+ | BASEURI auto-append behavior, undocumented |
| 3 | SMTP endpoint returns 404 (silently deprecated) | High | 4+ | API deprecated without warning or migration header |
| 4 | Verification emails never sent | High | 4+ | v2 API requires explicit sendCode, undocumented |
| 5 | Custom actions fail silently at runtime | High | 3+ | Actions v1→v2 model change (inline JS to webhooks), no syntax validation at creation |
| 6 | Social login callback URL changed | Medium | 2+ | Login V2 changed callback format, breaking OAuth |
| 7 | Post-verification dead-end page | Medium | 1+ | defaultRedirectUri not set by default |
| | Total | | 23+ | Debugging identity provider issues alone |

23+ hours of engineering time lost to issues that should have been caught by deprecation warnings, documentation, or sensible defaults. That is three full working days spent not building our ERP.
