Introduction
Keyshade is an open-source real-time secret and configuration management platform that helps developers securely store, sync, and rotate sensitive data like API keys, passwords, and environment variables across multiple environments. It uses end-to-end encryption and live updates to eliminate insecure .env file sharing, manual dashboard updates, and secret sprawl. Designed for teams and solo developers alike, Keyshade improves collaboration, auditability, and security by providing version history, access controls, and seamless integrations with modern deployment platforms.
Try it out at https://keyshade.io/
The Problem: Silent Failures in Cloud Integrations (#1254)
The Vercel and AWS Lambda integrations responded only to updates to secrets and variables. They completely ignored the cases where a user deleted or renamed an environment. As a result, when a user deleted an environment in Keyshade, its secrets and variables stayed live in Vercel or AWS Lambda as orphans, which is a security problem. Likewise, when an environment was renamed, the integrations fell out of sync.
Why This Bug Mattered to Production Teams
This problem was significant because of several factors:
- Security Risk: When a user deleted an environment in Keyshade (for example, a staging environment), its secrets remained active in Vercel and AWS Lambda. The secrets stayed exposed on the external platform, and the orphaned credentials became potential attack vectors.
- Data Integrity: Because Keyshade’s environment state could diverge from the external platforms, users couldn’t be assured of data integrity. That defeats the purpose of having a reliable tool like Keyshade.
- User Experience: Keyshade users had to log in to the Vercel or AWS Lambda interface manually to clean up after environment changes, which defeats the purpose of the integration in the first place. That kind of friction might discourage the use of Keyshade.
- Production Readiness: For Keyshade to be considered viable in production, the integrations must handle the full lifecycle of environments, not only the secrets within static environments. This matters most for teams with high rates of environment changes (e.g., CI/CD workflows creating ephemeral environments). The issue was marked “priority: high” and “difficulty: 4.” This was my hardest PR, and it took around two months.
Under the Hood: Key Code Changes
apps/api/src/integration/plugins/vercel.integration.ts
Purpose: Implements the Vercel integration, syncing Keyshade secrets and variables to Vercel projects as environment variables
Key sections modified:
getPermittedEvents(): Added ENVIRONMENT_UPDATED and ENVIRONMENT_DELETED to the set of events the integration subscribes to
emitEvent() switch statement: Added case handlers that route environment lifecycle events
New functions added:
handleEnvironmentDeleted(): Cleans up project-level environment variables scoped to the deleted environment and breaks the integration relationship
handleEnvironmentUpdated(): Renames custom Vercel environments where available (Pro plan users) and updates integration metadata
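To make the routing concrete, here is a minimal sketch of how an integration can subscribe to the two lifecycle events and dispatch them from emitEvent(). It is not the actual Keyshade code: the class name, the event shape, the log array, and every event name other than ENVIRONMENT_UPDATED and ENVIRONMENT_DELETED are assumptions made for illustration.

```typescript
// Only ENVIRONMENT_UPDATED and ENVIRONMENT_DELETED come from the PR;
// the other event names are hypothetical placeholders.
type SketchEventType =
  | 'SECRET_UPDATED'
  | 'VARIABLE_UPDATED'
  | 'ENVIRONMENT_UPDATED'
  | 'ENVIRONMENT_DELETED'
  | 'PROJECT_UPDATED'; // hypothetical event this integration does not subscribe to

interface SketchEvent {
  eventType: SketchEventType;
  environmentName: string;
}

class LifecycleAwareIntegration {
  // Records which branch ran so the routing is easy to observe.
  readonly log: string[] = [];

  getPermittedEvents(): Set<SketchEventType> {
    return new Set<SketchEventType>([
      'SECRET_UPDATED',
      'VARIABLE_UPDATED',
      // The two lifecycle events the fix subscribes to.
      'ENVIRONMENT_UPDATED',
      'ENVIRONMENT_DELETED',
    ]);
  }

  emitEvent(event: SketchEvent): void {
    if (!this.getPermittedEvents().has(event.eventType)) {
      return; // not subscribed, so the event is ignored
    }
    switch (event.eventType) {
      case 'ENVIRONMENT_UPDATED':
        this.handleEnvironmentUpdated(event);
        break;
      case 'ENVIRONMENT_DELETED':
        this.handleEnvironmentDeleted(event);
        break;
      default:
        // Existing behaviour: sync the changed secret or variable.
        this.log.push(`sync:${event.environmentName}`);
    }
  }

  private handleEnvironmentUpdated(event: SketchEvent): void {
    this.log.push(`renamed:${event.environmentName}`);
  }

  private handleEnvironmentDeleted(event: SketchEvent): void {
    this.log.push(`cleanup:${event.environmentName}`);
  }
}
```

Before the fix, the two lifecycle events were missing from the permitted set, so they never reached the switch at all.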
apps/api/src/integration/plugins/aws-lambda.integration.ts
Purpose: Implements the AWS Lambda integration, syncing Keyshade secrets to Lambda function environment variables
Key sections modified:
getPermittedEvents(): Added the two environment events
emitEvent() switch statement: Added routing for environment events
New functions added:
handleEnvironmentDeleted(): Removes environment-scoped keys from Lambda function configuration and disconnects the environment relationship
handleEnvironmentUpdated(): No-op handler (Lambda has no environment rename concept) that logs an audit event
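The core of the Lambda cleanup can be sketched as a pure function. Lambda's UpdateFunctionConfiguration call replaces the function's whole environment variable map, so a cleanup has to read the current variables, drop the ones that belonged to the deleted environment, and write the remainder back. How Keyshade actually tracks which keys belong to which environment is not shown in this post; the keysByEnvironment bookkeeping and the function name below are assumptions.

```typescript
type LambdaVariables = Record<string, string>;

// Hypothetical bookkeeping: the keys Keyshade wrote for each environment.
type KeysByEnvironment = Record<string, string[]>;

interface CleanupPlan {
  variables: LambdaVariables; // what should remain on the function
  removed: string[]; // keys dropped because their environment was deleted
}

function removeEnvironmentKeys(
  currentVariables: LambdaVariables,
  keysByEnvironment: KeysByEnvironment,
  deletedEnvironment: string
): CleanupPlan {
  const owned = new Set<string>(keysByEnvironment[deletedEnvironment] ?? []);
  const variables: LambdaVariables = {};
  const removed: string[] = [];

  for (const key of Object.keys(currentVariables)) {
    if (owned.has(key)) {
      removed.push(key);
    } else {
      // Keys from other environments, or set by hand, are left untouched.
      variables[key] = currentVariables[key];
    }
  }
  return { variables, removed };
}
```

Keeping this step free of network calls makes it easy to unit test; the surviving map would then be sent to AWS in a single update.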
apps/api/src/common/util.ts
Purpose: Added retryWithBackoff() helper function to handle transient failures when calling external APIs (Vercel/AWS)
Why important: Cloud APIs can fail temporarily; this ensures cleanup operations are resilient and retry with exponential backoff
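A helper like this can be sketched in a few lines. The name retryWithBackoff() comes from the PR, but the signature, option names, and defaults below are assumptions, not the merged implementation. The sleep option is injectable so tests do not have to wait in real time.

```typescript
interface RetryOptions {
  retries?: number; // extra attempts after the first call
  baseDelayMs?: number; // delay before the first retry
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {}
): Promise<T> {
  const { retries = 3, baseDelayMs = 200, sleep = defaultSleep } = options;
  let lastError: unknown;

  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === retries) {
        break; // out of attempts, fall through and rethrow
      }
      // Exponential backoff: base, 2x base, 4x base, ...
      await sleep(baseDelayMs * Math.pow(2, attempt));
    }
  }
  throw lastError;
}
```

With baseDelayMs set to 100, a call that fails twice and then succeeds waits 100 ms and then 200 ms. A production version would usually add jitter and retry only on errors that are known to be transient.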
apps/api/src/integration/reconciler.ts
Purpose: Lightweight reconciliation system that processes failed cleanup operations stored in integration.metadata.pendingCleanup
Why important: When retries fail after multiple attempts, the reconciler can later replay these operations, ensuring eventual consistency even if external APIs are temporarily unavailable
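The replay loop can be sketched as follows. Only the pendingCleanup field name comes from the PR; the entry shape, the attempts counter, and the reconcilePendingCleanup name are assumptions for illustration.

```typescript
// Hypothetical shape of one queued cleanup operation.
interface PendingCleanupEntry {
  environmentName: string;
  attempts: number;
}

interface ReconcilableIntegration {
  metadata: { pendingCleanup?: PendingCleanupEntry[] };
}

// Replays every queued cleanup once. Entries that succeed are dropped,
// entries that fail again stay queued with an incremented attempt count.
// Returns the number of entries that completed.
async function reconcilePendingCleanup(
  integration: ReconcilableIntegration,
  cleanup: (entry: PendingCleanupEntry) => Promise<void>
): Promise<number> {
  const pending = integration.metadata.pendingCleanup ?? [];
  const stillPending: PendingCleanupEntry[] = [];
  let completed = 0;

  for (const entry of pending) {
    try {
      await cleanup(entry);
      completed++;
    } catch {
      stillPending.push({ ...entry, attempts: entry.attempts + 1 });
    }
  }

  integration.metadata.pendingCleanup = stillPending;
  return completed;
}
```

Because failed entries are written back rather than discarded, running the reconciler repeatedly converges on an empty queue once the external API recovers.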
apps/api/src/event/event.types.ts
Purpose: Defines TypeScript types for event metadata
Relevant types:
EnvironmentUpdatedEventMetadata: Contains name and description fields for environment updates
EnvironmentDeletedEventMetadata: Contains the environment name for deletion events
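As a rough sketch, the two types might look like the following. The type names and the name and description fields come from the PR; whether description is optional, and the idea that metadata arrives as a JSON string that needs validating, are assumptions on my part, as is the parseEnvironmentDeletedMetadata helper.

```typescript
interface EnvironmentUpdatedEventMetadata {
  name: string;
  description?: string; // optionality is an assumption
}

interface EnvironmentDeletedEventMetadata {
  name: string;
}

// Hypothetical helper: validates untyped JSON before a handler trusts it.
function parseEnvironmentDeletedMetadata(
  raw: string
): EnvironmentDeletedEventMetadata {
  const value: unknown = JSON.parse(raw);
  if (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { name?: unknown }).name === 'string'
  ) {
    return { name: (value as { name: string }).name };
  }
  throw new Error('Invalid ENVIRONMENT_DELETED metadata');
}
```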
Test files:
apps/api/src/integration/plugins/vercel.integration.spec.ts: tests covering environment lifecycle handlers
apps/api/src/integration/plugins/aws-lambda.integration.spec.ts: tests covering Lambda-specific handlers
apps/api/src/integration/reconciler.spec.ts: testing the reconciler’s pending cleanup processing
Supporting materials/artifacts
Before the fix:
When a user deleted an environment in Keyshade, the integration would:
Not receive or process the ENVIRONMENT_DELETED event
Leave all environment-scoped secrets active on Vercel/AWS Lambda
Not update the integration metadata to reflect the environment was gone
Create orphaned credentials that users would have to manually clean up
Error behavior:
There was no error message at all; the silent failure was itself the problem. Users would delete an environment expecting it to be cleaned up everywhere, but the secrets would persist on external platforms with no warning or indication that cleanup had never happened.
After the fix:
The integrations now:
Subscribes to ENVIRONMENT_UPDATED and ENVIRONMENT_DELETED events
Automatically removes environment variables from Vercel projects when environments are deleted
Removes environment-scoped keys from AWS Lambda function configurations
Updates integration metadata to stay in sync with environment name changes
Uses retry logic with exponential backoff for resilience against transient API failures
Persists failed operations in pendingCleanup for later reconciliation
Logs audit events for compliance and debugging
Key Implementation Detail:
For Vercel’s ENVIRONMENT_UPDATED handler, there’s a special consideration: Vercel only allows renaming custom environments, which are available on Pro plans, and not the standard “Production”, “Preview”, or “Development” environments. The code attempts the rename and gracefully handles the cases where it isn’t supported.
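The attempt-then-degrade logic can be sketched like this. The renameCustomEnvironment callback stands in for whatever wrapper calls the Vercel API; it, the outcome labels, and the function name are hypothetical, and the real handler may distinguish error types more carefully.

```typescript
// Vercel's three built-in environments, which cannot be renamed.
const STANDARD_VERCEL_ENVIRONMENTS = ['production', 'preview', 'development'];

type RenameOutcome = 'renamed' | 'skipped-standard' | 'unsupported';

async function tryRenameVercelEnvironment(
  oldName: string,
  newName: string,
  // Hypothetical wrapper around the Vercel API call.
  renameCustomEnvironment: (oldName: string, newName: string) => Promise<void>
): Promise<RenameOutcome> {
  if (STANDARD_VERCEL_ENVIRONMENTS.indexOf(oldName.toLowerCase()) !== -1) {
    return 'skipped-standard'; // nothing to rename on Vercel's side
  }
  try {
    await renameCustomEnvironment(oldName, newName);
    return 'renamed';
  } catch {
    // Simplification: any failure is treated as "rename not supported"
    // (for example, the plan has no custom environments).
    return 'unsupported';
  }
}
```

Whatever the outcome, the caller can still update the integration metadata with the new name, so Keyshade's own view stays consistent even when Vercel's side cannot change.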
Challenges
While preparing my pull request, I ran into several real-world challenges that highlight what contributing to an active open-source project actually looks like behind the scenes. The first issue was merge conflicts between my branch (attempt-1254) and the target develop branch. In a fast-moving repository with many contributors, this is almost inevitable. I resolved the conflicts by pulling the latest upstream changes and manually reviewing each conflicting section to understand both sides. My goal was to preserve other contributors’ work while ensuring my environment lifecycle handler implementation remained intact. After merging, I tested everything locally to confirm nothing regressed, then committed with a clear explanation of the resolution. It was a reminder that merge conflicts aren’t just mechanical fixes — they’re moments where you have to understand the intent of multiple developers and reconcile them thoughtfully.
The next challenge surfaced during code review. A maintainer questioned my use of any type casts in TypeScript, particularly when accessing integration.id and working with the pendingCleanup metadata array. They pointed out that this could introduce long-term technical debt and suggested using @ts-expect-error instead. That feedback was valuable. While the any casts were originally a temporary measure to unblock a critical bug fix, the reviewer highlighted how @ts-expect-error is self-validating: TypeScript reports an unused-directive error once the suppressed error disappears, which prevents stale workarounds from lingering in the codebase. I acknowledged the concern, added TODO comments recommending a future typed refactor, and committed to either replacing any with @ts-expect-error where appropriate or creating a follow-up issue to properly type the metadata structures. This exchange was a great example of how open-source review culture improves not just correctness, but long-term maintainability.
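The difference between the two approaches is easiest to see side by side. This is a small illustration with made-up names (TypedIntegration, loadedIntegration), not the code from the PR.

```typescript
// The declared type is missing `id`, even though the runtime object has it.
interface TypedIntegration {
  name: string;
}

const loadedIntegration: TypedIntegration = Object.assign(
  { name: 'vercel' },
  { id: 'int_123' }
);

// Option 1: cast to any. This compiles forever, even after `id` is added
// to TypedIntegration, so the workaround can silently outlive its reason.
const idViaAny: string = (loadedIntegration as any).id;

// Option 2: suppress exactly one known error on the next line.
// @ts-expect-error `id` is not declared on TypedIntegration yet.
const idViaExpectError: string = loadedIntegration.id;

// Once `id` is added to the interface, the directive above has nothing to
// suppress and the compiler flags it as unused, prompting its removal.
```

Both lines read the same value at runtime; the difference is only in what the compiler tells you later.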
The biggest blocker came from the CI/CD pipeline. My PR was failing two checks: a Snyk security scan and an API validation build. The frustrating part was that I couldn’t reproduce the failures locally. All my local tests passed — dependencies installed cleanly, the build succeeded, and every unit test ran successfully. But Snyk repeatedly failed in CI with errors related to workspace handling and package manager compatibility. The CLI didn’t fully support the project’s pnpm workspace setup in the same way the CI environment did, and I couldn’t access the GitHub Actions logs directly from my development environment. I tried multiple workarounds, including running Snyk through pnpm and downloading the standalone CLI, but nothing mirrored the CI behavior. Even rerunning the workflows with empty commits didn’t resolve the issue.
At that point, I documented everything I had tried in a detailed PR comment and asked for help. A maintainer with admin access reviewed the CI logs and identified the root cause: a vulnerability in a transitive dependency that required updating the root package.json and regenerating the lockfile. They also explained that the project’s Snyk integration uses a specialized workspace configuration that isn’t trivial to replicate locally. This led to a broader discussion about improving CI transparency for external contributors, including the possibility of automatically surfacing relevant log snippets in PR comments. The experience reinforced an important lesson: as an external contributor, you won’t always have full visibility into infrastructure, and clear documentation of your debugging efforts is essential for maintainers to step in effectively.
Solution (PR #1255)
After working through these issues collaboratively, the pull request was successfully merged. The final solution included environment lifecycle event handlers for both Vercel and AWS Lambda, retry logic with exponential backoff, a reconciliation system for eventual consistency, and comprehensive test coverage. All CI checks passed, and review feedback was fully addressed. More importantly, the contribution closed a high-priority bug that had been blocking production adoption of cloud integrations. It was a strong reminder that persistence, careful communication, and respect for the review process are just as important as writing the code itself. Open-source success isn’t just technical; it’s collaborative.
What Did I Learn?
- How to read the official TypeScript documentation to break down confusing code
- How to ask maintainers useful questions
- How to navigate CI/CD failures as an external contributor
- How event-driven API architecture works
- How Snyk improves security during development
