
The Zero-Edit Merge Strategy Explained

If deploying the same Docker image to dev and production requires code changes between environments, your configuration is in the wrong place. Here's the pattern I use to build once and promote the exact artifact — same SHA, same bytes — across isolated AWS accounts with no edits.


Damilare Adekunle


Short link: https://ddadekunle.com/p/7

The goal of a proper CI/CD pipeline is simple to state: what runs in production is exactly what was tested in dev. Not a rebuilt version. Not a "mostly the same" image with different env vars baked in. The exact same artifact.

Most teams understand this in principle and violate it in practice. A developer adds an environment check to the application code. A Dockerfile ARG gets used to switch config. Someone rebuilds the image for production "just to be safe." Each of these is a small break in the chain — and each one means you're shipping something that wasn't fully validated.

The Zero-Edit Merge Strategy is the constraint I apply to prevent this: the same Docker image, built once, must be promotable to any environment without modifying a single line of application code.

Why environment-specific code exists in the first place

Environment-specific code almost always starts with configuration that should be external. The most common form is a hardcoded check like:

import os

if os.environ.get("ENV") == "production":
    ...  # production-specific behaviour

Or worse, values baked directly into the application:

from fastapi import FastAPI

STAGE = "dev"  # fixed at build time, so it ships inside the image
app = FastAPI(root_path=f"/{STAGE}")

When the application has environment knowledge embedded in it, you can't run the same image in a different environment — the environment is part of the image. To deploy to production, you either change the code and rebuild, or you accept that the dev image will behave incorrectly.

Neither is acceptable.

The fix: runtime injection

The solution is to move every piece of environment-specific configuration out of the application and into the container's runtime environment. The application reads from environment variables at startup. The infrastructure sets those variables per environment. The image itself knows nothing.

For the FastAPI service I was deploying across two isolated AWS accounts, the change was minimal:

import os
from fastapi import FastAPI

# Before: hardcoded
STAGE = "dev"

# After: injected at runtime (empty or unset means no stage prefix)
STAGE = os.getenv("API_STAGE", "").strip()
app = FastAPI(root_path=f"/{STAGE}" if STAGE else "")

Now the ECS task definition in dev sets API_STAGE=dev. The task definition in production sets API_STAGE=prod. The image sets neither — it reads whatever the environment provides. Same bytes, different behaviour, exactly as intended.
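Here is a minimal sketch of the same idea, pulled into a helper so it can be exercised without starting the app. The function name root_path_from_env is hypothetical, and stripping stray slashes is an added assumption beyond the snippet above. Only the API_STAGE variable comes from the setup described here.

```python
import os
from typing import Mapping, Optional


def root_path_from_env(env: Optional[Mapping[str, str]] = None) -> str:
    """Derive the FastAPI root_path from API_STAGE (hypothetical helper)."""
    source = os.environ if env is None else env
    # Assumption: tolerate whitespace and stray slashes in the injected value.
    stage = source.get("API_STAGE", "").strip().strip("/")
    return f"/{stage}" if stage else ""


if __name__ == "__main__":
    # Same code, different injected values, as the two task definitions supply.
    print(root_path_from_env({"API_STAGE": "dev"}))   # /dev
    print(root_path_from_env({"API_STAGE": "prod"}))  # /prod
    print(repr(root_path_from_env({})))               # ''
```

Passing the environment in as a mapping keeps the function easy to test. In the service itself it would be called with no arguments and read os.environ.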

The Swagger UI edge case

There's a subtlety with FastAPI behind API Gateway that's worth naming explicitly, because it's a common source of broken docs pages in production.

API Gateway stages work by stripping the stage prefix before forwarding requests. A request to /prod/items arrives at your container as /items. FastAPI handles this correctly — but Swagger UI doesn't, unless you tell FastAPI where it's mounted.

The root_path parameter is that instruction. Without it, FastAPI generates absolute URLs for the OpenAPI spec (/openapi.json with a leading slash). The browser resolves that from the domain root, with no stage prefix. API Gateway returns 403.

With root_path set to the stage name, FastAPI knows it's mounted at /prod and generates the correct spec URL. The docs work in both environments without any environment-specific code — just the value injected at runtime.

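The /docs endpoint needs one more adjustment to complete this. The app has to be created with docs_url=None, otherwise FastAPI's built-in /docs route is registered first and the custom handler is never reached:

from fastapi.openapi.docs import get_swagger_ui_html

# Assumes the app was created with FastAPI(docs_url=None, ...)
@app.get("/docs", include_in_schema=False)
async def custom_swagger():
    return get_swagger_ui_html(
        openapi_url="openapi.json",  # relative, not absolute
        title="API Docs",
    )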

openapi.json without a leading slash resolves relative to the current path. Whatever stage prefix is in the URL stays in the URL. This works correctly under any stage prefix with no environment-specific code.
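The difference between the two forms can be checked with the standard library's URL resolution, which follows the same relative-reference rules a browser applies. This is only an illustration: the host name is a placeholder and resolve_spec_url is a hypothetical helper, not part of FastAPI.

```python
from urllib.parse import urljoin


def resolve_spec_url(docs_page_url: str, openapi_url: str) -> str:
    """Resolve the spec URL the way a browser would from the docs page."""
    return urljoin(docs_page_url, openapi_url)


if __name__ == "__main__":
    docs_page = "https://api.example.com/prod/docs"  # placeholder host

    # Leading slash: resolved from the domain root, stage prefix lost.
    print(resolve_spec_url(docs_page, "/openapi.json"))
    # https://api.example.com/openapi.json

    # No leading slash: resolved against the current path, prefix kept.
    print(resolve_spec_url(docs_page, "openapi.json"))
    # https://api.example.com/prod/openapi.json
```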

The promotion workflow

Once the application is clean, the CI/CD pipeline can enforce the "build once" constraint structurally:

Push to dev branch
  → Build image
  → Tag with Git SHA
  → Push to dev ECR
  → Deploy to dev ECS
  → Smoke test

Push a release tag
  → Pull the image already in dev ECR (same SHA)
  → Push to prod ECR (no rebuild)
  → Deploy to prod ECS
  → Smoke test

The production deploy doesn't call docker build. It pulls the image that was already validated in dev, using the exact SHA from that build. The trust chain is complete: what you tested is what you shipped.

In GitHub Actions, this looks like:

# In the prod workflow, after confirming the SHA exists in dev ECR.
# github.sha is the commit the release tag points to, matching the SHA tag applied by the dev build.
- name: Pull and repush validated image
  run: |
    docker pull "$DEV_ECR/$IMAGE_NAME:${{ github.sha }}"
    docker tag "$DEV_ECR/$IMAGE_NAME:${{ github.sha }}" "$PROD_ECR/$IMAGE_NAME:${{ github.sha }}"
    docker push "$PROD_ECR/$IMAGE_NAME:${{ github.sha }}"

The production ECR now holds the exact image from dev. The deployment references it by SHA, not by latest.
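To make the "no rebuild" property easy to check, the promotion step can also be written as a small Python helper that only ever emits pull, tag, and push commands. This is a sketch under assumptions: promotion_commands is a hypothetical name, the registry hosts are placeholders, and the image is assumed to be tagged with the commit SHA in dev ECR.

```python
from typing import List


def promotion_commands(
    dev_ecr: str, prod_ecr: str, image: str, git_sha: str
) -> List[List[str]]:
    """Build the docker commands that copy one tested image to prod."""
    source = f"{dev_ecr}/{image}:{git_sha}"
    target = f"{prod_ecr}/{image}:{git_sha}"
    # No "docker build" anywhere: the artifact is copied, never recreated.
    return [
        ["docker", "pull", source],
        ["docker", "tag", source, target],
        ["docker", "push", target],
    ]


if __name__ == "__main__":
    commands = promotion_commands(
        dev_ecr="111111111111.dkr.ecr.us-east-1.amazonaws.com",   # placeholder
        prod_ecr="222222222222.dkr.ecr.us-east-1.amazonaws.com",  # placeholder
        image="api",
        git_sha="0123456789abcdef0123456789abcdef01234567",
    )
    for command in commands:
        print(" ".join(command))
```

Each returned list can be handed to subprocess.run(command, check=True) once the runner is logged in to both registries.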

What this enables

The zero-edit constraint isn't just about cleanliness. It unlocks a few things that matter operationally:

Confident rollbacks. Every image is tagged with the Git SHA that produced it. Rolling back is pointing the task definition at a previous SHA. You're not rolling back to a rebuild — you're restoring an artifact that was already in your registry.

No latest in production. latest is a moving target. A deploy that references latest is unpredictable: the same Terraform config might deploy different code depending on when it runs. A SHA tag names exactly one build, and enabling tag immutability on the ECR repository stops it from ever being overwritten.

Clear audit trail. Every running container can be traced back to a specific commit. If something breaks in production, you know exactly what code is running and what changed.

Simpler debugging. When dev and prod run the same image, environment-specific behaviour differences are always configuration differences — not code differences. The search space shrinks dramatically.
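The rollback and "no latest" points can be sketched together as a pure function over a task definition document. It assumes the standard ECS JSON shape (containerDefinitions entries with name and image), tag-style image references rather than digests, and full 40-character Git SHAs as tags. The names is_sha_tag and with_image_tag are hypothetical.

```python
import copy
import re
from typing import Any, Dict

FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_sha_tag(tag: str) -> bool:
    """True only for a full lowercase Git commit SHA."""
    return FULL_SHA.fullmatch(tag) is not None


def with_image_tag(
    task_def: Dict[str, Any], container: str, tag: str
) -> Dict[str, Any]:
    """Return a copy of task_def whose container image points at tag."""
    if not is_sha_tag(tag):
        # Rejects "latest" and any other moving tag.
        raise ValueError(f"refusing non-SHA image tag: {tag!r}")
    updated = copy.deepcopy(task_def)
    for definition in updated["containerDefinitions"]:
        if definition["name"] == container:
            # Drop the old tag from the last path segment only.
            head, sep, last = definition["image"].rpartition("/")
            repo = f"{head}{sep}{last.split(':', 1)[0]}"
            definition["image"] = f"{repo}:{tag}"
            return updated
    raise KeyError(f"no container named {container!r}")


if __name__ == "__main__":
    current = {
        "family": "api",
        "containerDefinitions": [
            {"name": "api", "image": "registry.example/api:" + "b" * 40}
        ],
    }
    previous = with_image_tag(current, "api", "a" * 40)
    print(previous["containerDefinitions"][0]["image"])
```

Rolling back then means registering the returned document as a new task definition revision. Nothing is rebuilt, because the earlier image is still in the registry.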

The constraint is worth the upfront effort to get configuration out of the application. Once the image is genuinely environment-agnostic, the entire delivery pipeline becomes more trustworthy.
