How I Debugged API Gateway 403s in Production

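Few AWS errors are as misleading as an API Gateway 403. The response body is usually a generic {"message": "Forbidden"}, or the equally unhelpful {"message": "Missing Authentication Token"} (which, despite the name, often just means no deployed route matched). There's no stack trace, no path indicator, nothing that tells you which of the four possible sources is responsible. You're left guessing.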

I ran into two distinct 403 issues while building a multi-account CI/CD platform — one from a broken Terraform deployment trigger, one from a FastAPI routing problem specific to API Gateway's stage prefix behaviour. Both looked identical from the outside. Here's how I diagnosed them, and the broader framework I now use for any API Gateway 403.

The four sources of an API Gateway 403

Before you can fix a 403, you need to know which layer it's coming from.

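1. Resource Policy. If your API has a resource policy attached, requests that don't match an Allow statement, or that match an explicit Deny, are rejected before they reach any route. This is the first thing to check if you've recently added a resource policy or changed account-level settings. On REST APIs, a policy change only takes effect once the API is redeployed, so a policy edit can also be hidden behind the stale-deployment problem below.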

2. API Key / Usage Plan. If a route requires an API key and the request doesn't include a valid one in the x-api-key header, API Gateway returns 403 before forwarding anything to the integration. This is distinct from a 401 — API Gateway uses 403 for missing or invalid keys by default.

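3. Integration misconfiguration. If the method and route exist but the integration is broken (wrong ARN, wrong path, wrong HTTP method), the 403 may come from behind API Gateway rather than from it. With a proxy integration, the backend's status code is passed straight through, so a backend that rejects the request with 403 looks exactly like a gateway-level 403 from the client's side. This is less common but possible, and execution logs record the status the integration returned, which settles it.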

4. Stale deployment snapshot. This one is the most insidious. API Gateway deployments are snapshots. If you add or change routes in Terraform but the deployment resource doesn't redeploy, the live API still reflects the old snapshot. Requests to the new routes hit a path that doesn't exist in the deployed version, and API Gateway answers with 403 {"message": "Missing Authentication Token"}.
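The response body is a weak signal, but it's free, so it's worth a first-pass triage before opening any logs. Below is a minimal heuristic sketch. It assumes the API still uses API Gateway's default gateway responses; the message strings are what those defaults return in my understanding, and customised gateway responses won't match them. The function name and hint wording are hypothetical.

```python
import json

# Heuristic: message substring (lower-cased) -> layer worth checking first.
# Assumes API Gateway's default gateway responses; custom ones won't match.
HINTS = [
    ("missing authentication token",
     "no deployed method matched (stale deployment, wrong path, missing route), "
     "or an IAM-authorised method was called without a SigV4 signature"),
    ("not authorized to perform: execute-api:invoke",
     "resource policy denied the request"),
    ("forbidden",
     "API key / usage plan, or a first path segment that isn't a deployed stage"),
]


def triage_403(body: str) -> str:
    """Return a hint about which layer most likely produced a 403 body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "unrecognised body: check execution logs"
    if not isinstance(payload, dict):
        return "unrecognised body: check execution logs"
    # Default responses use either "message" or "Message" as the key.
    message = str(payload.get("message") or payload.get("Message") or "").lower()
    for needle, hint in HINTS:
        if needle in message:
            return hint
    return "unrecognised message: check execution logs; it may come from the integration"
```

Anything that falls through to the last line is a good candidate for source 3, since a backend's own 403 body won't look like any of API Gateway's defaults.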

The CloudWatch execution logs you're probably not enabling

The fastest way to distinguish between these sources is CloudWatch execution logs. Not access logs — execution logs. They show the full request processing pipeline: which route was matched, which method was invoked, whether the API key check passed, and what happened at the integration layer.

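By default, execution logging is disabled. You enable it per stage with aws_api_gateway_method_settings, and API Gateway also needs an account-level IAM role it can use to write to CloudWatch Logs:

# Account-level prerequisite (once per region): without this role ARN,
# API Gateway can't write execution logs. The role name is hypothetical; it
# needs the AmazonAPIGatewayPushToCloudWatchLogs managed policy.
resource "aws_api_gateway_account" "main" {
  cloudwatch_role_arn = aws_iam_role.api_gateway_cloudwatch.arn
}

resource "aws_api_gateway_stage" "main" {
  # ...

  # Optional: access logs are a separate, one-line-per-request log.
  # format is required; $context.error.responseType names the gateway response.
  access_log_settings {
    destination_arn = aws_cloudwatch_log_group.api_access.arn
    format = jsonencode({
      requestId    = "$context.requestId"
      status       = "$context.status"
      resourcePath = "$context.resourcePath"
      errorType    = "$context.error.responseType"
    })
  }
}

resource "aws_api_gateway_method_settings" "all" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  stage_name  = aws_api_gateway_stage.main.stage_name
  method_path = "*/*"

  settings {
    # These two settings are what turn on execution logs.
    logging_level      = "INFO"
    data_trace_enabled = true
  }
}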

data_trace_enabled = true gives you the full request/response payloads in the logs. It's verbose and can write sensitive data such as tokens into CloudWatch, so don't leave it on indefinitely, but during debugging it's the difference between guessing and knowing.
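When you need execution logging on for a few minutes during an incident, waiting on a Terraform change is slow. Below is a hedged sketch that uses API Gateway's UpdateStage patch operations instead. It assumes a boto3-style apigateway client is passed in and that the account-level CloudWatch role is already configured. The function name is hypothetical.

```python
from typing import Any

VALID_LEVELS = {"OFF", "ERROR", "INFO"}


def set_execution_logging(apigw: Any, rest_api_id: str, stage: str,
                          level: str = "INFO", data_trace: bool = True) -> list:
    """Set execution logging for every method on a stage ("*/*").

    apigw is assumed to behave like boto3.client("apigateway").
    Returns the patch operations that were sent.
    """
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {sorted(VALID_LEVELS)}")
    ops = [
        {"op": "replace", "path": "/*/*/logging/loglevel", "value": level},
        {"op": "replace", "path": "/*/*/logging/dataTrace",
         "value": "true" if data_trace else "false"},
    ]
    apigw.update_stage(restApiId=rest_api_id, stageName=stage, patchOperations=ops)
    return ops
```

When you're done, call it again with level="ERROR", data_trace=False. If the stage's method settings are managed in Terraform, the next plan will show the change as drift and put it back to whatever the configuration says. For a temporary toggle like this, that's a useful safety net.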

Once enabled, execution logs appear in CloudWatch in a log group named API-Gateway-Execution-Logs_[rest-api-id]/[stage]. Each request's entries walk through the steps it went through and show where it was rejected.
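To go from one failing request to its log lines, take the x-amzn-RequestId header from the 403 response and filter the execution log group for it. Below is a minimal sketch. It assumes a boto3-style CloudWatch Logs client is passed in, and that each execution-log line carries the request ID (API Gateway prefixes lines with it in parentheses). Both helper names are hypothetical.

```python
from typing import Any, Iterator


def execution_log_group(rest_api_id: str, stage: str) -> str:
    """Log group that API Gateway writes REST API execution logs to."""
    return f"API-Gateway-Execution-Logs_{rest_api_id}/{stage}"


def execution_log_lines(logs_client: Any, rest_api_id: str, stage: str,
                        request_id: str) -> Iterator[str]:
    """Yield every execution-log message that mentions request_id.

    logs_client is assumed to behave like boto3.client("logs").
    """
    kwargs = {
        "logGroupName": execution_log_group(rest_api_id, stage),
        # Quoted so the hyphens in the request ID are matched as one literal term.
        "filterPattern": f'"{request_id}"',
    }
    while True:
        page = logs_client.filter_log_events(**kwargs)
        for event in page.get("events", []):
            yield event["message"]
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token  # fetch the next page
```

Usage would look like `for line in execution_log_lines(boto3.client("logs"), rest_api_id, "dev", request_id): print(line)`. The last few lines usually tell you which of the four sources you're dealing with.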

The Terraform deployment trigger problem

The first 403 I hit was caused by Terraform not redeploying the API when I added new routes.

The issue is a fundamental behaviour of the aws_api_gateway_deployment resource: it only redeploys when its own configuration changes. Changes to dependent resources — new route resources, updated integrations, new method responses — don't automatically trigger a redeployment. The deployed API snapshot stays stale.

This means a newly added route will exist in your Terraform state and in the API configuration, but the live API won't know about it. Requests to that route return 403.
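You can confirm this without reading Terraform state by asking API Gateway directly. GetDeployment with embed=["apisummary"] returns the methods captured in the deployment the stage is serving, and you can compare those against the API's current resources. Below is a minimal sketch. It assumes a boto3-style apigateway client, and that apiSummary is keyed by resource path and then by HTTP method, as documented. The function names are hypothetical.

```python
from typing import Any


def deployed_vs_live(live_resources: list, api_summary: dict) -> list:
    """Return sorted (path, method) pairs defined on the API but absent from a deployment."""
    missing = []
    for resource in live_resources:
        path = resource["path"]
        deployed_methods = api_summary.get(path, {})
        # resourceMethods is only present on resources that have methods.
        for method in resource.get("resourceMethods", {}):
            if method not in deployed_methods:
                missing.append((path, method))
    return sorted(missing)


def stale_routes(apigw: Any, rest_api_id: str, stage: str) -> list:
    """Compare the live API definition with the deployment the stage serves.

    apigw is assumed to behave like boto3.client("apigateway").
    """
    deployment_id = apigw.get_stage(restApiId=rest_api_id, stageName=stage)["deploymentId"]
    deployment = apigw.get_deployment(
        restApiId=rest_api_id, deploymentId=deployment_id, embed=["apisummary"]
    )
    live = []
    # embed=["methods"] asks GetResources to include each resource's methods.
    for page in apigw.get_paginator("get_resources").paginate(
        restApiId=rest_api_id, embed=["methods"]
    ):
        live.extend(page["items"])
    return deployed_vs_live(live, deployment.get("apiSummary", {}))
```

An empty list means the stage is serving everything currently defined on the API. Anything else is a route that will most likely 403 until the API is redeployed.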

The fix is a triggers block that forces a redeployment whenever a route or integration changes:

resource "aws_api_gateway_deployment" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id

  triggers = {
    # Hash whole resource objects, not just IDs: an ID only changes when a
    # resource is created or replaced, not on an in-place edit (e.g. a new URI).
    redeployment = sha1(jsonencode([
      aws_api_gateway_resource.root,
      aws_api_gateway_resource.health,
      aws_api_gateway_method.health_get,
      aws_api_gateway_integration.health,
      # add every resource, method and integration here
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}

Any change to those resources changes the hash, which changes the triggers value, which forces a new deployment. create_before_destroy lets the stage move to the new deployment before the old one is deleted. Without this, you're always one Terraform apply behind.
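What goes into that hash matters. Terraform only changes a resource's ID when it creates or replaces the resource, so a trigger built from IDs alone won't fire for an in-place edit such as a new integration URI. The small Python model below shows the difference. It mirrors the idea behind sha1(jsonencode([...])), not Terraform's exact serialisation, and the two dicts are hypothetical stand-ins for an integration's state before and after an edit.

```python
import hashlib
import json


def fingerprint(parts: list) -> str:
    """SHA-1 of the JSON-serialised inputs, like a triggers hash."""
    return hashlib.sha1(json.dumps(parts, sort_keys=True).encode()).hexdigest()


# Hypothetical integration state before and after an in-place URI change.
before = {"id": "integration-health", "uri": "https://old.example.com/health"}
after = {"id": "integration-health", "uri": "https://new.example.com/health"}

# Hashing only the ID: same hash, so no redeployment.
print(fingerprint([before["id"]]) == fingerprint([after["id"]]))  # True
# Hashing the whole object: the URI change produces a new hash.
print(fingerprint([before]) == fingerprint([after]))  # False
```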

The FastAPI root_path problem

The second 403 was specific to FastAPI running behind API Gateway.

API Gateway stages work by prepending a path prefix to every URL. A request to your API at /dev/items arrives at your backend as /items — the stage prefix is stripped. FastAPI's internal routing handles /items fine.

The problem is Swagger UI. When you access /dev/docs, Swagger UI makes a follow-up request to fetch the OpenAPI spec. By default (with no root_path set), FastAPI generates that URL as /openapi.json, an absolute path resolved from the root. Without the stage prefix, the full URL becomes https://[api-id].execute-api.[region].amazonaws.com/openapi.json. API Gateway treats the first path segment as the stage name, finds no stage called openapi.json, and returns 403.

The fix is two-part:

First, set root_path on the FastAPI app. This tells FastAPI where it's mounted, so it generates correct absolute URLs:

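import os

from fastapi import FastAPI

# API_STAGE holds the stage name, e.g. "dev"; leave it unset for local runs.
STAGE = os.getenv("API_STAGE", "").strip()
# docs_url=None turns off FastAPI's built-in /docs route. That route is registered
# when the app is created, so it would otherwise shadow the custom /docs below.
app = FastAPI(root_path=f"/{STAGE}" if STAGE else "", docs_url=None)

Second, replace the /docs endpoint with one that uses a relative URL for the OpenAPI spec: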

from fastapi.openapi.docs import get_swagger_ui_html

@app.get("/docs", include_in_schema=False)
async def custom_swagger():
    return get_swagger_ui_html(
        openapi_url="openapi.json",  # no leading slash — relative URL
        title="API Docs"
    )

Without the leading slash, the browser resolves openapi.json relative to the current page. From /dev/docs, that replaces the last path segment and requests /dev/openapi.json, which API Gateway routes to your backend as /openapi.json. With the leading slash, it resolves to /openapi.json: absolute, no stage prefix, 403.
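Python's urljoin applies the same relative-URL resolution rules a browser does, so it's a quick way to check where each form of openapi_url ends up. The API hostname below is hypothetical.

```python
from urllib.parse import urljoin

BASE = "https://abc123.execute-api.eu-west-1.amazonaws.com"  # hypothetical API URL

# Relative reference from the docs page: replaces the last segment ("docs").
print(urljoin(f"{BASE}/dev/docs", "openapi.json"))   # .../dev/openapi.json
# Absolute path: drops the stage prefix entirely.
print(urljoin(f"{BASE}/dev/docs", "/openapi.json"))  # .../openapi.json
# Only a trailing slash on the page URL keeps "docs" in the resolved path.
print(urljoin(f"{BASE}/dev/docs/", "openapi.json"))  # .../dev/docs/openapi.json
```

The trailing-slash case is worth knowing about. If the docs page is ever served as /dev/docs/, the relative URL points one level deeper than the spec route.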

The diagnostic order

When you hit a 403 on API Gateway and don't know why:

1. Enable execution logging if it isn't already on. This is non-negotiable.
2. Check whether a resource policy is attached to the API. If there is one, check that your request matches an Allow statement and isn't caught by an explicit Deny.
3. Confirm the API key is being sent in the x-api-key header, is enabled, and is associated with a usage plan that includes this API and stage.
4. Check your deployment: after terraform apply, confirm the deployment resource was actually replaced and the stage points at the new deployment. If you're not using a triggers block, it may not have been.
5. Check the route path: if you're behind a stage prefix, verify that your backend, and any URLs it generates, handle the prefix correctly.
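Step 3 is easy to get wrong by eye, because the key, its enabled flag and the usage plan's stages live in three different places. Below is a minimal sketch that checks all of them. It assumes a boto3-style apigateway client. Note that api_key_id is the key's ID, not the secret value sent in x-api-key. The function name is hypothetical.

```python
from typing import Any


def api_key_problems(apigw: Any, api_key_id: str, rest_api_id: str, stage: str) -> list:
    """List reasons the key would be rejected on rest_api_id/stage (empty if none found).

    apigw is assumed to behave like boto3.client("apigateway").
    """
    problems = []
    # A disabled key is rejected with 403, just like a missing one.
    if not apigw.get_api_key(apiKey=api_key_id).get("enabled", False):
        problems.append("API key is disabled")

    # The key must belong to a usage plan whose apiStages include this API and stage.
    covered = False
    for page in apigw.get_paginator("get_usage_plans").paginate(keyId=api_key_id):
        for plan in page.get("items", []):
            for api_stage in plan.get("apiStages", []):
                if api_stage.get("apiId") == rest_api_id and api_stage.get("stage") == stage:
                    covered = True
    if not covered:
        problems.append(f"no usage plan for this key includes {rest_api_id}/{stage}")
    return problems
```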

The 403 almost always comes from one of these. Execution logs will tell you which.
