When I joined the data engineering team, twelve ETL pipelines each had their own .github/workflows/deploy.yml. Most of them looked similar. None of them were identical.
Some used an older version of actions/checkout. A few had a SLACK_WEBHOOK variable named differently. One was missing the Slack notification step entirely — someone had removed it during debugging and never put it back. One had a health check after deployment; most didn't.
Updating any single workflow required a PR to that pipeline's repository. Updating all of them required twelve PRs. In practice, that meant they never all got updated. The workflows drifted, and the drift accumulated silently.
This is the real cost of per-application CI/CD logic: it doesn't stay consistent. The same problem that drives you toward infrastructure modules drives you toward centralised CI/CD — duplication is a maintenance liability, and maintenance liabilities grow until they become incidents.
What centralising actually means
GitHub Actions has a feature built for this: reusable workflows. A workflow in one repository can be called from a workflow in any other repository in the same organisation.
The pattern looks like this. In the infrastructure repository, you define the full deployment workflow:
```yaml
# org/terraform-iac/.github/workflows/universal-deploy.yml
on:
  workflow_call:
    inputs:
      service_name:
        required: true
        type: string
    secrets:
      DEPLOY_ROLE_ARN:
        required: true
      SLACK_WEBHOOK_URL:
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.DEPLOY_ROLE_ARN }}
          aws-region: us-east-1

      - name: Login to ECR
        # ...
      - name: Build and push image
        # ...
      - name: Update ECS task definition and deploy
        # ...
      - name: Notify Slack
        # ...
```
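For concreteness, the elided build-and-push steps could be filled in with the official AWS actions. This is a sketch under assumptions — the action versions and the commit-SHA tagging scheme are mine, not necessarily what the original pipeline used:

```yaml
# Sketch only: one plausible shape for the elided steps.
# aws-actions/amazon-ecr-login exposes the registry URL as an output.
      - name: Login to ECR
        id: ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push image
        run: |
          # Tag with the commit SHA so every deploy is traceable (assumption).
          IMAGE="${{ steps.ecr.outputs.registry }}/${{ inputs.service_name }}:${{ github.sha }}"
          docker build -t "$IMAGE" .
          docker push "$IMAGE"
```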
In each pipeline repository, the entire deploy.yml is a single call:
```yaml
# org/dp-event-ingestion/.github/workflows/deploy.yml
on:
  push:
    branches: [main]

jobs:
  deploy:
    uses: org/terraform-iac/.github/workflows/universal-deploy.yml@main
    with:
      service_name: "dp-event-ingestion"
    secrets:
      DEPLOY_ROLE_ARN: ${{ secrets.DEPLOY_ROLE_ARN }}
      SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
```
That's the entire pipeline definition for an application. Everything else — authentication, build steps, deployment, notifications — lives in one place and is maintained from one place.
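As a variant (not what the example above shows): when the secret names in the calling repository already match what the reusable workflow expects, GitHub Actions lets you forward all of them with `secrets: inherit`, shrinking the caller further:

```yaml
jobs:
  deploy:
    uses: org/terraform-iac/.github/workflows/universal-deploy.yml@main
    with:
      service_name: "dp-event-ingestion"
    # Forward every secret available to this repo instead of listing each one.
    secrets: inherit
```

The explicit listing in the main example is more verbose but makes the contract between caller and central workflow visible in the file.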
The upgrade path
The practical consequence is what matters most.
When you need to update the pipeline — better error handling, a newer version of an action, a rollback step, a change to how images are tagged — you make the change once in the infrastructure repository. Every pipeline inherits it automatically on its next deploy, without a PR to each application repo.
This changes the economics of maintaining CI/CD. Instead of "it would cost twelve PRs to add a rollback step, so we'll do it later," it costs one PR. So it gets done. The pipeline improves continuously instead of staying frozen at whatever state it was in when it was copy-pasted.
Where OIDC fits in
The shared OIDC provider is the other half of this. AWS OIDC allows GitHub Actions to assume an IAM role via web identity federation — no long-lived access keys stored in repository secrets.
On the infrastructure repository, the OIDC provider is configured once. The trust policy on the deployer role grants access to any repository in the organisation:
```hcl
condition {
  test     = "StringLike"
  variable = "token.actions.githubusercontent.com:sub"
  values   = ["repo:org/*:ref:refs/heads/main"]
}
```
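In context, that condition sits inside the deployer role's trust policy. A sketch of the surrounding Terraform, with assumed resource names (`github`, `deployer`) — the thumbprint is GitHub's published value at the time of writing and may rotate:

```hcl
# Sketch: OIDC provider plus a deployer role trusting it.
# Resource names are illustrative assumptions.
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

data "aws_iam_policy_document" "assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Any repo in the org, main branch only.
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:org/*:ref:refs/heads/main"]
    }
  }
}

resource "aws_iam_role" "deployer" {
  name               = "github-actions-deployer"
  assume_role_policy = data.aws_iam_policy_document.assume.json
}
```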
Every new pipeline repository gets deployment access without touching IAM again. Adding a new pipeline means writing a main.tf and a short deploy.yml — the authentication infrastructure is already there.
The infrastructure repo as a dependency
There's a legitimate concern with this pattern: if the infrastructure repository's workflow has a bug, every pipeline is broken until it's fixed. You've introduced a single point of failure.
This is true. It's also the right tradeoff.
The alternative — distributed CI/CD across twelve repositories — has twelve points of failure that are all individually less visible. A broken workflow in one pipeline repository might go unnoticed for weeks. A broken workflow in the infrastructure repository is immediately visible to every team that deploys.
The dependency also forces discipline. The central workflow has to be stable and well-tested, because its failure radius is large. That pressure is appropriate. The workflow that every team depends on should receive more care than the workflow only one team ever looks at.
What belongs in the central workflow vs the application
The central workflow owns anything that's the same across all pipelines: OIDC authentication, ECR login, the build and push steps, ECS deployment, Slack notification. These are pure infrastructure concerns that no application developer should need to think about.
The application repo owns anything that's specific to it: the push trigger (which branch or tag triggers a deploy), the service_name input, and the repository-level secrets for its specific environment. That's all.
Application build steps that go beyond a standard docker build — running tests, generating assets, compiling code — can be added as a separate job in the application's workflow file before the call to the central deploy workflow. The jobs compose cleanly. The deployment logic stays centralised.
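The composition described above could look like this in an application's workflow — the job names and the `make test` command are assumptions, but the `needs:` gating is the mechanism GitHub Actions provides:

```yaml
# Sketch: an app-specific test job gating the shared deploy job.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test   # placeholder for the app's own checks

  deploy:
    needs: test   # deploy only runs if tests pass
    uses: org/terraform-iac/.github/workflows/universal-deploy.yml@main
    with:
      service_name: "dp-event-ingestion"
    secrets:
      DEPLOY_ROLE_ARN: ${{ secrets.DEPLOY_ROLE_ARN }}
      SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
```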
The line is: if a change would need to happen in every pipeline repository to take effect, it belongs in the central workflow. If it's specific to one pipeline's behaviour, it belongs in that pipeline's repository.