
Building a Terraform Module That Teams Actually Reuse

Most Terraform modules don't get reused — they get copied, modified, and eventually forked into something unrecognisable. Here are the interface design principles that made a module actually stick across 12 production pipelines.


Damilare Adekunle


Short link: https://ddadekunle.com/p/8


There's a version of Terraform modules that teams write and a version that teams actually use. They're often different things.

I've seen modules that require you to pass in a VPC ID, subnet IDs, security group IDs, IAM role ARNs, and a dozen optional variables with unclear defaults before you can provision anything useful. I've also seen modules so opinionated they assume your S3 bucket is named a specific thing and break if you deviate.

Neither gets reused. The overly generic module is intimidating. The overly opinionated one breaks on contact with reality.

When I built the standard-etl module to standardise infrastructure across 12 ETL pipelines, I had a clear goal: a new engineer should be able to add a pipeline to the platform in under two hours, with a main.tf they could write in ten minutes. Here's what made that work.

Start with the real callsite, not an imagined one

The biggest mistake in module design is designing the interface before you've used it. You end up optimising for theoretical flexibility instead of actual usage.

I wrote the first pipeline's infrastructure directly — flat Terraform, no module — and only extracted the module once I had something real to generalise. Before the module existed, that flat configuration contained:

  • ECR repository
  • ECS task definition (with hardcoded image, CPU, memory, env vars, secrets)
  • ECS service pointing at the cluster
  • Task execution role with permissions to fetch from SSM and pull from ECR
  • Task role with permissions to write to S3
  • CloudWatch log group

Everything above is repeated for every pipeline, with only the pipeline name, environment, and variable values changing. That's the module boundary: the repeated structure is the module, the varying values are the inputs.

The resulting callsite:

module "pipeline" {
  source = "../../../modules/standard_etl"

  app_name     = "dp-event-ingestion"
  env          = "dev"
  cluster_name = local.cluster_name
  subnet_ids   = local.subnet_ids

  container_env = [
    { name = "ENVIRONMENT",    value = "dev" },
    { name = "S3_BUCKET_NAME", value = local.data_lake_bucket },
    { name = "LOOP_INTERVAL",  value = "3600" }
  ]

  container_secrets = [
    { name = "API_KEY", valueFrom = local.api_key_ssm_arn }
  ]
}

That's it. No VPC ID passed in directly — the module gets it from the cluster. No IAM role ARNs — the module creates them. No security group — the module inherits the shared one from the cluster data source.
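Inside the module, those derivations can be sketched roughly like this. The data source lookups and the security group naming convention shown here are illustrative assumptions, not the exact resources from the real module:

```hcl
# Sketch: derive shared settings instead of requiring them as inputs.
# The tag/naming conventions below are hypothetical.

data "aws_ecs_cluster" "this" {
  cluster_name = var.cluster_name
}

# Look up the shared security group by a naming convention
# rather than passing its ID into every callsite.
data "aws_security_group" "shared" {
  filter {
    name   = "group-name"
    values = ["${var.cluster_name}-pipelines"]
  }
}

resource "aws_ecs_service" "this" {
  name            = "${var.app_name}-${var.env}"
  cluster         = data.aws_ecs_cluster.this.arn
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets         = var.subnet_ids
    security_groups = [data.aws_security_group.shared.id]
  }
}
```

The callsite stays small because the module resolves everything it can from what the caller already had to name anyway.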

Hide what's common, expose what varies

The rule I follow: if every callsite would pass the same value, don't make it an input — make it a default or derive it inside the module. If every callsite would pass a different value, it's a required input.

Common across all pipelines: CPU allocation (256 CPU units), memory (512 MiB), log retention (30 days), the IAM policy for S3 writes, the ECR lifecycle policy. None of these are inputs. They're inside the module, and the calling code never thinks about them.

Varying across pipelines: the pipeline name, the environment, the specific env vars, and the specific secrets. These are required inputs. The caller can't skip them.

Optional with meaningful defaults: the container image tag (defaults to latest during development, overridden at deploy time), the CPU and memory allocation if a specific pipeline needs more resources than the default. These exist as optional inputs, but most callsites never touch them.

The result is a minimal required surface that makes the common case trivially easy while leaving room for exceptions.
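As a rough sketch, the resulting variables.tf splits along exactly those lines. Variable names match the callsite above; the specific defaults shown are illustrative:

```hcl
# Required — different at every callsite.
variable "app_name"     { type = string }
variable "env"          { type = string }
variable "cluster_name" { type = string }
variable "subnet_ids"   { type = list(string) }

variable "container_env" {
  type    = list(object({ name = string, value = string }))
  default = []
}

# Optional — defaults cover the common case; most callsites never set these.
variable "image_tag" {
  type    = string
  default = "latest" # overridden at deploy time by CI/CD
}

variable "cpu" {
  type    = number
  default = 256
}

variable "memory" {
  type    = number
  default = 512
}
```

Everything that is the same for every pipeline (log retention, the S3 write policy, the ECR lifecycle policy) has no variable at all — it lives in the module body.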

The secrets pattern: why it matters more than it looks

One of the most important things the module enforces is how secrets are handled.

Before the module, credentials were hardcoded in task definition JSON files. An API key would be a literal string in a file that lived in the repository or, worse, in Terraform state. This is a common pattern and a consistent security risk.

The module's container_secrets input takes SSM Parameter Store ARNs, not values:

container_secrets = [
  { name = "API_KEY", valueFrom = "arn:aws:ssm:us-east-1:123456789:parameter/dev/pipeline/api-key" }
]

The ARN goes into Terraform state. The secret value never does. ECS injects the secret at container startup via the task execution role, which has ssm:GetParameter permissions scoped to the pipeline's parameter path. The developer writes code that reads from an environment variable. They never handle the actual credential.
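The scoping on the execution role can be sketched as a policy document restricted to the pipeline's parameter path. The /env/app-name/ path convention here is an assumption for illustration:

```hcl
data "aws_region" "current" {}
data "aws_caller_identity" "current" {}

# Sketch: execution role may only read parameters under this
# pipeline's own path — the path layout is hypothetical.
data "aws_iam_policy_document" "ssm_read" {
  statement {
    actions = ["ssm:GetParameter", "ssm:GetParameters"]
    resources = [
      "arn:aws:ssm:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:parameter/${var.env}/${var.app_name}/*"
    ]
  }
}
```

One pipeline's task cannot read another pipeline's secrets, even though every task uses the same module.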

This pattern is enforced by the module interface. There's no way to hardcode a secret value — the input type only accepts valueFrom references. The module makes the secure pattern the only available pattern.
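Concretely, that enforcement can come from the variable's type constraint: the object type has a valueFrom field and no value field, so a literal secret simply won't type-check. A sketch of the wiring, with illustrative resource names:

```hcl
variable "container_secrets" {
  # Only an ARN reference fits this shape — there is no "value" field.
  type    = list(object({ name = string, valueFrom = string }))
  default = []
}

resource "aws_ecs_task_definition" "this" {
  family                   = "${var.app_name}-${var.env}"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = var.cpu
  memory                   = var.memory
  execution_role_arn       = aws_iam_role.execution.arn
  task_role_arn            = aws_iam_role.task.arn

  container_definitions = jsonencode([{
    name        = var.app_name
    image       = "${aws_ecr_repository.this.repository_url}:${var.image_tag}"
    environment = var.container_env
    secrets     = var.container_secrets # ARNs only; ECS injects the values
  }])
}
```

Terraform state records the ARN in the task definition; the secret value only ever exists inside SSM and the running container.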

How the first migration shapes the module

The first pipeline I migrated onto the module took about four hours. A lot of that time was module refinement — discovering inputs I had wrong, restructuring the IAM policy, realising the log group naming convention needed to be consistent.

By the second migration, it was under two hours. By the fifth, it was closer to an hour, most of which was waiting for Terraform and ECS to stabilise.

This is normal and expected. A module designed in isolation will have rough edges that only appear when it meets real usage. Building the module from the first real callsite and then refining it on the second and third is the right order. Don't over-engineer the interface before you've used it.

What the module doesn't do

Equally important: what the module doesn't try to handle.

It doesn't create the VPC, the ECS cluster, or the S3 data lake. Those are shared resources that belong to a separate Terraform root, provisioned once per environment. The module is a consumer of those resources, not a creator of them.

It doesn't manage application code. The module provisions infrastructure — the container registry, the task definition, the service. The application image is built and pushed by the CI/CD pipeline. The module doesn't know or care what's in the image.

It doesn't handle deployment. terraform apply creates or updates the infrastructure. Deploying a new image version is an ECS service update triggered by CI/CD. Separating these concerns keeps the module focused and the deployment pipeline independent.
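The seam between the module and CI/CD is a handful of outputs. The output names here are illustrative — the point is that CI/CD gets the coordinates it needs to push an image and trigger a rollout (for example via aws ecs update-service --force-new-deployment) without the module ever running a deploy:

```hcl
# Sketch: what CI/CD needs from the module, nothing more.
output "ecr_repository_url" {
  value = aws_ecr_repository.this.repository_url
}

output "service_name" {
  value = aws_ecs_service.this.name
}

output "cluster_name" {
  value = var.cluster_name
}
```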

A module that tries to do all of these things becomes the thing nobody wants to touch. A module that does one thing well — in this case, "provision a standard pipeline container" — becomes the thing everyone uses without thinking about it.
