
The Hidden Trap of Migrating to Private Network Load Balancers on AWS

A deep dive into the security group pitfalls of moving a public ALB behind AWS API Gateway, a VPC Link, and an internal Network Load Balancer.

Damilare Adekunle

· 3 min read

Moving from Development to Production often means one thing: taking your "everything is public" architecture and locking it down.

Recently, I migrated my Recommendation API from a public-facing Application Load Balancer (ALB) to a private architecture using AWS API Gateway and an Internal Network Load Balancer (NLB).

On paper, the plan was solid. In Terraform, it looked clean. In reality, it took down the service. Here is the technical breakdown of why it failed and how I fixed it.

The Architecture Shift

Before: Public Internet -> Internet Gateway -> Public ALB -> ECS Fargate (Public Subnet)

After: Public Internet -> API Gateway -> VPC Link -> Internal NLB -> ECS Fargate (Private Subnet)

I switched to a REST API Gateway to leverage API Keys and Usage Plans for rate limiting. Because API Gateway communicates with private resources via a "VPC Link," I had to swap my Layer 7 ALB for a Layer 4 NLB.
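For context, the wiring between the REST API Gateway and the internal NLB looks roughly like this in Terraform. This is a sketch, not my actual config: resource names, variables, and the `/{proxy}` route are illustrative.

```hcl
# Internal NLB living in the private subnets.
resource "aws_lb" "internal_nlb" {
  name               = "recommendation-api-nlb"
  internal           = true
  load_balancer_type = "network"
  subnets            = var.private_subnet_ids
}

# VPC Link that lets the REST API Gateway reach the internal NLB.
resource "aws_api_gateway_vpc_link" "this" {
  name        = "recommendation-api-vpc-link"
  target_arns = [aws_lb.internal_nlb.arn]
}

# Proxy integration routed through the VPC Link.
resource "aws_api_gateway_integration" "proxy" {
  rest_api_id             = aws_api_gateway_rest_api.api.id
  resource_id             = aws_api_gateway_resource.proxy.id
  http_method             = "ANY"
  type                    = "HTTP_PROXY"
  integration_http_method = "ANY"
  connection_type         = "VPC_LINK"
  connection_id           = aws_api_gateway_vpc_link.this.id
  uri                     = "http://${aws_lb.internal_nlb.dns_name}/{proxy}"
}
```

Note that REST API Gateway VPC Links target an NLB ARN directly, which is why the ALB had to go.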

The "500 Error" Mystery

After applying the Terraform plan, the API Gateway returned a generic 500 Internal Server Error.
The API Gateway logs said the integration was failing.
The Target Group showed all Fargate tasks as Unhealthy.
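For reference, the target group behind the NLB was defined roughly like this (names and the port are illustrative). With an NLB, the default health check is a plain TCP connect against the target:

```hcl
resource "aws_lb_target_group" "api" {
  name        = "recommendation-api-tg"
  port        = 8080
  protocol    = "TCP"
  target_type = "ip" # required for Fargate tasks
  vpc_id      = var.vpc_id

  health_check {
    protocol            = "TCP"
    healthy_threshold   = 3
    unhealthy_threshold = 3
  }
}
```

Every task sat at Unhealthy even though the containers themselves were fine, which pointed at the network path rather than the application.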

Trap #1: The Source IP Confusion

I configured my ECS Security Group to allow traffic from the VPC CIDR (where the API Gateway VPC Link lives). I assumed this would cover everything.

The Gotcha: Network Load Balancers (NLBs) preserve the client IP address.
When a request comes from the API Gateway, the ECS task sees the Source IP as the VPC Link ENI. My Security Group allowed this.

However, NLB Health Checks do not come from the client. They originate from the Load Balancer nodes themselves. Because I hadn't explicitly allowed traffic from the Load Balancer's Security Group, the health checks were being blocked by the firewall.

The Fix:
I modified the ECS Security Group to accept ingress from two sources:

The VPC CIDR (for the actual API traffic).

The Load Balancer Security Group (specifically for health checks).
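Expressed in Terraform, the two ingress rules look something like this. It's a sketch with hypothetical names, and it assumes the NLB was created with its own security group (NLBs have supported security groups since mid-2023):

```hcl
# Ingress for real API traffic, which arrives from the VPC Link ENIs
# because the NLB preserves the client IP.
resource "aws_vpc_security_group_ingress_rule" "api_traffic" {
  security_group_id = aws_security_group.ecs_tasks.id
  ip_protocol       = "tcp"
  from_port         = 8080
  to_port           = 8080
  cidr_ipv4         = var.vpc_cidr
}

# Ingress for NLB health checks, which originate from the load
# balancer nodes themselves, not from the client.
resource "aws_vpc_security_group_ingress_rule" "health_checks" {
  security_group_id            = aws_security_group.ecs_tasks.id
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
  referenced_security_group_id = aws_security_group.nlb.id
}
```

If your NLB predates security group support (or was created without one), the equivalent fix is to allow the private IPs or subnet CIDRs of the load balancer nodes.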

Trap #2: Over-Securing the Internal

In my zeal for security, I tried to lock down the Internal NLB's security group to specific IP ranges. This caused intermittent connectivity issues, because traffic arriving through a VPC Link comes from ENI addresses that are hard to predict and pin down.

The Realization:
An AWS Load Balancer with internal = true has no public IP address. There is simply no route to it from the public internet.

By setting the Internal NLB's Security Group ingress to 0.0.0.0/0, I wasn't opening it to the world. I was simply saying, "If you are already inside this private network and can route to me, come on in."
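That rule looks alarming in code review but is safe here. A minimal sketch, assuming a TLS listener on 443 (names and port are illustrative):

```hcl
resource "aws_security_group" "nlb" {
  name   = "recommendation-api-nlb-sg"
  vpc_id = var.vpc_id
}

resource "aws_vpc_security_group_ingress_rule" "nlb_ingress" {
  security_group_id = aws_security_group.nlb.id
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
  # 0.0.0.0/0 does not expose this NLB publicly: with internal = true
  # there is no route from the internet, so this only admits traffic
  # that is already inside (or routed into) the VPC.
  cidr_ipv4         = "0.0.0.0/0"
}
```

The real perimeter is the routing topology, not this rule.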

Conclusion

When working with AWS networking, "Least Privilege" is the goal, but "Functional Connectivity" is the requirement.

Remember that NLBs preserve source IPs (unlike ALBs).

Ensure your Security Groups account for both User Traffic AND Infrastructure Traffic (Health Checks).

Don't over-complicate rules for resources that are already isolated by the network topology.

Happy coding!
