May 3, 2026

ALB Under The Hood: What Really Lies Beneath the “Smart Receptionist”?

When you create an Application Load Balancer on AWS, you get exactly one thing: a long DNS name like my-alb-123456789.ap-southeast-1.elb.amazonaws.com. No static IP. No server you can SSH into. So what’s really behind that DNS name?

This article will reveal the hidden architecture behind ALB — from the army of load balancer nodes working silently, to the scaling mechanisms, and why ALB never has a static IP.

1. Load Balancer Nodes — “The Hidden Army”

When you create an ALB and assign it 3 subnets (each belonging to a different AZ), AWS doesn’t create a single “load balancer”. Instead, AWS spins up at least 1 load balancer node in each AZ you selected — this node lives in the subnet you assigned to that AZ. So with 3 AZs, you have a minimum of 3 nodes.

How do nodes scale?

Each node starts at size large. As traffic increases, AWS will scale vertically (upgrade the node) following this progression:

large → xlarge → 2xlarge → 4xlarge

When a node has reached 4xlarge and still doesn’t have enough capacity, AWS switches to horizontal scaling — adding new 4xlarge nodes to that AZ. In total, an ALB can scale up to a maximum of 100 nodes across all AZs.
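The "vertical first, then horizontal" rule can be sketched in a few lines. This is only an illustration: AWS does not publish node capacities, so the `CAPACITY` numbers below (each size doubling the previous one) and the function name `plan_az_capacity` are assumptions made up for the example.

```python
from typing import Tuple

SIZES = ["large", "xlarge", "2xlarge", "4xlarge"]
# Assumed relative capacities (NOT published by AWS): each size doubles the last.
CAPACITY = {"large": 1, "xlarge": 2, "2xlarge": 4, "4xlarge": 8}


def plan_az_capacity(required_units: int) -> Tuple[str, int]:
    """Return (node_size, node_count) for one AZ under the assumed ladder."""
    if required_units < 0:
        raise ValueError("required_units must be non-negative")
    # Vertical scaling: pick the smallest single node that covers the load.
    for size in SIZES:
        if CAPACITY[size] >= required_units:
            return size, 1
    # Horizontal scaling: add more nodes of the largest size (ceiling division).
    top = SIZES[-1]
    return top, -(-required_units // CAPACITY[top])
```

Under these assumed numbers, a load of 3 units fits on one 2xlarge node, while a load of 20 units needs three 4xlarge nodes.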

Subnets must be “wide” enough

Each node consumes 1 IP address in the subnet. Therefore, AWS requires:

A minimum subnet of /27 CIDR (32 IPs, of which at least 8 IPs must be free)
Extra headroom at scale: for example, an ALB running 50 nodes (10 per AZ across 5 AZs) already uses 10 IPs per AZ, so plan for roughly 10 more free IPs per AZ as scaling room

What happens when the subnet runs out of IPs? The ALB enters an active_impaired state — it still works but can’t scale further, leading to 5xx errors and timeouts for users. This is a “silent” failure that many teams don’t notice until a traffic spike hits.

Tip: Always use subnet /27 or larger for ALB. If your system handles heavy traffic, consider /26 or /25 to ensure headroom for scaling.
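A quick way to sanity-check a subnet is to do the arithmetic with Python's standard `ipaddress` module. The sketch below assumes the usual VPC rule that AWS reserves 5 addresses in every subnet; the helper names `usable_ips` and `alb_headroom` are hypothetical.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one in a VPC subnet
MIN_FREE_FOR_ALB = 8         # free IPs the ALB expects in each subnet


def usable_ips(cidr: str) -> int:
    """Addresses in the subnet that can actually be assigned to resources."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def alb_headroom(cidr: str, ips_in_use: int) -> int:
    """Free IPs left beyond the 8 the ALB needs; a negative value means scaling is at risk."""
    return usable_ips(cidr) - ips_in_use - MIN_FREE_FOR_ALB
```

For a /27 this gives 27 usable addresses, so a subnet shared with 22 other ENIs already leaves the ALB short, while a /26 (59 usable) gives far more room.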

2. Why Does ALB Only Have a DNS Name, No Static IP?

This is a classic question: NLB can attach Elastic IPs (static), so why can’t ALB?

The answer lies in the very scaling mechanism described above:

ALB continuously adds/removes nodes to handle traffic. Each node has its own IP.
When a new node is added → a new IP appears
When a node is removed → an old IP disappears

If AWS assigned a static IP to the ALB, which node would you attach it to? That node could be removed at any time. That’s why AWS chose the DNS abstraction approach: a DNS name pointing to the current list of node IPs.

How does DNS work?

When a client resolves the ALB’s DNS name:

Route 53 returns multiple A records — each record is the IP of an active node
Alias records have a fixed TTL of 60 seconds (cannot be changed)
When the ALB scales (adds/removes nodes), Route 53 automatically updates the IP list
Clients will receive new IPs after at most 60 seconds
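On the client side, the practical consequence is that resolved IPs should never be held longer than the TTL. Below is a minimal sketch of a TTL-respecting cache. The class name `TtlResolver` and the injected `resolve` and `clock` callables are assumptions for illustration; in real code `resolve` would wrap your DNS library of choice.

```python
import time
from typing import Callable, Dict, List, Tuple


class TtlResolver:
    """Caches DNS answers, but never longer than ttl_seconds."""

    def __init__(self, resolve: Callable[[str], List[str]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, List[str]]] = {}  # host -> (expires_at, ips)

    def lookup(self, hostname: str) -> List[str]:
        now = self._clock()
        entry = self._cache.get(hostname)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh, reuse the cached node IPs
        ips = self._resolve(hostname)  # expired or missing: ask DNS again
        self._cache[hostname] = (now + self._ttl, ips)
        return ips
```

A client built this way picks up added or removed ALB nodes within one TTL instead of pinning itself to an IP that may no longer exist.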

So how does NLB get a Static IP?

NLB operates at Layer 4 — it only forwards TCP/UDP packets without inspecting content. NLB’s architecture is much simpler:

NLB creates 1 fixed ENI per AZ
Each ENI can be assigned 1 Elastic IP (static)
Internally, AWS uses Hyperplane technology to scale capacity without adding more ENIs

In other words: NLB “hides” the scaling complexity behind a fixed ENI. ALB can’t do this because it needs multiple real nodes to process Layer 7 traffic (inspecting HTTP headers, routing rules, SSL termination…).

Aspect	ALB	NLB
IP Model	Dynamic (changes when scaling)	Static (Elastic IP per AZ)
Addressing	DNS name only	DNS name + Elastic IP
Scaling Model	Add/remove nodes	Scale behind fixed ENI (Hyperplane)
Reason	Layer 7 processing is heavy, needs many nodes	Layer 4 forwarding is lightweight, 1 ENI suffices

Practical tip: If you need a static IP for ALB (e.g., a partner requires IP whitelisting), place an NLB in front of ALB. AWS supports registering ALB as a target group type for NLB, allowing you to combine NLB’s static IP with ALB’s smart routing.

3. ALB Scaling — “Accelerates Like a Rocket, Lands Like a Feather”

ALB scaling has an interesting characteristic: it’s asymmetric.

Scale up: Extremely aggressive. ALB can double capacity within 5 minutes
Scale down: Very conservative. AWS reduces gradually to avoid the situation where scaling down is immediately followed by another traffic spike

LCU Reservation — “Pre-booking” for Traffic Spikes

Previously, if you knew a large traffic event was coming (Black Friday, Flash Sale), you had to contact AWS Support to “pre-warm” ALB. Now, AWS provides the LCU Reservation feature — allowing you to pre-book capacity yourself.

How it works:

Determine expected peak load from CloudWatch metrics (PeakLCUs) or load testing
Set LCU Reservation via Console or ELBV2 API
ALB will provision minimum capacity at your reserved level
If traffic exceeds the reservation → ALB still auto-scales normally above that

Automation pattern for planned events:


EventBridge Scheduler (1 hour before event)
    → Lambda function
        → Read ALB info from DynamoDB
        → Call ELBV2 API to set LCU Reservation
        → After event, reset to normal level
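The Lambda step in this flow could look roughly like the sketch below. It assumes the boto3 `elbv2` client's `modify_capacity_reservation` call with the `MinimumLoadBalancerCapacity` and `ResetCapacityReservation` parameters; check the current boto3 documentation before relying on it. The event shape (`load_balancer_arn`, `lcus`) is hypothetical, and the DynamoDB lookup is left out to keep the example short.

```python
from typing import Any, Dict, Optional


def build_reservation_request(load_balancer_arn: str, lcus: Optional[int]) -> Dict[str, Any]:
    """Build keyword arguments for the reservation call; lcus=None resets the reservation."""
    if lcus is None:
        return {"LoadBalancerArn": load_balancer_arn, "ResetCapacityReservation": True}
    if lcus <= 0:
        raise ValueError("lcus must be a positive integer")
    return {
        "LoadBalancerArn": load_balancer_arn,
        "MinimumLoadBalancerCapacity": {"CapacityUnits": lcus},
    }


def handler(event: Dict[str, Any], context: Any = None, elbv2_client: Any = None) -> Dict[str, Any]:
    """Lambda entry point. The event keys used here are assumptions for this sketch."""
    if elbv2_client is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS access
        elbv2_client = boto3.client("elbv2")
    request = build_reservation_request(event["load_balancer_arn"], event.get("lcus"))
    elbv2_client.modify_capacity_reservation(**request)
    return request
```

Scheduling the same function twice, once with `lcus` set before the event and once without it afterwards, covers both the "set" and the "reset to normal" steps.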

Note: LCU Reservation is designed for short durations (hours to days), not for permanent capacity planning. Reserved LCUs use the Reserved rate (cheaper), and usage above the reservation uses the standard on-demand rate.

4. Cross-Zone Load Balancing

When ALB has nodes in multiple AZs, how is traffic distributed?

Default: Enabled at ALB level

With cross-zone load balancing enabled (default), each ALB node distributes traffic to all targets across every AZ, not just targets in its own AZ.

Example: You have 10 targets in AZ-a and 5 targets in AZ-b. With cross-zone enabled, each target receives ~6.67% of the traffic (1/15), regardless of which AZ it’s in.
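The same arithmetic can be written out for both modes. This sketch assumes DNS spreads clients evenly across the AZs, so with cross-zone disabled each AZ gets an equal slice of traffic to divide among its own targets. The function name `per_target_share` is made up for the example.

```python
from typing import Dict


def per_target_share(targets_per_az: Dict[str, int], cross_zone: bool) -> Dict[str, float]:
    """Fraction of total traffic that ONE target in each AZ receives.

    Assumes every AZ has at least one target and DNS splits clients evenly per AZ.
    """
    if cross_zone:
        total_targets = sum(targets_per_az.values())
        return {az: 1.0 / total_targets for az in targets_per_az}
    az_share = 1.0 / len(targets_per_az)  # each AZ's node only serves its own targets
    return {az: az_share / count for az, count in targets_per_az.items()}
```

With 10 targets in AZ-a and 5 in AZ-b, cross-zone gives every target 1/15 of the traffic. Without it, each AZ-a target gets 5% and each AZ-b target gets 10%, which is why unequal target counts per AZ matter once cross-zone is off.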

When should you disable cross-zone?

You can disable cross-zone at the target group level in certain cases:

Want to keep traffic within the same AZ to reduce latency
Using zonal shift (AZ evacuation) for disaster recovery
Want tight control over traffic distribution

Cost

An important point: ALB does not charge for cross-zone data transfer between ALB nodes and targets. This is a major difference from NLB — where cross-zone data transfer is charged. This makes ALB more cost-effective for deployments spread across multiple AZs.

5. Connection Handling — ALB Is a Full Reverse Proxy

Many people think ALB simply “forwards” requests to the backend. In reality, ALB is a full reverse HTTP proxy — it terminates the TCP connection from the client and creates a new TCP connection to the backend.

HTTP Multiplexing & Connection Pooling

This is one of the biggest benefits that few people know about:

ALB receives requests from thousands of clients via thousands of TCP connections
But when sending to the backend, ALB uses a small number of persistent TCP connections
Multiple client requests are multiplexed over the same backend connection

This helps the backend server save significant CPU, memory, and bandwidth — instead of maintaining thousands of connections, the backend only needs to handle a few dozen connections from ALB.

HTTP/2: Supported but with limitations

ALB supports HTTP/2 from the client side (multiplexing, header compression, binary framing). However, with the default target group protocol version (HTTP1), ALB converts HTTP/2 streams into individual HTTP/1.1 requests when forwarding to the backend. This means:

Client → ALB: HTTP/2 (multiple streams on 1 connection)
ALB → Backend: HTTP/1.1 (each stream becomes 1 separate request)

HTTP/2 stream multiplexing and prioritization are not preserved when passing through ALB.

Timeout Configuration — The Source of 502/504

ALB has 2 important timeouts:

Timeout	Default	Meaning
Connection Idle Timeout	60 seconds	How long a connection is kept idle before ALB closes it
HTTP Client Keep-Alive	3600 seconds (1 hour)	Maximum time a connection is kept alive

Critical best practice: Backend keep-alive timeout must be greater than ALB idle timeout.

Why? If the backend closes the connection before ALB does (e.g., backend timeout 30s, ALB timeout 60s), a race condition occurs:

ALB thinks the connection is still active → sends a request through
Backend has already closed the connection → returns RST
ALB returns 502 Bad Gateway to the client

Fix: Set backend keep-alive timeout = ALB idle timeout + 5-10 seconds buffer. Example: ALB idle timeout 60s → backend keep-alive 70s.
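The rule is simple enough to encode as a config check, for example in a deployment pipeline. The helper names below are hypothetical, and the 10-second default buffer is just the upper end of the range suggested above.

```python
def recommended_backend_keepalive(alb_idle_timeout_s: int, buffer_s: int = 10) -> int:
    """Backend keep-alive that safely outlives the ALB idle timeout."""
    return alb_idle_timeout_s + buffer_s


def is_race_prone(backend_keepalive_s: int, alb_idle_timeout_s: int) -> bool:
    """True when the backend may close an idle connection the ALB still considers usable."""
    return backend_keepalive_s <= alb_idle_timeout_s
```

With the default 60-second idle timeout, a 30-second backend keep-alive is flagged as race-prone, and the recommended value comes out as 70 seconds.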

6. How Does a Request Reach the Right Node? — Load Distribution at the DNS Layer

Before discussing how ALB chooses a target backend, there’s a more fundamental question: if the ALB has 6 nodes (2/AZ x 3 AZs), when a client sends a request, what determines which of those 6 nodes receives the request? Is there some “central router” in front distributing traffic?

The answer will surprise you: There is no “central router”. Node selection happens entirely at the DNS layer, on the client side — before the first packet even leaves the client machine.

How it works

Client queries DNS: The browser/app asks “What’s the IP of my-alb-123.elb.amazonaws.com?”
Route 53 returns a list of IPs:
- Returns up to 8 IPs in a single response (each IP is 1 ALB node)
- The order is randomly shuffled in each response
- Fixed TTL of 60 seconds
- Only returns IPs of healthy nodes — impaired nodes are excluded from the response
Client picks 1 IP:
- Most OS/libraries pick the first IP in the response (browser, JVM, libcurl, Go net/http…)
- Some advanced clients try the next IP if the first fails
- Client caches the IP for the duration of the TTL (60s)
Client opens a TCP connection directly to that IP — that node handles the entire TCP/TLS connection for the lifetime of the connection.

So what is the “node distribution algorithm” really?

This is something many people don’t realize: ALB does not have a load balancing algorithm between nodes. Load distribution between nodes relies entirely on:

Random shuffle by Route 53 in each DNS response
Law of Large Numbers — when enough clients query, random distribution becomes statistically “even”

In other words: load distribution between nodes is only “even” when you have thousands of clients per second. With low traffic, some nodes may be “idle” while others are busy — this is normal.
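A small simulation makes the effect visible. It is a toy model, not Route 53's actual behavior: every client gets a freshly shuffled list and connects to the first IP. The function names and the seeded random generator are choices made for this sketch.

```python
import random
from collections import Counter
from typing import List


def simulate_node_choice(node_ips: List[str], clients: int, seed: int = 0) -> Counter:
    """Count how many clients land on each node when each picks the first shuffled IP."""
    rng = random.Random(seed)
    hits = Counter({ip: 0 for ip in node_ips})
    for _ in range(clients):
        answer = node_ips[:]
        rng.shuffle(answer)   # stand-in for the shuffled DNS response
        hits[answer[0]] += 1  # the client connects to the first IP it sees
    return hits


def imbalance(hits: Counter) -> float:
    """Busiest node divided by the average; 1.0 means perfectly even."""
    average = sum(hits.values()) / len(hits)
    return max(hits.values()) / average
```

Running it with 6 nodes and a handful of clients typically shows a lopsided spread, while tens of thousands of clients bring the imbalance ratio very close to 1.0.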

In summary: ALB Has 2 Completely Separate “Decision” Layers

When a request passes through ALB, two independent decisions are made: which node receives the connection, and which target receives the request.

Layer 1 (choosing a node) is decided by AWS — you can’t configure anything beyond choosing the region and AZs. Layer 2 (choosing a target) is under your control via target group settings, which is what the next section covers in detail.

Practical tip: If you see uneven traffic distribution across ALB nodes (CloudWatch exposes this per Availability Zone rather than per node), don’t immediately assume ALB is broken. Check the client’s DNS caching behavior first.

7. Load Balancing Algorithms — How Does ALB Decide Which Target Gets the Request?

After a request has reached an ALB node (via the DNS mechanism in section 6), that node must choose which target backend in the target group to forward to. This is where ALB has 3 algorithms with distinct philosophies and use cases. You configure the algorithm at the target group level (not at the ALB level).

7.1. Round Robin — “Take Turns” (Default)

This is the default algorithm for ALB and the easiest to understand.

How it works:

Targets are arranged in a circle. The first request goes to target #1, the next to #2, then #3… After completing the circle, it loops back to #1.
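As a minimal illustration of the rotation (not ALB's internal code), Python's `itertools.cycle` behaves the same way. The target names are placeholders.

```python
from itertools import cycle
from typing import Iterator, List


def round_robin(targets: List[str]) -> Iterator[str]:
    """Endless iterator that hands out targets in order, wrapping around at the end."""
    return cycle(targets)


picker = round_robin(["target-1", "target-2", "target-3"])
first_five = [next(picker) for _ in range(5)]
# first_five == ["target-1", "target-2", "target-3", "target-1", "target-2"]
```

Note that the picker never looks at how busy a target is, which is exactly the weakness listed under the disadvantages.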

Advantages:

Simple, predictable
Distributes evenly when all requests are “similar” (same complexity, same response time)
Doesn’t require ALB to track much state

Disadvantages:

Doesn’t care if a backend is busy or idle. If target A is processing 50 “heavy” requests and target B is idle, Round Robin still sends new requests to A in turn.
Not suitable when requests have uneven complexity (e.g., an API endpoint mix of fast and slow operations)

Use when: Requests are uniform, backends are uniform (same instance type, same workload). Most standard web app use cases.

7.2. Least Outstanding Requests (LOR) — “Whoever’s Free Does the Work”

This algorithm is smarter: ALB counts the number of in-flight requests (sent but no response received yet) to each target, and picks the target with the fewest in-flight requests.

How it works:

ALB maintains a counter for each target: incremented when a request is sent, decremented when a response is received.
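The counter logic can be sketched as a tiny class. This is an illustration of the idea only. The class name is made up, and ties are broken by insertion order here, whereas the real ALB tie-breaking is not documented.

```python
from typing import Dict, List


class LeastOutstandingRequests:
    """Tracks in-flight requests per target and always picks the least loaded one."""

    def __init__(self, targets: List[str]) -> None:
        self.in_flight: Dict[str, int] = {t: 0 for t in targets}

    def pick(self) -> str:
        # min() returns the first target on a tie because dicts keep insertion order.
        target = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[target] += 1  # request sent
        return target

    def complete(self, target: str) -> None:
        if self.in_flight[target] == 0:
            raise ValueError("no outstanding request for " + target)
        self.in_flight[target] -= 1  # response received
```

A slow target completes requests less often, so its counter stays high and `pick()` naturally steers new requests elsewhere.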

Advantages:

Dynamically self-balancing based on actual target capacity. If a target is slow → its in-flight count decreases slowly → ALB automatically sends fewer requests to it.
Handles requests of uneven complexity extremely well. For example, with a GraphQL API that mixes simple queries (10ms) and complex queries (2 seconds), LOR steers new requests away from targets still busy with the slow ones.
Well-suited when target instances are heterogeneous (mix of large and small instances in the target group)
Useful for WebSocket or long-lived connections — where some connections may “tie up” a target for a long time

Disadvantages:

ALB must track in-flight count → slightly more resource-intensive than round robin (but negligible)
Not suitable with sticky sessions — because sticky sessions override LOR’s decisions

Use when: Requests have high response time variance (mix of fast/slow), API with many different endpoints, microservices with uneven workloads.

7.3. Weighted Random with Anomaly Mitigation — “Weighted Probability + Anomaly Detection”

This is the newest algorithm (AWS announced late 2023). The name is long but the concept has 2 parts:

Part 1: Weighted Random

Instead of choosing by order (Round Robin) or by counter (LOR), ALB selects a target randomly with weights. If all targets have equal weight → it’s pure random. If weights differ → targets with higher weight have a higher probability of being selected.
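Weighted random selection is easy to demonstrate with the standard library's `random.choices`. The function name and the weights below are illustrative assumptions, not values ALB exposes.

```python
import random
from typing import Dict


def pick_weighted(weights: Dict[str, float], rng: random.Random) -> str:
    """Pick one target at random, with probability proportional to its weight."""
    targets = list(weights)
    return rng.choices(targets, weights=[weights[t] for t in targets], k=1)[0]
```

With weights of 3 and 1, the first target is chosen about 75% of the time over many picks, and a target with weight 0 is never chosen as long as another target has a positive weight.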

Part 2: Anomaly Mitigation — Auto-detecting “Silently Failing” Targets

This is the truly interesting part. ALB continuously monitors targets and detects those with “anomalous” behavior compared to the rest of the target group:

Targets with an abnormally high HTTP 5xx rate (but haven’t failed health checks)
Targets with abnormally high connection errors
Targets with abnormally long response times

When an anomaly is detected, ALB automatically reduces the target’s weight (sends less traffic) without waiting for the health check to fail. This is a form of automatic circuit breaker at the load balancer level.
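AWS does not publish how anomalies are detected, so the following is only a toy model of the idea: compare each target's error rate with the group's median and cut the weight of clear outliers. The function name, the threshold, and the reduced weight are all assumptions invented for this sketch.

```python
from statistics import median
from typing import Dict


def mitigated_weights(error_rates: Dict[str, float],
                      threshold: float = 0.10,
                      reduced_weight: float = 0.1) -> Dict[str, float]:
    """Give outliers a reduced weight; everyone else keeps weight 1.0.

    A target counts as an outlier when its error rate exceeds the group
    median by more than `threshold` (an assumed rule, not AWS's).
    """
    baseline = median(error_rates.values())
    return {
        target: reduced_weight if rate - baseline > threshold else 1.0
        for target, rate in error_rates.items()
    }
```

Because the comparison is relative to the rest of the group, a target group where every target is equally unhealthy produces no outliers in this model, and the weights stay equal.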

Advantages:

Protects end users from “sneaky” failures — a target isn’t fully broken (still passes health checks) but is returning errors
Automatically recovers when the target returns to normal
Especially useful in continuous deployment environments — if a new version has a bug, ALB automatically reduces traffic to the faulty instance without intervention

Disadvantages:

Can produce “false positives” in some cases (a target processing legitimate but heavy requests → gets traffic reduced unfairly)
Harder to debug — you see uneven traffic distribution and must understand that ALB is “punishing” a particular target

Use when: Production-critical workloads needing high resilience, environments with continuous deployment, API services where you want to auto-detect bad deployments.

Quick Comparison Table

Algorithm	Target selection logic	When to use	Avoid when
Round Robin	Sequential rotation	Uniform requests, uniform targets	Requests have high response time variance
Least Outstanding Requests	Target with fewest in-flight requests	Uneven workloads, microservices	Mandatory sticky sessions
Weighted Random + Anomaly Mitigation	Weighted random, auto-reduces weight for faulty targets	Production-critical, auto-resilience needed	Need deterministic distribution, easy debugging

Where Does Sticky Session Fit?

Sticky session (session affinity) is not an algorithm — it’s an override layer sitting on top of the algorithms above. How it works:

First request: ALB uses the algorithm (Round Robin / LOR / Weighted Random) to select target X
ALB sets an AWSALB cookie in the response, encoding target X
Subsequent requests from the same client: ALB reads the cookie → forwards directly to target X (bypassing the algorithm)

This is useful for stateful applications (shopping carts, session-based auth not using JWT). The trade-off is weaker load balancing: a client stays pinned to target X however busy it gets, and is only rerouted once X becomes unhealthy or the cookie expires.
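The "override layer" idea fits in a few lines. This is a conceptual sketch rather than ALB's implementation: the `route` function, the decoded `cookie_target` argument, and the `pick` callback standing in for the configured algorithm are all hypothetical.

```python
from typing import Callable, Optional, Set


def route(cookie_target: Optional[str], healthy: Set[str], pick: Callable[[], str]) -> str:
    """Honor the sticky cookie when its target is healthy; otherwise use the algorithm."""
    if cookie_target in healthy:
        return cookie_target  # sticky hit: the algorithm is bypassed
    return pick()             # no cookie, or the target is gone: choose again
```

The algorithm only runs on the first request or after the pinned target disappears, which is why stickiness weakens whatever balancing strategy sits underneath it.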

Tip: If your app is stateless (uses JWT, sessions stored in Redis), don’t enable sticky sessions — it only reduces load balancing effectiveness.

Conclusion — Production Checklist

ALB isn’t “one load balancer” — it’s a fleet of nodes that auto-scales to handle traffic. Understanding the underlying architecture helps you avoid many common production mistakes.

Checklist when deploying ALB:

ALB subnets use /27 or larger, ensuring at least 8 free IPs per AZ
Don’t hardcode IPs for ALB — always use the DNS name or Route 53 alias
Set backend keep-alive greater than ALB idle timeout (default 60s) to avoid 502
Use LCU Reservation before planned traffic events (Flash Sale, launch)
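Monitor RejectedConnectionCount, TargetConnectionErrorCount, and HTTPCode_ELB_5XX_Count. A non-zero RejectedConnectionCount means the ALB hit its connection limit, and rising target connection errors point to an overloaded backend (SurgeQueueLength and SpilloverCount are Classic Load Balancer metrics and are not published for ALB)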
Need a static IP? Place an NLB in front of ALB instead of trying to attach an IP to ALB
Choose the right load balancing algorithm: uniform workloads use Round Robin, uneven workloads use LOR, production-critical workloads enable Weighted Random + Anomaly Mitigation