Design Strategies To Handle Bursty Traffic In Your Service

Kanika Modi
4 min read · Oct 3, 2021


The way a microservice handles traffic should be scalable: it should be prepared for drastic changes in traffic, especially uneven bursts, handle them gracefully, and prevent them from taking down the service entirely. This blog discusses the problems bursty/spiky traffic causes and possible design solutions for handling it.

Problems With Bursty Traffic

Bursty traffic over a short time span overloads services and causes the following issues:

1) Increased call response latency

2) Dependency call rates exceeding their supported rates

3) Resource over-utilization, resulting in system crashes

Strategies To Smoothen Bursty Traffic

In order to protect service resources and control the rate of access to dependencies, you need to set up a system with both proactive and reactive strategies that prepare your service for traffic surges across various use cases.

1) Throttling

When client requests arrive faster than the supported access rate, you can throttle them. Throttling gives clients the opportunity to rate-limit themselves or spread out their call traffic by delaying further requests after being throttled.
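As an illustration, here is a minimal server-side throttle sketch using a token bucket per client; the rate and burst values below are assumptions for the example, not numbers from any particular service:

```python
import time
from collections import defaultdict

# A per-client token bucket. RATE and BURST are illustrative values.
RATE = 10.0    # tokens refilled per second (sustained rate)
BURST = 20.0   # bucket capacity (maximum allowed burst)

_buckets = defaultdict(lambda: {"tokens": BURST, "last": time.monotonic()})

def allow_request(client_id: str) -> bool:
    """Return True if the request is admitted, False if it should be throttled."""
    bucket = _buckets[client_id]
    now = time.monotonic()
    # Refill tokens for the time elapsed since the last request, capped at BURST.
    bucket["tokens"] = min(BURST, bucket["tokens"] + (now - bucket["last"]) * RATE)
    bucket["last"] = now
    if bucket["tokens"] >= 1.0:
        bucket["tokens"] -= 1.0
        return True
    return False  # respond with e.g. HTTP 429 so the client can back off
```

The bucket absorbs short bursts up to its capacity while enforcing the sustained rate, and a rejected caller gets an explicit throttle signal it can react to.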

2) Rate Limiting

Before accessing a resource or calling a dependency service, you should identify its supported access rate. Rate limiting lets you cap the rate of calls to downstream services so that you avoid being throttled and don't starve other clients of resources.
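A minimal client-side sketch, assuming a downstream dependency that supports roughly 50 calls per second (an illustrative number), could space out outgoing calls like this:

```python
import threading
import time

# Spaces outgoing calls so a dependency's supported rate is never exceeded.
class RateLimiter:
    def __init__(self, max_calls_per_second: float):
        self._interval = 1.0 / max_calls_per_second
        self._lock = threading.Lock()
        self._next_allowed = time.monotonic()

    def acquire(self) -> None:
        """Block until the caller may issue the next downstream call."""
        with self._lock:
            now = time.monotonic()
            wait = self._next_allowed - now
            self._next_allowed = max(now, self._next_allowed) + self._interval
        if wait > 0:
            time.sleep(wait)

limiter = RateLimiter(max_calls_per_second=50)  # assumed supported rate

def call_dependency(payload):
    limiter.acquire()  # wait for our turn before calling downstream
    # ... issue the actual call here, e.g. an HTTP request to the dependency ...
```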

3) Backoff Retry Mechanism

Requests to a dependency should be retried after some delay when the dependency throttles them. The retry mechanism should add jitter so that retry calls are spread out over time. Where a dependency has no throttling mechanism and your service's traffic is spiky, requests should be retried once the access rate drops below the supported rate.
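A common shape for this is capped exponential backoff with "full jitter", sketched below; ThrottledError and send_request are hypothetical stand-ins for your dependency's throttle signal and the actual call:

```python
import random
import time

class ThrottledError(Exception):
    """Hypothetical error raised when the dependency throttles a request."""

def call_with_backoff(send_request, max_attempts=5, base=0.1, cap=5.0):
    """Retry send_request with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return send_request()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted; surface the failure
            # Full jitter: sleep a random duration up to the exponential bound,
            # so retries from many clients spread out instead of synchronizing.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```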

4) Monitoring

  • Monitor metrics whenever client calls are throttled: client ID, throttle count, etc. This helps identify each client's throttling pattern and scaling needs.
  • Monitor metrics whenever the access rate to a dependency exceeds its supported rate: calling service name, dependency called, call count, etc. This gives you the data to request an increase in the dependency's supported access rate.
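As a sketch covering both kinds of metrics above, using the Prometheus Python client (one of many metrics libraries you could use here; the metric names and labels are illustrative):

```python
from prometheus_client import Counter

# Illustrative metric names and labels; adapt to your metrics system.
CLIENT_THROTTLES = Counter(
    "client_throttles_total", "Requests throttled per client", ["client_id"]
)
DEPENDENCY_OVER_RATE = Counter(
    "dependency_over_rate_total",
    "Calls made above a dependency's supported rate",
    ["service", "dependency"],
)

# Emit when a client is throttled:
CLIENT_THROTTLES.labels(client_id="client-42").inc()

# Emit when a dependency's supported rate is exceeded:
DEPENDENCY_OVER_RATE.labels(service="order-service", dependency="payments").inc()
```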

5) Forecasting Traffic & Load Testing

Measure first, then optimize. Forecasting traffic from historical data helps you identify atypical traffic bursts so that you are prepared to cope with future surges. Load testing your service helps you understand how much load/traffic your system can handle and how easily and quickly you can acquire and release additional resources.
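As a rough sketch (real load tests are better run with a dedicated tool such as Locust, k6, or Gatling), you can measure latency percentiles under concurrent load with nothing but the standard library; the URL and request counts below are placeholders:

```python
import concurrent.futures
import time
import urllib.request

def hit(url: str) -> float:
    """Issue one request and return its latency in seconds."""
    start = time.monotonic()
    urllib.request.urlopen(url).read()
    return time.monotonic() - start

def load_test(url: str, total: int = 200, concurrency: int = 20) -> None:
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(hit, [url] * total))
    print(f"p50={latencies[total // 2]:.3f}s  p99={latencies[int(total * 0.99)]:.3f}s")

load_test("http://localhost:8080/health")  # placeholder endpoint
```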

6) Serverless Infrastructure

One proactive strategy for handling traffic of any shape is to use flexible, scalable, and highly available serverless cloud infrastructure such as AWS Lambda or AWS Fargate, which scales automatically with your traffic. Such solutions may not always be the most cost-efficient way to absorb a potential traffic surge; however, they usually offer cost optimization in that you pay only for the resources you use at any given moment.

7) Load Shedding

When a service approaches overload, it can start rejecting excess requests so that it can focus on the requests it decides to let in. The goal of load shedding is to keep latency low for the accepted requests so that the service responds before the client times out. One mechanism for deciding which requests to accept is priority processing, where requests marked with higher priority are considered first.

With this approach, the service maintains high availability for the requests it accepts, and only the availability of the excess traffic is affected. The caveat is that it requires mechanisms to detect whether the countermeasures are rejecting a significant volume of traffic.
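A minimal sketch of load shedding with priority processing, using a bounded priority queue (the queue size and priority values are illustrative):

```python
import itertools
import queue

MAX_PENDING = 100  # beyond this, excess requests are shed
_pending = queue.PriorityQueue(maxsize=MAX_PENDING)
_seq = itertools.count()  # tie-breaker so equal priorities stay FIFO

def submit(priority: int, request) -> bool:
    """Lower number = higher priority. Returns False when the request is shed."""
    try:
        _pending.put_nowait((priority, next(_seq), request))
        return True
    except queue.Full:
        return False  # shed: fail fast so the caller hears back before timing out

# A worker thread would then drain requests in priority order:
#   priority, _, request = _pending.get()
```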

8) Using Asynchronous Mechanisms

Compared to processing requests synchronously, asynchronous mechanisms are better equipped to handle bursty traffic patterns because they let you control and smooth the flow of traffic. However, this also means you have to handle the failure modes and idempotency issues that come with an asynchronous system.

For example, if your database is not designed to handle spiky writes, you need another mechanism in front of it to smooth the TPS, such as storing records in temporary storage and asynchronously syncing them back to your database in batches. This gives you control over your writes per second, assuming you can tolerate slightly delayed writes.
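A minimal in-process sketch of that pattern: requests append to a buffer, and a background thread drains fixed-size batches at a steady pace. flush_batch_to_db is a hypothetical placeholder, and a production system would use a durable queue (e.g., SQS or Kafka) rather than process memory:

```python
import threading
import time

_buffer = []
_lock = threading.Lock()
BATCH_SIZE = 25       # records per batch write (illustrative)
FLUSH_INTERVAL = 1.0  # seconds between batches -> bounds write TPS

def flush_batch_to_db(batch) -> None:
    """Hypothetical placeholder for a real batched database write."""

def enqueue_write(record) -> None:
    """Called on the request path: a cheap append instead of a direct DB write."""
    with _lock:
        _buffer.append(record)

def _flusher() -> None:
    while True:
        time.sleep(FLUSH_INTERVAL)
        with _lock:
            batch = _buffer[:BATCH_SIZE]
            del _buffer[:BATCH_SIZE]
        if batch:
            flush_batch_to_db(batch)

threading.Thread(target=_flusher, daemon=True).start()
```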

Thank you for reading! If you found this helpful, here are some next steps you can take:

  1. This blog is part of my System Design Choices series. Check out the other blogs of the series.
  2. Send some claps my way, follow me on Medium & subscribe below to get a notification whenever I publish.
  3. Connect with me on LinkedIn & Twitter for more tech blogs!
