System design basics - Arthur Takeda

System design, whether for your next interview or for your day job, is a great skill to have.

In this article I'll go over the basics that will help you lead your next design session.

Requirements

First, you must drill down on the requirements for the design. Here are some common questions to keep in mind:

How many users is it expected to have?
What will this service do?
Are users spread out in many regions of the world?
What is the expected load? Thousands, hundreds of thousands, millions, or billions of requests a day?
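To make the load question concrete, it can help to convert a daily request count into requests per second. The snippet below is a rough back-of-the-envelope sketch; the function names and the `peak_factor` multiplier are assumptions for illustration, not measured values.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rps(requests_per_day: int) -> float:
    """Average requests per second for a given daily volume."""
    return requests_per_day / SECONDS_PER_DAY


def peak_rps(requests_per_day: int, peak_factor: float = 3.0) -> float:
    """Rough peak estimate. peak_factor is an assumed multiplier, not a measurement."""
    return average_rps(requests_per_day) * peak_factor


for daily in (100_000, 1_000_000, 1_000_000_000):
    print(f"{daily:>13,} req/day ~ {average_rps(daily):,.1f} avg req/s, "
          f"{peak_rps(daily):,.1f} peak req/s (assumed 3x)")
```

Traffic is rarely uniform across the day, so the average is only a lower bound on what the system must handle.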

Horizontal Scaling

Horizontal scaling means adding more machines, or setting up a cluster of machines, to handle increased load. It differs from vertical scaling, which adds more raw power (CPU, RAM) to a single machine.

In the context of system design, horizontal scaling is more commonly used because it provides a way to increase capacity on the fly and is generally more flexible than vertical scaling. It distributes load across multiple servers, which reduces the load on any single server and increases the reliability and availability of the application.

Benefits of Horizontal Scaling

Scalability: It's easier to scale out to handle more traffic by simply adding more machines into the pool.
Fault Tolerance: If one server fails, the load can be redistributed to the remaining servers while replacements are spun up as needed, minimizing downtime.
Load Distribution: It allows for the distribution of load, which can help in optimizing resource usage.

Challenges of Horizontal Scaling

Complexity: Managing multiple servers and ensuring they work together seamlessly can be complex.
Data Consistency: Ensuring data consistency across servers can be challenging, especially in databases.
Networking Overhead: Increased communication between servers can lead to networking overhead.

Strategies for Horizontal Scaling

Stateless Applications: Design applications so that any available server can handle any request without needing context from previous interactions.
Session Management: Use a centralized session store that all servers can access.
Load Balancing: Distribute traffic evenly across servers.
Database Sharding: Split databases into smaller, more manageable pieces, or shards, that can be spread across multiple servers.
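As an illustration of the sharding strategy, here is a minimal sketch of hash-based shard selection. The function name `shard_for` and the key format are hypothetical; real databases often use more elaborate schemes.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), because hash() on
    strings is randomized per process and would not be stable across servers.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# The same key always lands on the same shard.
print(shard_for("user-42", 4))
print(shard_for("user-42", 4))
```

One known trade-off of this simple modulo approach is that changing `num_shards` remaps most keys, so resizing the cluster means moving a lot of data.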

Load Balancing

Imagine that you have millions of customers using your product at the same time. You probably shouldn't route the requests to just one server.

You should horizontally scale your servers. Cool, now that you have more than one server, the problem becomes: which server should each request go to, and how do you balance requests in real time?

A Load Balancer is a common solution to this problem. It acts as the only point of contact for clients. Once it receives traffic, it distributes it to healthy servers that will handle the traffic from there.

It not only redirects requests but also keeps track of important data about each server, such as how many resources it is using, and uses that data to decide where to route traffic.

The goal of a load balancer is to avoid overloading an instance or server with too much traffic while other healthy instances are not being used as much.
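The routing decision described above can be sketched roughly as follows. This is a toy model, not how any particular load balancer works: the `Backend` class, its `healthy` flag, and the `cpu_usage` metric are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool      # assumed to be updated by periodic health checks
    cpu_usage: float   # assumed metric, as a fraction between 0.0 and 1.0


def route(backends: list[Backend]) -> Backend:
    """Pick the healthy backend that currently uses the fewest resources."""
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy backends available")
    return min(candidates, key=lambda b: b.cpu_usage)


backends = [
    Backend("a", healthy=True, cpu_usage=0.9),
    Backend("b", healthy=False, cpu_usage=0.1),  # skipped: unhealthy
    Backend("c", healthy=True, cpu_usage=0.4),
]
print(route(backends).name)  # c
```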

You can use different algorithms to route the traffic.

Round Robin

Imagine you have 3 instances of a service. The first incoming request hits the load balancer (LB), which routes it to instance 1. The second request goes to instance 2, the third to instance 3, the fourth back to instance 1, and so on.

Most of the time, though, this is a naive approach, because it ignores other things worth tracking, like the number of active sessions on each instance.
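The rotation described above can be sketched in a few lines. The class name `RoundRobinBalancer` and the instance names are made up for this example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a fixed, repeating order."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def next_server(self):
        return next(self._rotation)


lb = RoundRobinBalancer(["instance-1", "instance-2", "instance-3"])
picks = [lb.next_server() for _ in range(4)]
print(picks)  # ['instance-1', 'instance-2', 'instance-3', 'instance-1']
```

Note that the balancer never looks at how busy each instance is, which is exactly the limitation mentioned above.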

Least Connections

The Least Connections algorithm directs traffic to the server with the fewest active sessions.

If your sessions vary in length, this helps prevent any single server from becoming overloaded with many long-running sessions.
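A minimal sketch of this idea, assuming the balancer is told when a session starts and ends. The `acquire` and `release` method names are hypothetical.

```python
class LeastConnectionsBalancer:
    """Route each new session to the server with the fewest active sessions."""

    def __init__(self, servers):
        # Active session count per server, starting at zero.
        self.active = {server: 0 for server in servers}
        if not self.active:
            raise ValueError("at least one server is required")

    def acquire(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when a session ends.
        if self.active[server] > 0:
            self.active[server] -= 1


lb = LeastConnectionsBalancer(["a", "b"])
first = lb.acquire()   # "a" (tie, first listed)
second = lb.acquire()  # "b"
lb.release(second)     # the session on "b" ends early
third = lb.acquire()   # "b" again, since "a" is still busy
print(first, second, third)
```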

IP Hash (Sticky Session)

The IP Hash or Sticky Session algorithm computes a hash of each client's IP address and uses it to pick the server that handles that client's requests.

As long as the set of servers does not change, a given client's requests are always routed to the same server, which can be useful for session persistence.
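Here is a small sketch of IP-based routing. The `pick_server` function and the server names are hypothetical, and SHA-256 is just one reasonable choice of stable hash.

```python
import hashlib


def pick_server(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to a server using a stable hash of the address."""
    if not servers:
        raise ValueError("at least one server is required")
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["instance-1", "instance-2", "instance-3"]
# Repeated requests from the same IP land on the same instance.
print(pick_server("203.0.113.7", servers))
print(pick_server("203.0.113.7", servers))
```

If a server is added or removed, `len(servers)` changes and many clients will be remapped, so stickiness only holds while the pool is stable.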

Weighted Algorithms

Both Round Robin and Least Connections algorithms can be modified to become weighted. In Weighted Round Robin or Weighted Least Connections, each server is assigned a weight based on its capacity or performance. Servers with higher weights receive more connections or requests than those with lower weights.
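As a rough illustration of Weighted Round Robin, the sketch below repeats each server in the rotation according to its weight. This is the simplest possible variant; the function name and the server names are assumptions for the example.

```python
from itertools import cycle


def weighted_round_robin(weights: dict[str, int]):
    """Return an endless rotation where each server appears `weight` times per cycle."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    if not expanded:
        raise ValueError("at least one server with a positive weight is required")
    return cycle(expanded)


# "big" is assumed to have three times the capacity of "small".
rotation = weighted_round_robin({"big": 3, "small": 1})
picks = [next(rotation) for _ in range(8)]
print(picks)
```

This naive expansion sends consecutive requests to the same server in bursts. A smoother variant would interleave the picks while keeping the same overall ratio.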

Least Time

The Least Time algorithm sends each request to the server with the best combination of fast response time and few active connections.
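One way to combine the two signals is to score each server and pick the lowest score. The scoring formula below (average response time multiplied by active connections plus one) is an assumption made for this sketch, not the formula any specific load balancer uses. The `ServerStats` class is likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    name: str
    avg_response_ms: float    # assumed to come from recent measurements
    active_connections: int


def least_time_pick(stats: list[ServerStats]) -> ServerStats:
    """Pick the server with the lowest assumed score: latency * (connections + 1)."""
    if not stats:
        raise ValueError("at least one server is required")
    return min(stats, key=lambda s: s.avg_response_ms * (s.active_connections + 1))


stats = [
    ServerStats("a", avg_response_ms=20.0, active_connections=9),  # score 200
    ServerStats("b", avg_response_ms=50.0, active_connections=1),  # score 100
    ServerStats("c", avg_response_ms=30.0, active_connections=4),  # score 150
]
print(least_time_pick(stats).name)  # b
```

Here the fastest server ("a") is not chosen because it already has many active connections.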

No load balancing strategy is one-size-fits-all. You should always dig into your system's unique requirements and constraints to figure out the best strategy.