Understanding Consistent Hashing in Go: Implementation and Concepts
Learn how consistent hashing achieves uniform data distribution across dynamic server pools while minimizing data movement during scaling operations.
Have you ever wondered how distributed systems like Cassandra or Redis efficiently handle data distribution across thousands of servers?
I recently faced this challenge while building a distributed object storage system, and the solution led me down the path of consistent hashing.
My goal was to create a gateway that could consistently and statelessly distribute data across multiple servers.
This blog post delves into consistent hashing and how it helped me solve the challenge at hand.
Understanding the Challenge
Before diving into consistent hashing, let's understand why traditional approaches fall short.
Imagine you're building a distributed cache or storage system that needs to:
- Distribute data uniformly across servers
- Minimize data redistribution when servers are added or removed
- Make routing decisions without maintaining state
- Handle server failures gracefully
At first, I looked into simple modulo hashing. It works fine when the number of servers stays static, but that's rarely the case, is it? Let's see why.
1. Traditional Hash Distribution
A hash function, at its core, is a deterministic mathematical formula: it always gives you the same output for the same input, regardless of what happened before or what comes after.
In the context of a distributed system, where we want to distribute data evenly across N servers, this property becomes really important, because when a request comes in, the hash function doesn't need to:
- Remember previous requests
- Know about other requests
- Keep track of which server handles what
- Store any historical information
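In Go this is easy to see: hashing the same key twice, at different times or on different machines, always gives the same value. Here's a tiny example using CRC-32, the hash used throughout this post (the key "user-42" is arbitrary):

package main

import (
	"fmt"
	"hash/crc32"
)

func main() {
	// The hash function keeps no state: the same input always
	// produces the same output, no matter when or where it runs.
	fmt.Println(crc32.ChecksumIEEE([]byte("user-42")))
	fmt.Println(crc32.ChecksumIEEE([]byte("user-42"))) // identical value
}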
So what would a setup with 3 servers look like?
// Example: a pool of 3 servers
servers := []string{
	"server1", // index 0
	"server2", // index 1
	"server3", // index 2
}
How would we distribute incoming data between the servers?
Let's say some data comes in and we hash its key, and the hash comes out as 1500. With 3 servers, 1500 % 3 = 0, so the data is routed to the server at index 0 (server1), and every other key is routed the same way.
import "hash/crc32"

// getServer maps a key to a server index using simple modulo hashing.
func getServer(key string, totalServers int) int {
	hash := crc32.ChecksumIEEE([]byte(key))
	// The same key will ALWAYS map to the same server index,
	// as long as totalServers stays the same.
	return int(hash) % totalServers
}
The Problem
With this approach, if you have 3 servers and add a fourth one, roughly 75% of your keys map to a different server and have to be redistributed!
This is because the modulo operation creates a completely different mapping when the divisor (number of servers) changes.
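If you want to see this for yourself, here's a rough simulation that reuses the getServer function above (the key names and the 10,000-key sample size are arbitrary choices of mine):

import "fmt"

func main() {
	const total = 10000
	moved := 0
	for i := 0; i < total; i++ {
		key := fmt.Sprintf("key-%d", i)
		// Compare the server index with 3 servers vs. 4 servers.
		if getServer(key, 3) != getServer(key, 4) {
			moved++
		}
	}
	// Typically prints something close to 75%.
	fmt.Printf("%.1f%% of keys moved\n", float64(moved)/float64(total)*100)
}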
2. Enter Consistent Hashing
Consistent hashing solves this problem by creating a circular keyspace (often called a "ring") where both servers and data keys are mapped.
Here's how it works:
- Create a fixed-size hash space (typically 0 to 2^32 - 1)
- Map servers to multiple points on this circle using a hash function
- For each piece of data:
- Hash its key to get a position on the circle
- Move clockwise until you find the first server
- That server becomes responsible for the data
[Figure: a hash ring, with servers and data keys placed around a circle; each key belongs to the first server found moving clockwise from its position. Source: acodersjourney]
Here's a basic implementation:
import (
	"fmt"
	"hash/crc32"
	"sort"
)

type Ring struct {
	servers   map[uint32]string // Map hash positions to server names
	positions []uint32          // Sorted list of hash positions
}

func NewRing() *Ring {
	return &Ring{
		servers:   make(map[uint32]string),
		positions: make([]uint32, 0),
	}
}

func (r *Ring) AddServer(server string, virtualNodes int) {
	// Add multiple virtual nodes for each server to improve distribution
	for i := 0; i < virtualNodes; i++ {
		// e.g. "server1-vnode-0"; the naming matches RemoveServer further down
		hash := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s-vnode-%d", server, i)))
		r.servers[hash] = server
		r.positions = append(r.positions, hash)
	}
	// Keep the positions sorted so lookups can binary-search the ring
	sort.Slice(r.positions, func(i, j int) bool {
		return r.positions[i] < r.positions[j]
	})
}
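AddServer handles placement, but the ring also needs the lookup half of the story: hash the key and walk clockwise to the first virtual node at or after that position. Here's a minimal sketch of how that lookup could look (GetServer is my own addition to the Ring above; sort.Search turns the clockwise walk into a binary search over the sorted positions):

// GetServer returns the server responsible for the given key.
func (r *Ring) GetServer(key string) string {
	if len(r.positions) == 0 {
		return "" // Empty ring: no servers registered
	}
	hash := crc32.ChecksumIEEE([]byte(key))
	// Find the first virtual node clockwise from the key's position.
	idx := sort.Search(len(r.positions), func(i int) bool {
		return r.positions[i] >= hash
	})
	// Wrap around the ring if the key hashes past the last position.
	if idx == len(r.positions) {
		idx = 0
	}
	return r.servers[r.positions[idx]]
}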
3. Virtual Nodes for Better Distribution
Imagine you have just 4 physical servers, and each one gets mapped to just one point on the hash ring:
Ring position:  0 -------- 1000 -------- 2000 -------- 3000 -------- 4000
Server:         S1         S2            S3            S4            (back to S1)
This creates two issues:
- Uneven Distribution: With only one point per server, the gaps between servers on the ring are rarely equal in size, even with a good hash function. A server responsible for a larger arc of the ring gets more data.
- Hot Spots: If a server fails, its neighbor has to take all its load, potentially doubling that server's responsibility.
The solution: Virtual Nodes
Instead of mapping each server to one point, we map it to multiple points (often 100-200 points per server):
Ring position:  0 -- 100 -- 200 -- 300 -- 400 -- 500 -- ...
Server:         S1   S3     S2     S1     S4     S2     ...
Here's how it works in practice:
// Without virtual nodes
ring.AddServer("server1", 1) // Adds server1 at one position
// With virtual nodes (e.g., 100 virtual nodes)
ring.AddServer("server1", 100) // Adds server1 at 100 different positions
The benefits are:
- Better Distribution: Each server is responsible for many small sections of the ring rather than one large section. This averages out any non-uniformity in the hash function.
// Example showing how virtual nodes are created
func (r *Ring) AddServer(server string, virtualNodes int) {
	for i := 0; i < virtualNodes; i++ {
		// Create a unique hash for each virtual node
		vnodeName := fmt.Sprintf("%s-vnode-%d", server, i)
		hash := crc32.ChecksumIEEE([]byte(vnodeName))
		// Map this hash position to the original server
		r.servers[hash] = server
		r.positions = append(r.positions, hash)
	}
	// Keep positions sorted for efficient lookup
	sort.Slice(r.positions, func(i, j int) bool {
		return r.positions[i] < r.positions[j]
	})
}
- Better Failure Handling: If a server fails, its load is distributed across all remaining servers rather than just one neighbor:
// Removing a server deletes all of its virtual nodes from the ring
func (r *Ring) RemoveServer(server string, virtualNodes int) {
	for i := 0; i < virtualNodes; i++ {
		vnodeName := fmt.Sprintf("%s-vnode-%d", server, i)
		hash := crc32.ChecksumIEEE([]byte(vnodeName))
		// Remove this virtual node from the hash -> server mapping
		delete(r.servers, hash)
	}
	// Rebuild the sorted positions slice from the remaining virtual nodes
	r.positions = r.positions[:0]
	for hash := range r.servers {
		r.positions = append(r.positions, hash)
	}
	sort.Slice(r.positions, func(i, j int) bool {
		return r.positions[i] < r.positions[j]
	})
}
- Configurable Granularity: You can adjust the number of virtual nodes based on server capacity:
// More powerful servers can handle more virtual nodes
ring.AddServer("powerful-server", 200) // 200 virtual nodes
ring.AddServer("weaker-server", 50) // 50 virtual nodes
The beauty of this approach is that when you add or remove a server, only the keys that were mapped to that server's positions need to be redistributed. On average, if you have N servers and add one more, only about 1/(N+1) of the keys need to move – much better than the ~75% we saw with modulo hashing :)
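To see those numbers rather than take my word for them, here's a rough experiment, assuming the Ring, AddServer and the GetServer sketch from earlier live in the same package (the server names, 100 virtual nodes and 100,000 sample keys are arbitrary):

func main() {
	ring := NewRing()
	for _, s := range []string{"server1", "server2", "server3", "server4"} {
		ring.AddServer(s, 100) // 100 virtual nodes per server
	}

	// Record where 100,000 sample keys land with 4 servers.
	owner := make(map[string]string, 100000)
	perServer := make(map[string]int)
	for i := 0; i < 100000; i++ {
		key := fmt.Sprintf("key-%d", i)
		s := ring.GetServer(key)
		owner[key] = s
		perServer[s]++
	}
	fmt.Println("keys per server:", perServer) // roughly even, around 25,000 each

	// Add a fifth server and count how many keys change owner.
	ring.AddServer("server5", 100)
	moved := 0
	for key, before := range owner {
		if ring.GetServer(key) != before {
			moved++
		}
	}
	// Expect something close to 1/5 of the keys to move,
	// versus ~75% with plain modulo hashing.
	fmt.Printf("moved: %.1f%%\n", float64(moved)/float64(len(owner))*100)
}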
Real-World Considerations
While the basic concept is elegant, production systems need to consider several additional factors:
- Replication: Store data on multiple consecutive servers for redundancy (see the sketch after this list)
- Weight: Some servers might have more capacity than others
- Failure Detection: Handle temporary server outages gracefully
- Consistency: Ensure all nodes agree on the ring state
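To make the first point a bit more concrete: replication falls naturally out of the same clockwise walk, by taking the next n distinct servers instead of just the first one. Here's a minimal sketch building on the Ring above (GetServers is my own name, not part of the earlier implementation):

// GetServers returns up to n distinct servers for a key, walking
// clockwise from the key's position. The first entry is the primary,
// the rest can hold replicas.
func (r *Ring) GetServers(key string, n int) []string {
	if len(r.positions) == 0 || n <= 0 {
		return nil
	}
	hash := crc32.ChecksumIEEE([]byte(key))
	idx := sort.Search(len(r.positions), func(i int) bool {
		return r.positions[i] >= hash
	})

	seen := make(map[string]bool)
	result := make([]string, 0, n)
	// Walk the ring until we have n distinct servers or run out of positions.
	for i := 0; len(result) < n && i < len(r.positions); i++ {
		server := r.servers[r.positions[(idx+i)%len(r.positions)]]
		if !seen[server] {
			seen[server] = true
			result = append(result, server)
		}
	}
	return result
}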
Conclusion
Consistent hashing is a powerful technique that makes distributed systems more scalable and resilient.
While the basic concept is straightforward, the real power comes from understanding how to apply it to solve real-world problems. The implementation I've explored here is just the beginning.
Production systems often need to consider factors like:
- Network topology awareness
- Weighted distribution based on server capacity
- Active health checking and automatic failover
- Strong consistency guarantees