Consistent hashing is a data distribution algorithm commonly used in distributed systems, particularly for load balancing and caching. It maps both data and server nodes onto a virtual hash ring to minimize the impact of server changes on the overall system.
The core idea of consistent hashing is to place all server nodes on a hash ring using a hash function. Each piece of data is also hashed and placed on the ring, and it is assigned to the next server node in a clockwise direction.
A hash function maps keys and nodes to integer values, usually 32-bit or 64-bit. Common hash functions include crc32, md5, and sha1. This function plays a key role in the even distribution of data.
Each physical server can be represented by multiple virtual nodes on the hash ring. Virtual nodes help achieve a more balanced data distribution and alleviate hotspot issues caused by uneven node placement.
When data needs to be stored, the system hashes the key and looks for the first node on the ring with a hash greater than or equal to the data’s hash. If no such node is found, it wraps around to the first node. This mechanism ensures minimal data movement during scaling operations.
Below is a simple PHP implementation of a consistent hashing algorithm that includes node addition, removal, and data assignment based on keys:
class ConsistentHash
{
private $nodes = array();
private $position = array();
public function addNode($node)
{
$this->nodes[$node] = true;
$this->updatePosition();
}
public function removeNode($node)
{
unset($this->nodes[$node]);
$this->updatePosition();
}
public function getNode($key)
{
if (empty($this->nodes)) {
return null;
}
$pos = $this->hash($key);
foreach ($this->position as $node => $position) {
if ($pos <= $position) {
return $node;
}
}
return reset($this->position);
}
private function updatePosition()
{
$positions = array();
foreach ($this->nodes as $node => $value) {
for ($i = 0; $i < 3; $i++) {
$positions[$node . '_' . $i] = $this->hash($node . '_' . $i);
}
}
asort($positions);
$this->position = $positions;
}
private function hash($str)
{
return crc32($str);
}
}
// Usage Example
$hash = new ConsistentHash();
$hash->addNode('Server1');
$hash->addNode('Server2');
$hash->addNode('Server3');
$server = $hash->getNode('data123');
echo $server; // Output might be Server1
This example demonstrates a basic `ConsistentHash` class that manages server nodes and maps data to appropriate nodes based on the consistent hashing algorithm. The use of virtual nodes helps ensure more balanced distribution across servers.
Consistent hashing is a proven strategy for load balancing in distributed systems, especially when nodes are frequently added or removed. This article introduced the theory and provided a practical PHP implementation to help developers understand and apply this technique. Incorporating consistent hashing can significantly improve a system’s scalability and stability.