Redis Cluster Architecture Explained: How Redis Scales Horizontally in Production

Introduction

Redis Cluster is one of those topics many teams postpone until they are forced to deal with it. Everything works well on a single Redis node at first. Performance is excellent, latency is low, and the system feels simple and predictable.

As traffic grows, data volume increases, and memory fills up, a single Redis instance eventually hits its limits. At that point, the question appears late in the project: how do we scale Redis?

Redis Cluster is the answer, but it is not simply "Redis but bigger." Redis Cluster fundamentally changes how keys are stored, how operations behave, how failures are handled, and how applications must be designed.

What Redis Cluster Is Trying to Solve

Redis Cluster exists to address two fundamental limitations of a single Redis instance: memory limits and CPU limits.

A single Redis node can only use the memory of one machine, and it executes most commands on a single main thread. No matter how fast Redis is, these constraints eventually become a hard ceiling.

Redis Cluster solves this problem by sharding data across multiple Redis nodes. Instead of one Redis instance holding all keys, the dataset is split so that each node stores and processes only a portion of the total key space.

This approach provides more total memory, higher overall throughput, and built-in fault tolerance through replication. However, it also introduces additional complexity that teams must be prepared to manage.

The Core Idea: Hash Slots

Redis Cluster uses a fixed number of hash slots to distribute data. There are exactly 16,384 hash slots in a Redis Cluster.

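Every Redis key maps to exactly one hash slot, computed as CRC16(key) mod 16384, and each hash slot is assigned to exactly one master node. Cluster nodes do not proxy requests for each other. A cluster-aware client computes the slot itself and sends the command directly to the node that owns it. If the command reaches the wrong node, that node replies with a MOVED redirection that names the correct one.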

This design allows Redis to move hash slots between nodes during rebalancing without changing how keys are hashed, making online scaling possible.
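The slot calculation can be sketched in a few lines of Python. This is a minimal illustration, not a client implementation: the function name key_slot is hypothetical, and the sketch deliberately ignores hash tags. It relies on the standard library's binascii.crc_hqx, which with an initial value of 0 computes the CRC16 (XMODEM) variant that the Redis Cluster specification describes.

```python
import binascii

HASH_SLOTS = 16384  # fixed number of slots in every Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the hash slot for a key (hypothetical helper, hash tags ignored)."""
    # crc_hqx uses polynomial 0x1021; an initial value of 0 gives CRC16/XMODEM.
    return binascii.crc_hqx(key, 0) % HASH_SLOTS


# "123456789" is the check string quoted in the cluster specification:
# its CRC16 is 0x31C3, which is 12739 and already below 16384.
print(key_slot(b"123456789"))  # 12739
```

Because the slot depends only on the key, moving a slot to another node changes which node owns it, never which slot a key belongs to.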

Masters, Replicas, and Failover

In a Redis Cluster, each shard is represented by a master node. Each master typically has one or more replica nodes that continuously copy data from the master.

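Master nodes handle writes and, by default, reads as well. Replicas exist primarily to provide redundancy, although a client can opt in to reading from a replica with the READONLY command, at the cost of possibly stale data. If a master node fails, one of its replicas is automatically promoted to become the new master.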

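Failover in Redis Cluster is automatic but not instantaneous. During failover, some requests may fail briefly, and applications must be designed to retry operations. Because replication is asynchronous, a write that the old master acknowledged but had not yet replicated can be lost when a replica is promoted. Redis Cluster therefore does not guarantee strong consistency, which is an important design consideration.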
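One way to handle the brief failure window is a retry wrapper with exponential backoff and jitter. The sketch below is an assumption about how an application might do this, not something a Redis client library provides under this name: with_retries and its parameters are hypothetical, and the built-in ConnectionError and TimeoutError stand in for whatever exceptions a real client raises.

```python
import random
import time


def with_retries(operation, attempts=5, base_delay=0.05,
                 retry_on=(ConnectionError, TimeoutError)):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt)
            # Jitter spreads retries out so clients do not all return at once.
            time.sleep(delay + random.uniform(0, delay))
```

Retries are only safe for operations that can be repeated. Retrying a counter increment that actually succeeded before the connection dropped would apply it twice.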

What Redis Cluster Does Well

Redis Cluster excels at horizontal scaling for workloads that are primarily single-key operations. Common examples include simple get and set operations, counters, and user-specific data access.

As additional nodes are added to the cluster, both memory capacity and throughput increase. While hot keys can still cause imbalance, overall system capacity scales effectively for many real-world workloads.

For large systems, Redis Cluster often removes the hard scaling limits that exist with a single Redis instance.

The First Big Surprise: Cross-Key Operations

One of the most common surprises teams encounter with Redis Cluster is the limitation on multi-key operations. Redis Cluster does not support multi-key commands across different hash slots.

Commands such as MGET and MSET, MULTI/EXEC transactions, and Lua scripts only work when all involved keys belong to the same hash slot. If they do not, Redis rejects the command with a CROSSSLOT error.

This behavior is a deliberate design decision. Supporting distributed transactions would add significant complexity and reduce performance, so Redis avoids them entirely.
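When atomicity across keys is not required, a common workaround is to split a multi-key read into one batch per hash slot. The following sketch shows only the grouping step, under stated assumptions: key_slot and group_by_slot are hypothetical helper names, hash tags are ignored, and sending each batch to the cluster is left to the client library.

```python
import binascii
from collections import defaultdict


def key_slot(key: str) -> int:
    """CRC16 (XMODEM) of the key modulo 16384; hash tags are ignored here."""
    return binascii.crc_hqx(key.encode(), 0) % 16384


def group_by_slot(keys):
    """Group keys so that every batch contains keys from a single slot."""
    groups = defaultdict(list)
    for key in keys:
        groups[key_slot(key)].append(key)
    return dict(groups)


batches = group_by_slot(["user:1", "user:2", "user:3", "user:1:cart"])
for slot, batch in sorted(batches.items()):
    # Each batch could be issued as its own MGET without a CROSSSLOT error.
    print(slot, batch)
```

The batches are independent commands, so the result is not a consistent snapshot across all keys. That is the trade-off the cluster design makes explicit.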

Hash Tags and Their Trade-Offs

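Redis provides hash tags as a way to force multiple keys into the same hash slot. When a key contains a non-empty substring between its first { and the next }, only that substring is hashed. Keys such as {user:1000}:profile and {user:1000}:cart are therefore guaranteed to map to the same slot.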

This technique allows multi-key operations on related data, such as user-specific keys. However, hash tags must be used carefully.

Overusing hash tags or placing too many keys under the same tag can create hot shards. A single node may become overloaded, which defeats the purpose of horizontal scaling. Hash tags should be applied sparingly and only when necessary.
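The hash tag rule is small enough to sketch directly. The helper names below are hypothetical and the code is an illustration of the rule as the cluster specification describes it, not a copy of any client's implementation: only the text between the first { and the first } that follows it is hashed, and an empty tag means the whole key is hashed.

```python
import binascii


def hash_tag_part(key: bytes) -> bytes:
    """Return the part of the key that is actually hashed."""
    start = key.find(b"{")
    if start == -1:
        return key  # no opening brace: hash the whole key
    end = key.find(b"}", start + 1)
    if end == -1 or end == start + 1:
        return key  # no closing brace, or empty tag "{}": hash the whole key
    return key[start + 1:end]


def key_slot(key: bytes) -> int:
    """CRC16 (XMODEM) of the hashed part, modulo 16384."""
    return binascii.crc_hqx(hash_tag_part(key), 0) % 16384


# Both keys hash only "user:1000", so they always share a slot.
print(key_slot(b"{user:1000}:profile") == key_slot(b"{user:1000}:cart"))  # True
```

The same property explains the hot shard risk: every key carrying the tag {user:1000} lands on one node, however many keys that turns out to be.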

Data Modeling Changes in Redis Cluster

Redis Cluster requires a different approach to data modeling. Flat key spaces with independent access patterns work well, while highly relational data models do not.

If an application relies heavily on transactions across many keys, Redis Cluster may require significant redesign. Many teams discover this late in the process and are forced to refactor key structures under pressure.

Planning for Redis Cluster early helps avoid painful migrations later.

Client Support Is Critical

Using Redis Cluster requires cluster-aware client libraries. These clients understand hash slots, follow redirections, and adapt to topology changes automatically.

Using a non-cluster-aware client can lead to subtle bugs, degraded performance, or complete failure. Before adopting Redis Cluster, teams should verify that their chosen client libraries fully support cluster features and understand their behavior during failover and resharding.
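To make "follow redirections" concrete, the sketch below parses the two redirection errors a cluster node can return, for example "MOVED 3999 127.0.0.1:6381". It is a simplified illustration: Redirect, parse_redirect, and apply_redirect are hypothetical names, the slot cache is a plain dictionary, and real clients handle more cases, such as refreshing the whole topology after a MOVED.

```python
from typing import NamedTuple, Optional


class Redirect(NamedTuple):
    kind: str   # "MOVED" or "ASK"
    slot: int
    host: str
    port: int


def parse_redirect(message: str) -> Optional[Redirect]:
    """Parse 'MOVED <slot> <host>:<port>' or 'ASK <slot> <host>:<port>'."""
    parts = message.split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None  # some other error, not a redirection
    host, _, port = parts[2].rpartition(":")
    return Redirect(parts[0], int(parts[1]), host, int(port))


def apply_redirect(slot_owner: dict, redirect: Redirect) -> None:
    """Update the client's slot cache; only MOVED is a permanent change."""
    if redirect.kind == "MOVED":
        slot_owner[redirect.slot] = (redirect.host, redirect.port)
    # ASK applies to the next command only (sent after ASKING), so the
    # cached owner of the slot is intentionally left untouched.
```

A client that lacks this logic treats every redirection as a failed command, which is one source of the subtle bugs described above.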

Rebalancing and Resharding

Redis Cluster supports online resharding, allowing teams to add new nodes and move hash slots without downtime. Slots are migrated gradually, and Redis continues to serve traffic during the process.

During resharding, clients may receive temporary MOVED and ASK redirections, but availability is maintained. Careful planning and monitoring are essential, as poor resharding strategies can create uneven load or temporary hot spots.
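The arithmetic behind adding a node can be sketched as follows. This is a planning illustration only, with a hypothetical function name and made-up node labels: it works out how many slots each existing master would hand over so that the new master ends up with an even share. The actual migration is performed by cluster tooling such as redis-cli --cluster reshard or rebalance, not by code like this.

```python
def rebalance_plan(slot_counts: dict) -> dict:
    """Return how many slots each existing master gives to one new, empty master."""
    total = sum(slot_counts.values())
    target = total // (len(slot_counts) + 1)  # even share per master afterwards
    remaining = target                        # slots the new master still needs
    moves = {}
    # Take from the most loaded masters first.
    for node, count in sorted(slot_counts.items(), key=lambda kv: -kv[1]):
        give = min(max(count - target, 0), remaining)
        if give:
            moves[node] = give
        remaining -= give
    return moves


# Three masters holding all 16,384 slots, with a fourth master being added.
plan = rebalance_plan({"A": 5461, "B": 5462, "C": 5461})
print(plan)  # {'B': 1366, 'A': 1365, 'C': 1365}
```

The plan counts slots, not keys or traffic. Slots that hold large or hot keys cost more to move and to serve, which is why monitoring during resharding matters.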

Operational Complexity in Practice

Redis Cluster introduces real operational complexity. Teams must manage multiple nodes, replication, failover, resharding, and client compatibility.

Debugging becomes more challenging, as logs and metrics are spread across multiple machines and issues may affect only part of the cluster. For small systems, this added complexity may outweigh the benefits.

When Redis Cluster Is the Right Choice

Redis Cluster is a strong fit when memory limits of a single node are exceeded, throughput demands grow beyond one CPU core, high availability is required, and access patterns are mostly single-key.

Teams that design their key space with sharding in mind typically succeed with Redis Cluster.

When Redis Cluster Is the Wrong Choice

Redis Cluster may not be appropriate when applications rely heavily on multi-key transactions, require strict consistency guarantees, or use highly relational data models.

In such cases, alternatives such as application-level sharding or different data stores may be more suitable.

Common Redis Cluster Mistakes

Common mistakes include adopting Redis Cluster too early, ignoring hash slot implications, overusing hash tags, assuming multi-key operations will work transparently, and underestimating operational overhead.

These issues appear frequently in real-world incident postmortems.

A Practical Mental Model

A useful way to think about Redis Cluster is as many small Redis instances working together. As long as applications respect the boundaries between shards, Redis Cluster behaves predictably.

Problems arise when teams treat Redis Cluster like a single large Redis instance without constraints.

Summary

Redis Cluster is a powerful scaling tool, but it is not a default setting. It solves real scalability problems while demanding discipline in key design, client choice, and operational practices.

Teams that plan for Redis Cluster early usually adopt it smoothly. Those that add it late often struggle. Even if Redis Cluster is not needed today, designing with cluster compatibility in mind pays off in the future.