Why Distributed Systems Need Their Own Unique ID Generator

Distributed Systems

April 14, 2025

In almost every database table or collection, a unique ID is required to identify each record. This is commonly handled by using either auto-incremented numbers or UUIDs (also known as GUIDs), both of which are usually managed by the database itself.

So… if most databases already take care of unique IDs, why do we need a separate unique ID generator?

That’s what I wondered too — until I dug deeper and found that this seemingly simple topic becomes surprisingly interesting when you scale into distributed systems.

In this post, I’ll walk through what I’ve learned from researching this topic, especially around the limitations of UUIDs and auto-increment IDs in distributed environments, and how Twitter’s Snowflake ID approach provides a clever and scalable solution.

Common Approaches (And Why They Don't Scale Well)

Distributed systems introduce complexity when it comes to generating unique, reliable, and sortable identifiers. Let’s take a look at two typical approaches used in smaller systems — and why they fall short at scale.

1. UUID / GUID

One common way to generate unique IDs is using random UUIDs (Universally Unique Identifiers). A UUID is a 128-bit number, and a random (version 4) UUID carries 122 random bits, so the probability of a collision is negligible in practice. This makes UUIDs a good fit when all you need is uniqueness across different machines or databases, with no coordination between them.

But UUIDs have a few downsides:

They're not time-sortable. Random (version 4) UUIDs don't encode creation time, so you can't infer the order in which records were created from the ID alone. (Time-based variants such as UUIDv1 and UUIDv7 do embed a timestamp, but the random kind discussed here does not.)
Timestamp-based ordering is unreliable. Some systems work around this by ordering records on a created_at or updated_at column. That brings its own problems: clocks on different servers drift out of sync, and two events can carry the same timestamp, leaving no deterministic way to order them.

Imagine a chat application where you and your friend send messages within the same second. If the app orders messages by timestamp alone, nothing decides which of the two comes first: one time your message appears first, another time the order is reversed. This inconsistency confuses users and degrades the experience.
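To make the chat example concrete, here's a minimal Python sketch. The Message class and its fields (msg_id, created_at, text) are hypothetical names I made up for illustration, and timestamps are assumed to have one-second resolution. It shows that sorting on the timestamp alone returns tied messages in whatever order they happened to arrive, while adding a unique ID as a tiebreaker gives the same order every time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    msg_id: int      # hypothetical unique ID, used only as a tiebreaker
    created_at: int  # Unix time in whole seconds (assumed resolution)
    text: str


def order_by_timestamp(messages):
    # Python's sort is stable, so messages with equal timestamps keep
    # their input order. The result depends on arrival order.
    return sorted(messages, key=lambda m: m.created_at)


def order_deterministically(messages):
    # Breaking ties on a unique ID makes the result independent of
    # the order in which the messages arrived.
    return sorted(messages, key=lambda m: (m.created_at, m.msg_id))


mine = Message(msg_id=2, created_at=1_700_000_000, text="hi")
theirs = Message(msg_id=1, created_at=1_700_000_000, text="hello")

# Same two messages, two different arrival orders.
print([m.text for m in order_by_timestamp([mine, theirs])])
print([m.text for m in order_by_timestamp([theirs, mine])])
print([m.text for m in order_deterministically([mine, theirs])])
print([m.text for m in order_deterministically([theirs, mine])])
```

The first two lines print different orders; the last two print the same one. This only fixes determinism, though. It says nothing about which message was really sent first.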

2. Auto-Increment Sequence

The second common approach is using auto-incrementing IDs. These have a major advantage — they are naturally sortable and simple to use.

But in distributed systems, auto-increment IDs can quickly become problematic. If multiple nodes are writing to their own databases independently, you’re almost guaranteed to end up with duplicate IDs across the system.
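Here's a small Python sketch of that failure mode. NodeDatabase is a hypothetical in-memory stand-in for one node's database with an auto-increment primary key, not a real database client. Each node starts counting from 1 on its own, so the two nodes hand out the same IDs.

```python
import itertools


class NodeDatabase:
    """Hypothetical stand-in for one node's database with an auto-increment key."""

    def __init__(self, name):
        self.name = name
        self._counter = itertools.count(1)  # each node counts from 1 independently
        self.rows = {}

    def insert(self, payload):
        row_id = next(self._counter)
        self.rows[row_id] = payload
        return row_id


node_a = NodeDatabase("a")
node_b = NodeDatabase("b")

ids_a = [node_a.insert(f"order-from-a-{i}") for i in range(3)]
ids_b = [node_b.insert(f"order-from-b-{i}") for i in range(3)]

# IDs that exist on both nodes but point at different records.
duplicates = sorted(set(ids_a) & set(ids_b))
print(duplicates)  # [1, 2, 3]
```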

A typical workaround is to introduce a central ticket server — a single source of truth for generating sequential IDs across the cluster.

This works, but comes with trade-offs:

Single point of failure: If the ticket server goes down, every service that depends on it is affected.
Scalability bottleneck: As the system grows, the ticket server can become a performance bottleneck.
State loss risk: If the ticket server restarts without proper persistence, you risk duplicate IDs or broken sequences.
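The state-loss risk is easy to see in a toy version. The TicketServer class below is a hypothetical single-process sketch, not a real service: it guards a counter with a lock and optionally writes the last issued ID to a file (the state_path parameter is my own invention). Without the file, a restart starts counting from 1 again and re-issues old IDs.

```python
import os
import tempfile
import threading


class TicketServer:
    """Hypothetical in-process ticket server handing out sequential IDs."""

    def __init__(self, state_path=None):
        self._lock = threading.Lock()
        self._state_path = state_path
        self._last = self._load()

    def _load(self):
        # Resume from the persisted counter if one exists, else start at 0.
        if self._state_path and os.path.exists(self._state_path):
            with open(self._state_path) as f:
                return int(f.read().strip() or 0)
        return 0

    def next_id(self):
        with self._lock:
            self._last += 1
            if self._state_path:
                # Persist before returning the ID. A real server would also
                # need fsync and an atomic write; this sketch skips both.
                with open(self._state_path, "w") as f:
                    f.write(str(self._last))
            return self._last


# No persistence: a simulated restart re-issues ID 1.
volatile = TicketServer()
issued = [volatile.next_id() for _ in range(3)]
volatile = TicketServer()
after_restart_volatile = volatile.next_id()

# With persistence: the restarted server continues at 4.
path = os.path.join(tempfile.mkdtemp(), "ticket.state")
durable = TicketServer(path)
for _ in range(3):
    durable.next_id()
durable = TicketServer(path)
after_restart_durable = durable.next_id()

print(after_restart_volatile, after_restart_durable)  # 1 4
```

Persistence fixes the duplicates, but every ID now costs a disk write on a single machine, which is the bottleneck and single point of failure described above.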

Twitter's Snowflake ID

To solve this problem at scale, Twitter (now known as X) introduced Snowflake in 2010: a 64-bit, time-sortable, unique ID generator designed specifically for distributed systems. Discord and Instagram use variations of this design, and MongoDB's ObjectId follows a similar timestamp-first idea with a different, 96-bit layout.

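Each Snowflake ID packs the following fields into a single 64-bit number, from the most significant bit to the least:

1 unused sign bit (always 0, so the ID stays positive as a signed 64-bit integer)
41 bits for a timestamp in milliseconds since a custom epoch (enough for roughly 69 years)
10 bits for a machine or node ID (supports up to 1024 machines)
12 bits for a per-machine sequence number (up to 4096 IDs per millisecond per machine)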
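The layout is easier to follow as bit shifts. Below is a minimal Python sketch of packing and unpacking the fields, assuming the 41/10/12 split listed above with the top bit of the 64-bit value left at 0. The function names pack and unpack are my own, not part of any library.

```python
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12

MAX_TIMESTAMP = (1 << TIMESTAMP_BITS) - 1
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1  # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095

MACHINE_SHIFT = SEQUENCE_BITS                   # 12
TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_BITS  # 22


def pack(ms_since_epoch, machine_id, sequence):
    """Combine the three fields into one integer that fits in 63 bits."""
    if not 0 <= ms_since_epoch <= MAX_TIMESTAMP:
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= machine_id <= MAX_MACHINE_ID:
        raise ValueError("machine_id does not fit in 10 bits")
    if not 0 <= sequence <= MAX_SEQUENCE:
        raise ValueError("sequence does not fit in 12 bits")
    return (
        (ms_since_epoch << TIMESTAMP_SHIFT)
        | (machine_id << MACHINE_SHIFT)
        | sequence
    )


def unpack(snowflake_id):
    """Split an ID back into (ms_since_epoch, machine_id, sequence)."""
    return (
        snowflake_id >> TIMESTAMP_SHIFT,
        (snowflake_id >> MACHINE_SHIFT) & MAX_MACHINE_ID,
        snowflake_id & MAX_SEQUENCE,
    )


example = pack(ms_since_epoch=123_456_789, machine_id=5, sequence=42)
print(unpack(example))  # (123456789, 5, 42)

# A later millisecond always wins, whatever the lower fields hold.
print(pack(2, 0, 0) > pack(1, MAX_MACHINE_ID, MAX_SEQUENCE))  # True
```

Because the timestamp sits in the highest bits, comparing two IDs as plain integers compares their timestamps first, then machine ID, then sequence.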

Advantages:

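Globally unique: As long as every machine has a distinct machine ID and its clock never runs backward, no two machines can generate the same ID.
Sortable: Because the timestamp occupies the most significant bits, Snowflake IDs sort roughly chronologically. Ordering across machines is only as precise as their clock synchronization, but ties within a millisecond are still broken deterministically by machine ID and sequence number.
Scalable: Each machine can generate up to 4096 IDs per millisecond without coordinating with any other machine.
Fault-tolerant: Machines generate IDs independently, so there is no central server whose failure stops ID generation.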
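To see how these properties fall out of the design, here's a minimal Snowflake-style generator in Python. It's my own sketch, not Twitter's implementation. The epoch (2020-01-01 UTC), the class name, and the choice to raise an error when the clock moves backward are all assumptions; production generators handle clock drift and machine ID assignment more carefully.

```python
import threading
import time

CUSTOM_EPOCH_MS = 1_577_836_800_000  # 2020-01-01 00:00:00 UTC, an arbitrary choice
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Sketch of a Snowflake-style generator for a single machine."""

    def __init__(self, machine_id):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    @staticmethod
    def _now_ms():
        return int(time.time() * 1000)

    def next_id(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # Assumed policy: fail rather than risk a duplicate ID.
                raise RuntimeError("clock moved backward")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4096 sequence values used: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - CUSTOM_EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self._machine_id << SEQUENCE_BITS)
                | self._sequence
            )


gen_a = SnowflakeGenerator(machine_id=1)
gen_b = SnowflakeGenerator(machine_id=2)
print(gen_a.next_id(), gen_b.next_id())
```

The only shared state is the machine ID, which each node needs to be assigned once at startup. After that, no node talks to any other node to produce an ID.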

Final Thoughts

This was one of those topics that seemed simple on the surface but became more and more interesting the deeper I went. It reminded me how much thought and engineering goes into things we often take for granted — like the ID of a tweet or chat message.

If you're working on a system that requires unique, sortable IDs across distributed services, I'd highly recommend looking into Snowflake-style generators. Such a generator is a small component with a big impact on consistency, scalability, and user experience.

Sources & Credits

System Design Interview by Alex Xu
Designing Data-Intensive Applications by Martin Kleppmann
Wikipedia - Snowflake ID

Originally published at https://withzell.com.