Union-Find: A Simple Guide
What is Union-Find?
If you’re diving into algorithms or data structures, you’ll bump into the Union-Find (also known as Disjoint-Set Union or DSU) data structure. It’s not the most glamorous topic, but it’s incredibly useful for solving certain problems efficiently. Think of it as a way to keep track of a collection of items that can be grouped together.
At its core, Union-Find manages a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It supports two primary operations:
find(i): Determine which subset an elementibelongs to. This operation returns a representative (or “leader”) of the set thatiis in.union(i, j): Merge the two subsets containing elementsiandjinto a single subset.
Why Use It?
Union-Find shines when you need to answer questions like “Are these two items connected?” or “How many separate groups are there?”. Common use cases include:
- Kruskal’s Algorithm: For finding the Minimum Spanning Tree (MST) of a graph. You use Union-Find to ensure you don’t create cycles when adding edges.
- Detecting Cycles in Graphs: Similar to Kruskal’s, if you try to
uniontwo nodes that are already in the same set, you’ve found a cycle. - Connected Components: Finding groups of connected nodes in a graph.
- Image Processing: Segmenting regions in an image.
How it Works: The Basic Idea
The most straightforward way to implement Union-Find is using an array where each index represents an element. We can think of this array as representing a forest of trees, where each tree is a subset. The root of each tree is the representative of that subset.
We’ll use an array, let’s call it parent, where parent[i] stores the parent of element i. If parent[i] == i, then i is the root of its tree (and thus the representative of its set).
The find Operation
To find the representative of an element i, we just traverse up the tree by following the parent pointers until we reach the root (an element whose parent is itself).
function find(parent, i) { if (parent[i] === i) { return i; } return find(parent, parent[i]);}The union Operation
To union two sets containing elements i and j, we first find their respective representatives, say rootI and rootJ. If rootI and rootJ are different, it means i and j are in different sets. We then make one root the parent of the other, effectively merging the trees.
function union(parent, i, j) { const rootI = find(parent, i); const rootJ = find(parent, j);
if (rootI !== rootJ) { parent[rootI] = rootJ; // Make rootJ the parent of rootI }}Initialization
Initially, each element is in its own set. So, for n elements, we initialize the parent array such that parent[i] = i for all i from 0 to n-1.
function initialize(n) { const parent = new Array(n); for (let i = 0; i < n; i++) { parent[i] = i; } return parent;}Optimizations: Path Compression and Union by Rank/Size
The basic implementation can be slow if the trees become very tall and skinny (like a linked list). To make it efficient, we use two optimizations:
-
Path Compression: When we call
find(i), after finding the root, we make every node on the path fromito the root point directly to the root. This flattens the tree.function find(parent, i) {if (parent[i] === i) {return i;}// Path compressionparent[i] = find(parent, parent[i]);return parent[i];} -
Union by Rank (or Size): When merging two trees, we attach the shorter tree to the root of the taller tree. This helps keep the trees balanced. We can track the “rank” (an approximation of height) or “size” (number of nodes) of each tree. If ranks/sizes are equal, pick one and increment its rank/size.
For simplicity, let’s use size.
// Need a size array, initialized to 1 for each element// let size = new Array(n).fill(1);function union(parent, size, i, j) {const rootI = find(parent, i);const rootJ = find(parent, j);if (rootI !== rootJ) {if (size[rootI] < size[rootJ]) {parent[rootI] = rootJ;size[rootJ] += size[rootI];} else {parent[rootJ] = rootI;size[rootI] += size[rootJ];}}}
With these optimizations, the amortized time complexity for both find and union operations becomes nearly constant, specifically O(α(n)), where α is the inverse Ackermann function. This function grows incredibly slowly, so for all practical purposes, it’s considered constant.
Putting It All Together (Optimized)
Here’s a class encapsulating the optimized Union-Find structure.
class UnionFind { constructor(n) { this.parent = Array(n).fill(0).map((_, i) => i); this.size = Array(n).fill(1); this.count = n; // Number of disjoint sets }
find(i) { if (this.parent[i] === i) { return i; } // Path compression this.parent[i] = this.find(this.parent[i]); return this.parent[i]; }
union(i, j) { const rootI = this.find(i); const rootJ = this.find(j);
if (rootI !== rootJ) { // Union by size if (this.size[rootI] < this.size[rootJ]) { this.parent[rootI] = rootJ; this.size[rootJ] += this.size[rootI]; } else { this.parent[rootJ] = rootI; this.size[rootI] += this.size[rootJ]; } this.count--; return true; // Union occurred } return false; // Already in the same set }
connected(i, j) { return this.find(i) === this.find(j); }
getSetCount() { return this.count; }}
// Example Usage:// const uf = new UnionFind(10);// uf.union(1, 2);// uf.union(3, 4);// uf.union(1, 4);// console.log(uf.connected(2, 3)); // true// console.log(uf.getSetCount()); // 7Conclusion
The Union-Find data structure is a powerful tool for managing disjoint sets. While the basic concept is simple, the optimizations like path compression and union by rank/size make it remarkably efficient. Understanding and implementing Union-Find will definitely help you tackle a variety of algorithmic challenges.