Introduction
In this article, we'll explore the RANK() and DENSE_RANK() functions and their differences using examples. When working with data in SQL, we often need to assign rankings to rows based on certain criteria. Two commonly used window functions for this purpose are RANK() and DENSE_RANK(). While they may seem similar at first time, there are important differences between them. Let's first learn about what are window functions.
What are Window Functions?
Before looking into RANK() and DENSE_RANK(), let's learn about window functions. Window functions in SQL perform calculations across a set of table rows that are somehow related to the current row. They allow you to do things like running totals, rankings, or moving averages without the need for complex self-joins.
RANK() Function
The RANK() function assigns a rank to each row within a partition of a result set. If there are ties (rows with equal values), they receive the same rank, and the next ranks are skipped.
CREATE TABLE employees (
id INT,
name VARCHAR(50),
salary INT
);
INSERT INTO employees VALUES
(1, 'Aditya', 50000),
(2, 'Bikesh', 60000),
(3, 'Chandan', 60000),
(4, 'Dravid', 70000),
(5, 'Loki', 80000);
SELECT name, salary, RANK() OVER (ORDER BY salary DESC) as rank
FROM employees;
Output:
name | salary | rank
---------|--------|-----
Loki | 80000 | 1
Dravid | 70000 | 2
Bikesh | 60000 | 3
Chandan | 60000 | 3
Aditya | 50000 | 5
In the above example, you Notice that Bikesh and Chandan have the same rank (3) because they have the same salary. The next rank is 5 and rank 4 is skipped.
DENSE_RANK() Function
DENSE_RANK() is similar to RANK(), but it doesn't skip any ranks when there are ties. This means that if two rows have the same value and thus the same rank, the next distinct value will get the next consecutive rank.
SELECT name, salary, DENSE_RANK() OVER (ORDER BY salary DESC) as dense_rank
FROM employees;
Output:
name | salary | dense_rank
-------- |--------|----------
Loki | 80000 | 1
Dravid | 70000 | 2
Bikesh | 60000 | 3
Chandan | 60000 | 3
Aditya | 50000 | 4
In the above example, Bikesh and Chandan still share rank 3, but Aditya gets rank 4 instead of 5.
Differences Between RANK() and DENSE_RANK().
1. Ranking
- RANK() leaves gaps in the ranking when there are ties.
- DENSE_RANK() does not leave gaps; it always returns consecutive ranks.
Use Cases
- Use RANK() when you want to reflect the true position, including ties and gaps.
- Use DENSE_RANK() when you want to assign ranks without gaps, which can be useful for things like calculating percentiles or determining top N items without skipping ranks.
2. Unique Ranks
- With RANK(), the number of unique ranks will be less than or equal to the number of rows.
- With DENSE_RANK(), the number of unique ranks can be significantly less than the number of rows if there are many ties
Summary
Both RANK() and DENSE_RANK() are powerful tools for data analysis, each with its own use cases. RANK() gives you a true competitive ranking with possible gaps, while DENSE_RANK() provides a denser ranking without skipping values. Understanding the differences between these two functions can help you choose the right one for your needs, leading to more accurate and meaningful data insights. Remember, the choice between RANK() and DENSE_RANK() often comes down to how you want to handle ties and whether gaps in the ranking are acceptable or even desirable for your particular analysis.