Interesting Query (1) --- Find Second Highest Salary

Extend the article: SQL - Interesting Queries as a series of articles (12/22/2022):

Introduction

SQL, the Structural Query Language, seems simple, but often with some tricks in them. This article will discuss some interesting, and maybe complex queries, and list the regular ones as references at the bottom.

The content of this article:

  • Introduction
  • Questions
    • Question 1
    • Question 2
  • Setup Table and Initial Data
  • Answers
    1. for Question 1
      1. Answer 1-1
      2. Answer 1-2
      3. Answer 2-3
      4. Answer 1-4
    2. for Question 2
  • Summary

Questions

  • Question 1
    • Query to find Second Highest Salary of Employee? 
  • Question 2
    • Find and Remove Duplicate Rows from a Database Table?

Set up Table and initial Data

Set up Test Database Environment

USE [TestDB]  
GO  
/****** Object: Table [dbo].[Employees] Script Date: 5/28/2021 7:42:51 PM ******/  
SET ANSI_NULLS ON  
GO  
SET QUOTED_IDENTIFIER ON  
GO  
CREATE TABLE [dbo].[Employees](  
[Id] [int] IDENTITY(1,1) NOT NULL,  
[Name] [nvarchar](max) NOT NULL,  
[Salary] [int] NOT NULL,  
[ManagerId] [int] NOT NULL,  
CONSTRAINT [PK_dbo.Employees] PRIMARY KEY CLUSTERED  
(  
   [Id] ASC  
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]  
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]  
GO

Insert initial Data

insert dbo.Employees  
Values ('Greg', 100000, 1),  
       ('George', 150000, 1),  
       ('Helen', 130000, 1),  
       ('Tom', 120000, 2),  
       ('Kevin', 110000, 2),  
       ('David', 120000, 3),  
       ('Geek', 110000,3),  
       ('Tesla', 120000,3),
       ('David', 120000,3),
       ('Kevin', 110000,2)

The result will be,

Ansers

There are two questions.

Answer for Question 1

Query to find Second Highest Salary of Employee?

This is a very popular SQL question, one can find out it on different sites with multiple solutions.

Answer 1-1

SELECT MAX(salary) [Second Highest Salary]  
FROM employees  
WHERE salary NOT IN (SELECT MAX(salary) FROM employees)  

Result

This solution is most effiecient and is the best solution. It is short and easy to understand.

We use the SUBQUERY in the WHERE Clause. The solution is straight forward,

  • SubQuery (Inner Query in the graph) SELECT the MAX salary from the employees table.
  • Main Query (Outer Query in the graph) SELECT the MAX salary from the employees table without the MAX salary.

In this article, How to find Second Highest salary in SQL?, it gives more than 10 ways for this question, but we are not interested in getting the same solution with more complex logic.

There ARE some equivalent solutions to what we have in Answer 1, and some with advantages for the extended question such as find Third (Nth) Highest Salary of Employee?

Answer 1-2,

SELECT MAX(salary) [Second Highest Salary]  
FROM employees  
WHERE salary < (SELECT MAX(salary) FROM employees)  

This is exactly equivalent to Answer 1.

Answer 1-3,

SELECT MAX(salary) [Second Highest Salary]  
FROM employees  
WHERE salary NOT IN (SELECT DISTINCT TOP 1 salary   
                     FROM employees  
                     ORDER BY salary DESC)  

The logic of this solution is similar to Answer 1, but SubQuery uses SELECT TOP Clause.  The advantage for this is that if we replace 1 in SubQuery by 2, we can get the solution for finding the Third Highest Salary of Employee, such as

SELECT MAX(salary)  
FROM employees  
WHERE salary NOT IN (SELECT DISTINCT TOP 2 salary   
                     FROM employees  
                     ORDER BY salary DESC)  

For which, if we try to use Answer 1 to get the solution, it is possible, but a bit complex,

SELECT MAX(salary) [Second Highest Salary]  
FROM employees  
WHERE salary NOT IN (  
                    SELECT MAX(salary)   
                    FROM employees  
                    )  
AND salary NOT IN (  
                    SELECT MAX(salary)   
                    FROM employees  
                    WHERE salary NOT IN (  
                                        SELECT MAX(salary)   
                                        FROM employees  
                                        )  
                    )

where,

  • NOT IN the first SubQuery --- get rid of the top max value as Answer 1 did, and 
  • the second SubQuery is just the Answer 1 itself 
  • NOT IN the second SubQuery --- get rid of the second max value

Therefore, we got the third-highest salary, however, with quite complex logic --- this is what we do not like.

Furthermore, if we want the Nth Highest Salary of Employee, it will be hard by Answer 1 or Answer 1-2, but we can easy to get it by Answer 1-3 by replacing 1 by N-1, then we have,

SELECT MAX(salary) [Nth Highest Salary]  
FROM employees  
WHERE salary NOT IN (SELECT DISTINCT TOP (N-1) salary   
                     FROM employees  
                     ORDER BY salary DESC) 

Note (12/20/2022 added), 

we can get the Nth Highest Salary of Employee, by the way below, that is equivalent to the way above (think about the logic):

SELECT MIN(salary) [Nth Highest Salary]  
FROM employees  
WHERE salary     IN (SELECT DISTINCT TOP N salary   
                     FROM employees  
                     ORDER BY salary DESC)  

Answer 1-4,

SELECT DISTINCT salary [Second Highest Salary]   
FROM employees e1   
WHERE 2 = (SELECT COUNT(DISTINCT salary) FROM employees e2   
WHERE e1.salary <= e2.salary);

This is equivalent to Answer 1-3, and also can be used to get Nth Highest Salary of Employee. However, it is a bit hard to understand its logic. Let us discuss it step by step,

First, the Inner Query,

SELECT COUNT(DISTINCT salary) FROM employees 

Return - 5, that is the category of salary:

SELECT salary FROM employees  
GROUP BY salary

Return

Then we add the condition into the Sub Query, and let it == 2,

2 = (SELECT COUNT(DISTINCT salary) FROM employees e2     
WHERE e1.salary <= e2.salary) 

Satisfy the condition for e1.salary from the Major (Outer) Query must be the 130000, the second-highest salary as required.

Answer for Question 2

Find and Remove Duplicate Rows from a Database Table?

Finding Duplicate Rows is relatively easy to do while locating the positions for the Duplicate Rows, and therefore Removing them is a bit difficult.

Finding Duplicate Rows

SELECT name, salary, managerid, COUNT(salary) Records 
FROM employees  
GROUP BY name, salary, managerid  
HAVING COUNT(salary) > 1 

The result,

We group all records except the identity or primary key that must be different for a different records. 

Removing Duplicate Rows with an identity[ref]

We first need to locate the position of the duplicate records, which is easier when the table has an identity on it.

Use the following query to select the duplicate records,

SELECT *   
FROM employees   
WHERE Id NOT IN (SELECT MIN(Id)   
FROM employees  
GROUP BY name, salary, ManagerId) 

Those are,

Then Delete them,

DELETE  
FROM employees   
WHERE Id NOT IN (SELECT MIN(Id)   
FROM employees  
GROUP BY name, salary, ManagerId)  

Result

Records 9, 10 are deleted,

Removing Duplicate Rows withOUT an identity --- Oracle[ref]

For the Oracle database, it is easier because Oracle has a ROWID pseudo column for a table, then we can do the job as.

Locate the duplicate Records,

SELECT *  
FROM employees  
WHERE ROWID NOT IN (SELECT MIN(ROWID)  
FROM employees  
GROUP BY name, salary, ManagerId)  

Delete the duplicate Records,

DELETE  
FROM employees  
WHERE ROWID NOT IN (SELECT MIN(ROWID)  
FROM employees  
GROUP BY name, salary, ManagerId)  

Removing Duplicate Rows withOUT an identity --- SQL Server

We can use CTE (Common Table Expression) with ROW_NUMBER[ref] function.

Locate the duplicate Records[ref],

;WITH CTE AS  
(  
   SELECT name, salary,managerid,  
   ROW_NUMBER() OVER (PARTITION BY name, salary, managerid 
   Order by name) AS Rnum  
   FROM employees  
)  
SELECT * From CTE  

Result

Delete the duplicate Records: Rnum > 1,

;WITH CTE AS    
(    
   SELECT name, salary,managerid,    
   ROW_NUMBER() OVER (PARTITION BY name, salary, managerid   
   Order by name) AS Rnum    
   FROM employees    
)    
--SELECT * From CTE    
DELETE FROM CTE WHERE Rnum <> 1  

We can also do the job without using CTE[ref],

DELETE DUP    
FROM    
(    
    SELECT ROW_NUMBER() OVER (PARTITION BY name, salary, managerid   
    Order by name) AS Rnum    
    FROM employees    
) DUP    
WHERE DUP.Rnum > 1;   

Summary

We choose two queries at this first time trying,

  • Query to find Second Highest Salary of Employee? 
  • Find and Remove Duplicate Rows from a Database Table?

If I find some other interesting Query, I will add them to this article.

References

Concept

 


Similar Articles