Extend the article: SQL - Interesting Queries as a series of articles (12/22/2022):
Introduction
SQL, the Structural Query Language, seems simple, but often with some tricks in them. This article will discuss some interesting, and maybe complex queries, and list the regular ones as references at the bottom.
The content of this article:
- Introduction
- Questions
- Setup Table and Initial Data
- Answers
- for Question 1
- Answer 1-1
- Answer 1-2
- Answer 2-3
- Answer 1-4
- for Question 2
- Summary
Questions
- Question 1
- Query to find Second Highest Salary of Employee?
- Question 2
- Find and Remove Duplicate Rows from a Database Table?
Set up Table and initial Data
Set up Test Database Environment
USE [TestDB]
GO
/****** Object: Table [dbo].[Employees] Script Date: 5/28/2021 7:42:51 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Employees](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](max) NOT NULL,
[Salary] [int] NOT NULL,
[ManagerId] [int] NOT NULL,
CONSTRAINT [PK_dbo.Employees] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
Insert initial Data
insert dbo.Employees
Values ('Greg', 100000, 1),
('George', 150000, 1),
('Helen', 130000, 1),
('Tom', 120000, 2),
('Kevin', 110000, 2),
('David', 120000, 3),
('Geek', 110000,3),
('Tesla', 120000,3),
('David', 120000,3),
('Kevin', 110000,2)
The result will be,
Ansers
There are two questions.
Answer for Question 1
Query to find Second Highest Salary of Employee?
This is a very popular SQL question, one can find out it on different sites with multiple solutions.
Answer 1-1
SELECT MAX(salary) [Second Highest Salary]
FROM employees
WHERE salary NOT IN (SELECT MAX(salary) FROM employees)
Result
This solution is most effiecient and is the best solution. It is short and easy to understand.
We use the SUBQUERY in the WHERE Clause. The solution is straight forward,
- SubQuery (Inner Query in the graph) SELECT the MAX salary from the employees table.
- Main Query (Outer Query in the graph) SELECT the MAX salary from the employees table without the MAX salary.
In this article, How to find Second Highest salary in SQL?, it gives more than 10 ways for this question, but we are not interested in getting the same solution with more complex logic.
There ARE some equivalent solutions to what we have in Answer 1, and some with advantages for the extended question such as find Third (Nth) Highest Salary of Employee?
Answer 1-2,
SELECT MAX(salary) [Second Highest Salary]
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees)
This is exactly equivalent to Answer 1.
Answer 1-3,
SELECT MAX(salary) [Second Highest Salary]
FROM employees
WHERE salary NOT IN (SELECT DISTINCT TOP 1 salary
FROM employees
ORDER BY salary DESC)
The logic of this solution is similar to Answer 1, but SubQuery uses SELECT TOP Clause. The advantage for this is that if we replace 1 in SubQuery by 2, we can get the solution for finding the Third Highest Salary of Employee, such as
SELECT MAX(salary)
FROM employees
WHERE salary NOT IN (SELECT DISTINCT TOP 2 salary
FROM employees
ORDER BY salary DESC)
For which, if we try to use Answer 1 to get the solution, it is possible, but a bit complex,
SELECT MAX(salary) [Second Highest Salary]
FROM employees
WHERE salary NOT IN (
SELECT MAX(salary)
FROM employees
)
AND salary NOT IN (
SELECT MAX(salary)
FROM employees
WHERE salary NOT IN (
SELECT MAX(salary)
FROM employees
)
)
where,
- NOT IN the first SubQuery --- get rid of the top max value as Answer 1 did, and
- the second SubQuery is just the Answer 1 itself
- NOT IN the second SubQuery --- get rid of the second max value
Therefore, we got the third-highest salary, however, with quite complex logic --- this is what we do not like.
Furthermore, if we want the Nth Highest Salary of Employee, it will be hard by Answer 1 or Answer 1-2, but we can easy to get it by Answer 1-3 by replacing 1 by N-1, then we have,
SELECT MAX(salary) [Nth Highest Salary]
FROM employees
WHERE salary NOT IN (SELECT DISTINCT TOP (N-1) salary
FROM employees
ORDER BY salary DESC)
Note (12/20/2022 added),
we can get the Nth Highest Salary of Employee, by the way below, that is equivalent to the way above (think about the logic):
SELECT MIN(salary) [Nth Highest Salary]
FROM employees
WHERE salary IN (SELECT DISTINCT TOP N salary
FROM employees
ORDER BY salary DESC)
Answer 1-4,
SELECT DISTINCT salary [Second Highest Salary]
FROM employees e1
WHERE 2 = (SELECT COUNT(DISTINCT salary) FROM employees e2
WHERE e1.salary <= e2.salary);
This is equivalent to Answer 1-3, and also can be used to get Nth Highest Salary of Employee. However, it is a bit hard to understand its logic. Let us discuss it step by step,
First, the Inner Query,
SELECT COUNT(DISTINCT salary) FROM employees
Return - 5, that is the category of salary:
SELECT salary FROM employees
GROUP BY salary
Return
Then we add the condition into the Sub Query, and let it == 2,
2 = (SELECT COUNT(DISTINCT salary) FROM employees e2
WHERE e1.salary <= e2.salary)
Satisfy the condition for e1.salary from the Major (Outer) Query must be the 130000, the second-highest salary as required.
Answer for Question 2
Find and Remove Duplicate Rows from a Database Table?
Finding Duplicate Rows is relatively easy to do while locating the positions for the Duplicate Rows, and therefore Removing them is a bit difficult.
Finding Duplicate Rows
SELECT name, salary, managerid, COUNT(salary) Records
FROM employees
GROUP BY name, salary, managerid
HAVING COUNT(salary) > 1
The result,
We group all records except the identity or primary key that must be different for a different records.
Removing Duplicate Rows with an identity[ref]
We first need to locate the position of the duplicate records, which is easier when the table has an identity on it.
Use the following query to select the duplicate records,
SELECT *
FROM employees
WHERE Id NOT IN (SELECT MIN(Id)
FROM employees
GROUP BY name, salary, ManagerId)
Those are,
Then Delete them,
DELETE
FROM employees
WHERE Id NOT IN (SELECT MIN(Id)
FROM employees
GROUP BY name, salary, ManagerId)
Result
Records 9, 10 are deleted,
Removing Duplicate Rows withOUT an identity --- Oracle[ref]
For the Oracle database, it is easier because Oracle has a ROWID pseudo column for a table, then we can do the job as.
Locate the duplicate Records,
SELECT *
FROM employees
WHERE ROWID NOT IN (SELECT MIN(ROWID)
FROM employees
GROUP BY name, salary, ManagerId)
Delete the duplicate Records,
DELETE
FROM employees
WHERE ROWID NOT IN (SELECT MIN(ROWID)
FROM employees
GROUP BY name, salary, ManagerId)
Removing Duplicate Rows withOUT an identity --- SQL Server
We can use CTE (Common Table Expression) with ROW_NUMBER[ref] function.
Locate the duplicate Records[ref],
;WITH CTE AS
(
SELECT name, salary,managerid,
ROW_NUMBER() OVER (PARTITION BY name, salary, managerid
Order by name) AS Rnum
FROM employees
)
SELECT * From CTE
Result
Delete the duplicate Records: Rnum > 1,
;WITH CTE AS
(
SELECT name, salary,managerid,
ROW_NUMBER() OVER (PARTITION BY name, salary, managerid
Order by name) AS Rnum
FROM employees
)
--SELECT * From CTE
DELETE FROM CTE WHERE Rnum <> 1
We can also do the job without using CTE[ref],
DELETE DUP
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY name, salary, managerid
Order by name) AS Rnum
FROM employees
) DUP
WHERE DUP.Rnum > 1;
Summary
We choose two queries at this first time trying,
- Query to find Second Highest Salary of Employee?
- Find and Remove Duplicate Rows from a Database Table?
If I find some other interesting Query, I will add them to this article.
References
Concept