As we already know, when we implement Normalization, we actually split our main table into sub-tables and remove the redundancy and inconsistency of data. Now, when we make multiple tables and when we need to extract the data from those multiple tables, obviously we need to combine them. This is the place where we need joins. This example is just for the demonstration of the concept where we need joins but it doesn’t mean that we can just implement the joins if we have applied the normalization on the database. The concept is - whenever we need to combine multiple tables, we use joins.
What is a Join?
As we can guess from the name, Join is the SQL statement with the help of which we can combine different tables.
Obviously, joins can only work when there is at least one common attribute in 2 tables. Actually, when we define the relation of 2 tables and define the foreign key in the dependent table which contains the primary key value of the parent table record, this is when both the tables are attached with each other.
Now, we have the relation between 2 tables. And now, we have various ways to combine the tables and extract the data.
- (Inner) Join
- Left (Outer) Join
- Right (Outer) Join
- Full (Outer) Join
Inner Join
It is the most frequently used join. It is also called Equi Join. Inner Join is the type of join which returns the matching values in both tables.
- SELECT InvoiceID
- , Customer.CustomerID
- , FirstName
- , LastName
- , Convert(nvarchar(10), InvoiceDate, 1) AS InvoiceDate
- , Total
- FROM Customer
- Join Invoice
- ON Customer.CustomerId = Invoice.CustomerId
Now, you might have seen the keyword INNER Join; this is also the same thing.
- SELECT InvoiceID
- , Customer.CustomerID
- , FirstName
- , LastName
- , Convert(nvarchar(10), InvoiceDate, 1) AS InvoiceDate
- , Total
- FROM Customer
- INNER Join Invoice
- ON Customer.CustomerId = Invoice.CustomerId
So, we use Inner join to get the matched value records in both tables. Now, let’s suppose we want to join 3 tables.
- SELECT Invoice.InvoiceID
- , Customer.CustomerID
- , FirstName
- , LastName
- , Convert(nvarchar(10), InvoiceDate, 1) AS InvoiceDate
- , Total
- , TrackId
- , UnitPrice
- , Quantity
- FROM ((Customer
- INNER Join Invoice ON Customer.CustomerId = Invoice.CustomerId)
- INNER Join InvoiceLine ON Invoice.InvoiceId = InvoiceLine.InvoiceId)
If you open the database schema, you’ll see we have the attributes with the same names in multiple tables. And if we want to use them in the joins where we’re combining multiple tables, then obviously we need to explicitly specify the complex attribute of which table it belongs to. And here, InvoiceId is also present in InvoiceLine and Invoice tables and both will have the same values whether we use
- InvoiceLine.InvoiceId
- Invoice.InvoiceId
Because it totally depends on check ON condition.
Correlation Table Names
It allows you to assign a temporary and shorter name of the table. Sometimes, they are called table alias.
- SELECT IL.InvoiceID
- , C.CustomerID
- , FirstName
- , LastName
- , Convert(nvarchar(10), InvoiceDate, 1) AS InvoiceDate
- , Total
- , TrackId
- , UnitPrice
- , Quantity
- FROM ((Customer AS C
- INNER Join Invoice AS I ON C.CustomerId = I.CustomerId)
- INNER Join InvoiceLine AS IL ON I.InvoiceId = IL.InvoiceId)
Look, we used AS keyword for the table alias.
Fully Qualified Object Names
In Fully Qualified Object Names, we specify the complete names.
Server.Database.Schema.Object
Actually when we’re combining the tables of multiple servers then we use this approach. It makes the better understandability and makes everything clear both to SQL engine and human as well.
- SELECT CustomerID
- , FirstName
- , LastName
- FROM [DESKTOP-2FA7R8U].Chinook.dbo.Customer
As you can see we are using the brackets around the server name because we are using the dash into the server name. The word dbo stands for Database Owner.
Left Outer Join
You might be confused about Left join by watching this diagram.
You might expect that we’ll get all the columns of table 1 and matching columns of table 2 with table 1. But it is not the case actually. With the help of left join, we’ll get all the records (rows) from the left table and all the matching records from table 2.
So don’t expect when you left join 2 tables, you’ll get all the columns from first table.
- SELECT c1.FirstName + ' ' + c1.LastName AS CustomerName
- , c2.FirstName + ' ' + c2.LastName AS EmployeeName
- FROM Customer c1
- LEFT JOIN Employee c2
- ON c1.SupportRepId = c2.EmployeeId
Right Join
It returns all the records from the right table and all the matching records from left table.
And the query is very similar to Left Join.
- SELECT c1.FirstName + ' ' + c1.LastName AS CustomerName
- , c2.FirstName + ' ' + c2.LastName AS EmployeeName
- FROM Customer c1
- Right JOIN Employee c2
- ON c1.SupportRepId = c2.EmployeeId
We just need to put Right instead of Left.
When you’re working with left or right joins, you’ll observe you have null values in some of your table attribute cells. Because we’re getting here 1 complete table records whether the values are matching or not.
Full Outer Join
Full Outer Join returns all the records of the tables when the values are matched on all the sides. In other words, it is basically the sum of left join and right join.
Full Join = Left Join + Right Join
- SELECT c1.FirstName + ' ' + c1.LastName AS CustomerName
- , c2.FirstName + ' ' + c2.LastName AS EmployeeName
- FROM Customer c1
- Full Outer JOIN Employee c2
- ON c1.SupportRepId = c2.EmployeeId
Self Join
In Self Join we join the single table with itself on the basis of some value. It is used to retrieve the records when we have some kind of relation or similarity with other records in the table. And when we’re working with self joins, we need to strictly use the alias because on the left-hand side and on the right-hand side, we have the same table. So, we use the alias to differentiate the query,
- SELECT e1.EmployeeId
- , e1.FirstName
- , e1.LastName
- , e1.Title
- , 'Reports To ' + e2.FirstName + ' ' + e2.LastName AS Manager
- FROM Employee e1
- JOIN Employee e2
- ON e1.ReportsTo = e2.EmployeeId
When we’re working with Self Joins obviously we should have the same kind of two attributes in the table. So here we have EmployeeId and ReportsTo are of the same type and containing the same kind of data inside. And now, we get this result,
But now, we want to show all the employees whether they can report to someone or not. So obviously we’ll use Left join to show all the records of employees.
- SELECT e1.EmployeeId
- , e1.FirstName
- , e1.LastName
- , e1.Title
- , 'Reports To ' + e2.FirstName + ' ' + e2.LastName AS Manager
- FROM Employee e1
- LEFT JOIN Employee e2
- ON e1.ReportsTo = e2.EmployeeId
And, we get this result.
Which Customer Lives in the Same City?
- SELECT c1.FirstName + ' ' + c1.LastName AS CustomerName
- , c1.City
- , c1.[State]
- FROM Customer c1
- JOIN Customer c2
- ON c1.[State] = c2.[State]
- AND c1.City = c2.City
- AND c1.CustomerId <> c2.CustomerId
- ORDER BY c1.[State], c1.City
Look the logic is very simple, we’re using self-join between 2 tables and applying the condition if both cities and states are the same but the customerId would be different. Then, we can find the customers in the table who belongs to the same city.
CROSS Join
Cross Join produces the Cartesian product of the tables. The size of the Cartesian product is,
Cartesian product = total number of rows in the first table * total number of rows in the second table
- SELECT * FROM Employee
- CROSS JOIN Invoice
We can write the same query as,
- SELECT * FROM Employee, Invoice
Now you might be wondering how it shows the result of the Cartesian product of 2 tables. If you know how to conclude the Cartesian product of 2 sets, then you have an idea of how we get the Cartesian product of 2 tables.
Union
Union statement is used to combine the results of two or more tables. When we’re working with the 2 sets then obviously if both sets contain any repetitive element then we only use it once in the result set. Similarly, we’re computing the Union in SQL. It just puts the repetitive element once in the resulting table.
- SELECT FirstName
- , LastName
- , Email
- FROM Customer
- UNION
- SELECT FirstName
- , LastName
- , Email
- FROM Employee
Now, you might get confused between join and union. See this link.
Union All
And if we want to allow the duplicate values as well, then we’ll use UNION ALL.
- SELECT FirstName
- , LastName
- , Email
- FROM Customer
- UNION ALL
- SELECT FirstName
- , LastName
- , Email
- FROM Employee
Intersect
If we want to get the same elements in both tables. Then we’ll use Intersect,
- SELECT FirstName
- , LastName
- , Email
- FROM Customer
- INTERSECT
- SELECT FirstName
- , LastName
- , Email
- FROM Employee
EXCEPT
Except excludes rows that exists in both result sets. Suppose we have two tables Table A, and Table B. Table B has some values which are also present in Table A. So,
- Table A EXCEPT Table B = Table A (excludes matching values)
- SELECT FirstName
- , LastName
- FROM Customer
- EXCEPT
- SELECT FirstName
- , LastName
- FROM Employee
And if we want to display them with any order then we’ll use Order By. It will apply on the complete result set.
- SELECT FirstName
- , LastName
- FROM Customer
- EXCEPT
- SELECT FirstName
- , LastName
- FROM Employee
- ORDER BY LastName