Explaning Joins in Azure Cosmos DB

Introduction

Azure Cosmos DB supports querying items as standard as SQL, using the most familiar and popular query languages, as a JSON query language with common operators and operations. Similarly, join operations allow us to combine data from multiple JSON objects within a single container. This is very important. We can query the data with a single container, specifically between arrays and objects within a single document (nested document). For example, if a document contains an array, you can join the array elements to the parent document or other properties of the document. So, it means the data should contain or be available in the same container. For this reason, we need to design/model our cosmos database very carefully.

Basic Join Structure in Cosmos DB

Here is the basic structure to apply the join within the document.

SELECT *
FROM ContainerName c

JOIN <AliasName> IN c.AttributeName

  • ContainerName is the original container which contains the documents
  • C is an alias for the documents in the specified container
  • JOIN is the keyword with alias name (<AliasName>), IN is again a keyword, and <c. AttributeName> creates a join between the document (c) and the elements in the array AttributeName.

Supported Types of Joins
 

Within a single item in the container

In this type of join, we can apply join in the same single document with an array. Here is an example of an employee table/container that contains the array of skills that are parts of an employee.

Employee Data in JSON

{ 
    "EmployeeID": "1", 
    "EmployeeName": "John W", 
    "Address": "Address", 
    "Skills": [ 
        { 
            "name": ".Net" 
        }, 
        { 
            "name": ".Net Core" 
        }, 
        { 
            "name": "MVC" 
        } 
    ] 
} 
{ 
    "EmployeeID": "2", 
    "EmployeeName": "George K", 
    "Address": "Address2", 
    "Skills": [ 
        { 
            "name": "C# .Net" 
        }, 
        { 
            "name": "MVC" 
        } 
    ] 
} 
{ 
    "EmployeeID": "3", 
    "EmployeeName": "William M", 
    "Address": "Address3", 
    "Skills": [ 
        { 
            "name": "MS Azure" 
        } 
    ] 
} 

Cosmos Query

SELECT Emp.EmployeeID, Emp.EmployeeName, skill.name  
FROM Emp 
JOIN skill IN Emp.Skills 

This query will include the employee data with the skills and return all the records of employees with their skills. We can also further apply the WHERE clause on the above query to filter out the data based on employee name or skill, here is an example.

SELECT Emp.EmployeeID, Emp.EmployeeName, skill.name  
FROM Emp 
JOIN skill IN Emp.Skills 
WHERE Emp.EmployeeName LIKE '%William%' 
SELECT Emp.EmployeeID, Emp.EmployeeName, skill.name  
FROM Emp 
JOIN skill IN Emp.Skills 
WHERE skill.name = '.Net' 

Within multiple items within a container

In this type of join, we can apply to join in the same single document with an array and complex nodes. This is an example of a nested JSON document. Joins syntax can be used in such cases as well.

Here is an example of an employee table/container which contains the array of complex nodes which are parts of an employee.

Employee Data Json

{ 
	"ID": "4", 
	"Name": "John W", 
	"Address": "JW Address", 
	"dependents": [ 
		{ 
			"firstname": "Allie", 
			"gender": "FeMale", 
			"qualification": [ 
				{ 
					"name": "BS" 
				}, 
				{ 
					"name": "MS" 
				} 
			] 
		}, 
		{ 
			"firstname": "Kevin", 
			"gender": "Male", 
			"qualification": [ 
				{ 
					"name": "O-Level" 
				} 
			] 
		} 
	] 
} 
{ 
	"ID": "5", 
	"Name": "George K", 
	"Address": "GK Address", 
	"dependents": [ 
		{ 
			"firstname": "Bata", 
			"gender": "Male", 
			"qualification": [ 
				{ 
					"name": "BSCS" 
				}, 
				{ 
					"name": "MSCS" 
				} 
			] 
		}, 
		{ 
			"firstname": "Serve", 
			"gender": "FeMale", 
			"qualification": [ 
				{ 
					"name": "A-Level" 
				} 
			] 
		} 
	] 
} 

Cosmos Query

SELECT Emp.ID 'Emp ID',  
Emp.Name 'Emp Name',  
depen.firstname 'Dependent Name',  
depen.gender 'Dependent Gender',  
qualif.name 'Dependent Qualification' 
FROM Emp 
JOIN depen IN Emp.dependents 
JOIN qualif IN depen.qualification 

This query will include the employee data with their dependent's information and return all the records of employees. In the above query, aliases are being used to differentiate the exact column value.

We can also further apply the WHERE clause on the above query to filter out the data based on any criteria.

Limitations of using Joins

  • Limited to Arrays within a single document, can't apply joins across the different documents.
  • Doesn’t support cross-container joins, unlike traditional SQL syntax.
  • Performance can degrade if the arrays being joined are large.

Conclusion

Joins in Azure Cosmos DB are a great way to query data within a single document or nested complex nodes using JSON query language, especially when working with arrays. Although they share some syntax similarities with SQL, they are designed for use within a document model and come with their own limitations. Always consider and monitor the performance implications and RU consumption when designing queries involving joins.