Exploring Aggregate Functions in Azure Cosmos DB

Introduction

The specific functions (aggregate) that perform calculations on a set of values and return a single meaningful value. We can also say that these are the mathematical functions that process multiple values to create a single summary statistic. These functions are used in many programming languages, spreadsheets, and relational algebra, common in SQL and now in cosmos databases.

Why do we need Aggregate Functions?

Aggregate functions allow us to summarize large datasets into easily understandable results, allowing us to quickly retrieve desired values from millions of records. They eliminate the need for complex queries or procedural code to generate summaries. These summarized values are crucial for business reporting, facilitating report generation that plays a key role in decision-making. Ultimately, aggregate functions are essential for efficiently analyzing and interpreting extensive datasets.

Dataset and JSON Format
 

ID First_Name Last_Name Phone Address Points
1 John Doe 555-123-4567 123 Main 1500
2 Jane Smith 555-987-6543 456 Elm 2300
3 Alice Johnson 555-543-2345 789 Oak 1100
4 Bob Brown 555-321-9876 321 Pine 1750
5 Sarah Davis 555-654-7890 654 Maple 2000
6 Michael Taylor 555-765-4321 987 Cedar 1850
7 Emily Martinez 555-246-1357 123 Birch 2400
8 David Lee 555-654-3210 741 Willow 900
9 Laura Wilson 555-987-1234 258 Cherry 2100
10 Chris Evans 555-321-6547 369 Ash 1750

JSON format file of customers is attached.

Supported Aggregate Functions in Cosmos DB

Cosmos DB offers a range of aggregate functions that enable calculations across multiple documents in a collection, allowing for efficient data summarization and analysis in a distributed database setting. The aggregate functions supported by Azure Cosmos DB are AVG, COUNT, Maximum, Minimum, and SUM. Here is the description of each function with query and result using the above dataset.

1. AVERAGE - AVG()

This function calculates the average of a given numeric values column from a set of documents and returns the average value. It operates in the same way as calculating a mathematical average. The syntax for calling this function is `AVG(Column_Name)`, which sums the values in a specified column and then divides the total by the number of rows in the dataset. Here is the example query and result using the above dataset.

Query

SELECT AVG(c.Points) 'Avg Points' 
FROM c 
WHERE c.entity_type = 'Customer' 

Result

[
	{
		"Avg Points": 1765
	}
]

2. COUNT - COUNT()

The Count function returns the number of items/documents that match a specific condition. This function is often used to count documents in a collection based on some filtering condition. Here is the example query and result using the above dataset.

Query

SELECT COUNT (1) 'Total Customers' 
FROM c 
WHERE c.entity_type = 'Customer' 

Result

[
	{
		"Total Customers": 10
	}
]

3. SUM - SUM()

This function is used to calculate the sum of all -non-null values from the set of items/documents. This can be used to calculate the total of numeric fields across multiple documents and only works with numeric fields. Here is the example query and result using the above dataset.

Query

SELECT SUM(c.Points) 'Total Points' 
FROM c 
WHERE c.entity_type = 'Customer' 

Result

[
	{
		"Total Points": 17650
	}
]

4. MAXIMUM - MAX()

This function returns the maximum value from a set of documents. This is used to find the highest value in a collection of documents for a numeric or comparable field. Here is the example query and result using the above dataset.

Query

SELECT MAX(c.Points) 'Max Point' 
FROM c 
WHERE c.entity_type = 'Customer' 

Result

[
	{
		"Max Point": 2400
	}
]

5. MINIMUM - MIN()

This function returns the minimum value from a set of documents. This is used to find the lowest value in a collection of documents for a numeric or comparable field. Here is the example query and result using the above dataset.

Query

SELECT MIN(c.Points) 'Min Point' 
FROM c 
WHERE c.entity_type = 'Customer' 

Result

[
	{
		"Min Point": 900
	}
]

Conclusion

The aggregate functions in Cosmos DB are highly effective for calculating unique values within a database or data in the container. We have discussed the importance of these functions and the ones supported by Cosmos DB, which perform similar roles to their counterparts in MS Excel on datasets. In this article, we explored various examples of using aggregate functions through Cosmos queries. We can apply filters as extensively as needed, based on the specific scenario or criteria with a query in the where clause.