Introduction
Azure Cosmos DB, a distributed and multi-model database service, offers powerful features to manage large-scale data efficiently. Bulk Operations and Transactional Batch Operations stand out for handling diverse data management needs.
Bulk operations are designed for high-throughput scenarios, enabling parallel creates, updates, or deletions across multiple partition keys, making them ideal for large-scale migrations or data processing.
Transactional batch operations, on the other hand, ensure atomicity within a single partition key, allowing multiple operations (such as create, update, or delete) to execute as a single unit. Both features are accessible through the Azure Cosmos DB .NET SDK and help developers optimize performance and maintain data consistency in modern applications.
Bulk Operations
Bulk operations are ideal for high-throughput, parallel processing of large datasets across multiple partition keys, making them perfect for data ingestion, migration, and updates. They offer scalability, performance, and flexibility to support massive data processing needs.
Key Features of Bulk Operations
- Operate across multiple partition keys.
- Optimize throughput by internally batching requests.
- Automatically manage retries in case of transient failures.
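using Microsoft.Azure.Cosmos;
using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class BulkOperations
{
    private readonly Container _container;

    // The container must come from a CosmosClient created with
    // CosmosClientOptions.AllowBulkExecution = true; otherwise each call
    // below is sent as an individual request.
    public BulkOperations(Container container)
    {
        _container = container;
    }

    public async Task ExecuteBulkInsertionAsync(IEnumerable<MyProduct> products)
    {
        // Start one task per item; in bulk mode the SDK groups concurrent
        // operations into batches internally.
        var tasks = new List<Task>();
        foreach (var product in products)
        {
            tasks.Add(_container.CreateItemAsync(product, new PartitionKey(product.PartitionKey)));
        }

        // Throws if any task failed; items that already succeeded stay written.
        await Task.WhenAll(tasks);
        Console.WriteLine("Bulk Operation Completed.");
    }
}

public class MyProduct
{
    // Cosmos DB requires a lowercase "id" property in the stored JSON;
    // with the SDK's default serializer, "Id" would be written as-is.
    [JsonProperty("id")]
    public string Id { get; set; }

    // Assumes the container's partition key path is /PartitionKey.
    public string PartitionKey { get; set; }
    public string ProductName { get; set; }
}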
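In the .NET SDK v3, bulk mode is opt-in: the internal batching described above applies only when the client is created with AllowBulkExecution enabled. The following is a minimal sketch of such a client. The BulkClientFactory class name and the connectionString parameter are illustrative assumptions and are not part of the original example.

```csharp
using Microsoft.Azure.Cosmos;

// Hypothetical helper (name chosen for illustration only).
public static class BulkClientFactory
{
    // Creates a CosmosClient with bulk execution turned on, so that
    // concurrent point operations are grouped into batches by the SDK.
    public static CosmosClient Create(string connectionString)
    {
        var options = new CosmosClientOptions
        {
            AllowBulkExecution = true
        };
        return new CosmosClient(connectionString, options);
    }
}
```

A Container obtained from this client through GetContainer can then be passed to the BulkOperations constructor shown earlier.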
Transactional Batch Operations
Transactional batch operations allow multiple operations on items sharing the same partition key to be executed atomically. This ensures that either all operations in the batch succeed or none are applied, preserving data consistency.
Key Features of Transactional Batch Operations
- All operations within the batch share the same partition key.
- Supports create, read, update, and delete operations in a single batch.
- Adheres to ACID (Atomicity, Consistency, Isolation, Durability) properties of the database.
Reference: https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/transactional-batch?tabs=dotnet
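using Microsoft.Azure.Cosmos;
using Newtonsoft.Json;
using System;
using System.Threading.Tasks;

public class TransactionalBatchExample
{
    private readonly Container _container;

    public TransactionalBatchExample(Container container)
    {
        _container = container;
    }

    public async Task ExecuteTransactionalBatchAsync()
    {
        // Every item in the batch must carry this exact partition key value
        // (the comparison is case-sensitive).
        const string partitionKeyValue = "partitionKeyValue";
        var partitionKey = new PartitionKey(partitionKeyValue);

        TransactionalBatch batch = _container.CreateTransactionalBatch(partitionKey)
            .CreateItem(new MyProduct { Id = "1", PartitionKey = partitionKeyValue, ProductName = "Value1" })
            .UpsertItem(new MyProduct { Id = "2", PartitionKey = partitionKeyValue, ProductName = "Value2" })
            // Fails the whole batch if no item with id "3" exists in this partition.
            .DeleteItem("3");

        // The response is disposable, so release it once it has been inspected.
        using (TransactionalBatchResponse response = await batch.ExecuteAsync())
        {
            if (response.IsSuccessStatusCode)
            {
                Console.WriteLine("Transactional batch executed successfully.");
            }
            else
            {
                Console.WriteLine($"Transactional batch failed with status code: {response.StatusCode}");
            }
        }
    }
}

// Same model as in the bulk example.
public class MyProduct
{
    // Cosmos DB requires a lowercase "id" property in the stored JSON.
    [JsonProperty("id")]
    public string Id { get; set; }

    // Assumes the container's partition key path is /PartitionKey.
    public string PartitionKey { get; set; }
    public string ProductName { get; set; }
}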
How does Transactional Batch Work?
When the ExecuteAsync method is called on a TransactionalBatch, all the operations defined in the batch (such as create, update, delete, or read) are grouped together and serialized into a single network payload. This payload includes metadata and the sequence of operations, ensuring that all the necessary information is packed into a single request to minimize network overhead.
Once the payload is sent to the Azure Cosmos DB service, the service processes the entire batch within a transactional scope. This means that all operations in the batch are treated as a single atomic unit. If any operation in the batch fails—such as a constraint violation or a partition key mismatch—the entire batch is rolled back, ensuring that no partial changes are committed. This transactional guarantee adheres to ACID properties (Atomicity, Consistency, Isolation, Durability) within the scope of a single partition key.
After processing the batch, the Azure Cosmos DB service returns a response to the client. This response is also serialized and includes the overall status of the batch operation (success or failure). If successful, the response contains one result per operation, in the order the operations were added, each with its own status code and, where applicable, the resulting item. If the batch fails, the operation that caused the failure reports its own error status code, the remaining operations report 424 (Failed Dependency), and no operations in the batch are applied to the database.
This design ensures efficient communication by reducing the number of network round-trips and maintaining strong data consistency for all operations within the same partition key. It is particularly useful for scenarios where multiple related operations must be executed together, such as updating an order and its associated items in an e-commerce application.
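When a batch fails, the operation that caused the rollback carries its own error status, while the operations that were merely rolled back with it report 424 (Failed Dependency). The sketch below shows one way to locate the offending operation from the per-operation status codes. The BatchDiagnostics class and FindRootCauseIndex method are hypothetical names introduced for illustration; the helper works on plain status codes so it does not depend on a live account.

```csharp
using System.Collections.Generic;
using System.Net;

// Hypothetical helper (not part of the SDK).
public static class BatchDiagnostics
{
    // 424 Failed Dependency: reported for operations that were not applied
    // because a different operation in the same batch failed.
    private const HttpStatusCode FailedDependency = (HttpStatusCode)424;

    // Returns the index of the first operation whose status is neither a
    // success (2xx) nor 424, or -1 if no such operation exists.
    public static int FindRootCauseIndex(IReadOnlyList<HttpStatusCode> statuses)
    {
        for (int i = 0; i < statuses.Count; i++)
        {
            int code = (int)statuses[i];
            bool isSuccess = code >= 200 && code <= 299;
            if (!isSuccess && statuses[i] != FailedDependency)
            {
                return i;
            }
        }
        return -1;
    }
}
```

Given a TransactionalBatchResponse named response, the input list can be built with response.Select(r => r.StatusCode).ToList(), because the response can be enumerated as a list of per-operation results. The returned index matches the order in which operations were added to the batch.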
Comparison Between Bulk and Transactional Batch Operations
| Feature | Bulk Operations | Transactional Batch Operations |
| --- | --- | --- |
| Use Case | High-throughput data ingestion, migration, or updates. | Atomic multi-operation transactions on items in the same partition key. |
| Partition Key Requirement | Can span multiple partition keys. | Must be within the same partition key. |
| Transaction Guarantee | No guarantee of atomicity; operations can partially succeed. | Atomic operations; all succeed or fail as a unit. |
| Scalability | Optimized for large-scale, parallel operations. | Limited to 2 MB or 100 operations per batch. |
| ACID Compliance | Not applicable. | Fully ACID-compliant. |
Advantages of TransactionalBatch in Azure Cosmos DB
Here are the key benefits.
- Atomicity of operations: Ensures all operations succeed or fail as a unit, preserving data consistency.
- Improved performance and efficiency: Reduces network overhead by bundling operations into a single request.
- Consistency across operations: Guarantees data integrity within the same partition.
- Simplified error handling: Centralizes error handling for multiple operations in a single response.
- Ideal for related data operations: Particularly useful for scenarios involving related data (e.g., updating an order and its associated items).
- Scalability: Efficient handling of up to 100 operations within a single batch.
- Flexibility: Supports multiple operation types (create, update, delete, read) in a single batch.
Are there any known limits for TransactionalBatch?
Yes, there are certain known limits when using TransactionalBatch in Azure Cosmos DB. These limits are important to consider when designing and implementing your application to ensure that your batch operations work as expected without exceeding the service's constraints.
- Payload Size: The TransactionalBatch payload cannot exceed the Azure Cosmos DB request size limit of 2 MB, and a batch has a maximum execution time of 5 seconds.
- Number of Operations: A TransactionalBatch can currently contain at most 100 operations, which keeps performance predictable and within SLAs.
- Single Partition Key: All operations within a TransactionalBatch must target items with the same partition key value. You cannot operate on items with different partition key values in a single batch.
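The operation-count and single-partition-key limits can be respected up front by planning batches before building them. The following is a minimal sketch, assuming items expose their partition key as a string; BatchPlanner and Plan are hypothetical names, and the sketch does not check the 2 MB payload limit. Note that splitting one partition's items into several batches gives up atomicity across those batches: each chunk commits or rolls back on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the SDK).
public static class BatchPlanner
{
    // Documented limit: at most 100 operations per transactional batch.
    public const int MaxOperationsPerBatch = 100;

    // Groups items by partition key value, then splits each group into
    // chunks of at most maxPerBatch items, preserving input order in a group.
    public static List<(string PartitionKey, List<T> Items)> Plan<T>(
        IEnumerable<T> items,
        Func<T, string> partitionKeySelector,
        int maxPerBatch = MaxOperationsPerBatch)
    {
        if (maxPerBatch < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxPerBatch));
        }

        var plan = new List<(string PartitionKey, List<T> Items)>();
        foreach (var group in items.GroupBy(partitionKeySelector))
        {
            var current = new List<T>();
            foreach (var item in group)
            {
                current.Add(item);
                if (current.Count == maxPerBatch)
                {
                    // Chunk is full: emit it and start a new one.
                    plan.Add((group.Key, current));
                    current = new List<T>();
                }
            }
            if (current.Count > 0)
            {
                plan.Add((group.Key, current));
            }
        }
        return plan;
    }
}
```

Each entry in the returned plan maps to one CreateTransactionalBatch call with that entry's partition key, followed by one operation per item.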
When to Use Which?
- Bulk Operations: When you need to process or migrate a large volume of data with high throughput and can tolerate partial success or retries.
- Transactional Batch Operations: When you are working within a single partition key and need atomicity for multiple operations (e.g., ensuring consistent updates to related data).
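For the bulk case, tolerating partial success in practice means recording which items failed so they can be retried or logged, because awaiting Task.WhenAll directly surfaces only the first exception. The sketch below is one possible way to do that. BulkRunner and RunAllAsync are hypothetical names, and the helper is written against a generic delegate rather than the SDK so the pattern is visible on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper (not part of the SDK).
public static class BulkRunner
{
    // Starts operation(item) for every item, waits for all of them, and
    // returns the items whose operation threw, paired with the exception.
    // Individual failures are captured instead of being rethrown.
    public static async Task<List<(T Item, Exception Error)>> RunAllAsync<T>(
        IEnumerable<T> items,
        Func<T, Task> operation)
    {
        var tasks = items.Select(async item =>
        {
            try
            {
                await operation(item);
                return (Item: item, Error: (Exception)null);
            }
            catch (Exception ex)
            {
                return (Item: item, Error: ex);
            }
        }).ToList(); // ToList starts every operation before any is awaited.

        var results = await Task.WhenAll(tasks);
        return results.Where(r => r.Error != null).ToList();
    }
}
```

With a Cosmos DB container, the delegate could be p => container.CreateItemAsync(p, new PartitionKey(p.PartitionKey)), and the returned list then holds exactly the products that were not written.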
Conclusion
Bulk operations are ideal for high-throughput, parallel processing of large datasets across multiple partition keys, making them perfect for data ingestion, migration, and updates.
On the other hand, Transactional Batch Operations provide atomicity for operations within a single partition, ensuring data consistency and integrity.
Both features, accessible through the Azure Cosmos DB .NET SDK, offer unique advantages, and understanding when to use each one is key to optimizing your data management strategies. Whether you're working with high-throughput data or need to maintain consistency across multiple operations, Azure Cosmos DB provides the tools you need to build efficient and reliable applications.