2
Part - 2
2. b. Change Data Capture (CDC)
If your source database supports CDC (e.g., SQL Server with CDC enabled), you can use this feature to capture changes.
Steps:
-
Enable CDC: Ensure CDC is enabled on the source database.
-
Use CDC in ADF: Use the Copy activity or Data Flow to read the changes from the CDC tables.
-
Process Changes: Apply the changes to the target system.
Example:
1. Use the Copy activity to read changes from the CDC tables.
2. Apply the changes to the target database.

2.c. Incremental Load Using Data Flows
Data Flows in ADF provide a more visual and declarative way to implement incremental loads.
Steps:
-
Define Source and Sink: Set up the source and sink datasets.
-
Filter Data: Use the Filter transformation to filter data based on the watermark.
-
Join and Update: Use the Join transformation to join the source data with the target data and apply updates or inserts.
Example:
1. Define the source and sink datasets.
2. Use the Filter transformation to filter data based on the watermark.
3. Use the Join transformation to join the source data with the target data.
4. Apply updates or inserts to the target database.


2
Part -1
Some best practices and steps to achieve this:
Identify the Source and Target Systems
Understand the source and target systems, and identify the key columns that can be used to detect changes (e.g., timestamps, sequence numbers, or change data capture (CDC) columns).
Choose the Right Approach
There are several approaches to implement incremental loads in Azure Data Factory:
a. Watermark-based Incremental Load
This is the most common approach for incremental loads. It involves using a watermark column (usually a timestamp or an incrementing ID) to track the last processed data.
Steps:
-
Define Watermark Storage: Store the watermark value in a table or a file in Azure Blob Storage or Azure SQL Database.
-
Retrieve Watermark: Use a Lookup activity in ADF to retrieve the last watermark value.
-
Filter Data: Use a Copy activity or a Data Flow to filter the source data based on the watermark.
-
Update Watermark: After the data is loaded, update the watermark value to the latest processed value.
Example:
-
Source: SQL Server table with a LastModifiedDate
column.
-
Target: Azure SQL Database.
-
Watermark: LastModifiedDate
.
-
1. Retrieve the last watermark value from Azure SQL Database.
2. Filter the source data where LastModifiedDate > last_watermark.
3. Copy the filtered data to the target.
4. Update the watermark value in Azure SQL Database.


1
Implementing incremental data loads in Azure Data Factory (ADF) can be done in several ways, depending on your specific requirements and data sources. Here are some common methods:
-
Watermarking:
- Watermark Column: Use a column in your source database that tracks the last updated timestamp or an incrementing key. This allows you to load only the data that has changed since the last load.
-
Change Tracking:
- SQL Server Change Tracking: Utilize SQL Server’s Change Tracking feature to identify and load only the data that has been inserted, updated, or deleted.
-
LastModifiedDate:
- File-Based Incremental Load: For file-based data sources, use the LastModifiedDate attribute to filter and load only the new or modified files since the last load.
-
Dataflow:
- Data Transformation: Create a dataflow in ADF to handle data transformation and implement logic to filter data based on the last load timestamp.
-
Lookup and Stored Procedure Activities:
- Custom Logic: Use lookup activities to fetch the last load timestamp and stored procedures to handle the incremental load logic.
Each method has its own use cases and advantages. For example, watermarking is great for databases with a clear timestamp or incrementing key, while LastModifiedDate is ideal for file-based systems.
