Introduction
In this blog, we will look into Azure Data Factory triggers, an important feature for running pipelines on a schedule without manual intervention each time. Apart from the usual benefit of scheduling pipelines for future runs (which is very common), Azure Data Factory triggers can also pick up and process data from past dates.
Types
There are three types of triggers available in Azure Data Factory:
- Schedule triggers
- Tumbling window triggers
- Event triggers
Schedule Trigger
Schedule triggers are the most common type; they execute a pipeline on a time schedule we define. They offer plenty of flexibility, with recurrence options such as minute, hour, day(s), week(s), or month(s). We can define the start and end dates for when the trigger should be active, but it will only fire from the moment it is created onward. The schedule can also target specific future calendar dates and times, such as the 15th and last Saturday of every month, or the first and fourth Monday of every month. Schedule triggers have a many-to-many relationship with pipelines: a single trigger can run many pipelines, and a single pipeline can be run by many triggers.
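The blog does not include code, but a minimal sketch of creating a schedule trigger with the Azure SDK for Python (azure-mgmt-datafactory) looks roughly like the following. The subscription ID, resource group, factory, pipeline, and trigger names are placeholders, not values from this post.

```python
# Sketch: create and start a daily schedule trigger (placeholder names throughout).
from datetime import datetime, timedelta

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    TriggerResource,
    ScheduleTrigger,
    ScheduleTriggerRecurrence,
    TriggerPipelineReference,
    PipelineReference,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Recurrence: once a day, active for the next 30 days.
recurrence = ScheduleTriggerRecurrence(
    frequency="Day",                                 # Minute, Hour, Day, Week or Month
    interval=1,
    start_time=datetime.utcnow() + timedelta(minutes=5),
    end_time=datetime.utcnow() + timedelta(days=30),
    time_zone="UTC",
)

trigger = TriggerResource(
    properties=ScheduleTrigger(
        description="Runs CopyPipeline once a day",
        recurrence=recurrence,
        pipelines=[
            TriggerPipelineReference(
                pipeline_reference=PipelineReference(reference_name="CopyPipeline"),
                parameters={},
            )
        ],
    )
)

adf_client.triggers.create_or_update("my-rg", "my-data-factory", "DailyTrigger", trigger)
# New triggers are created in a stopped state; start it so it begins firing.
adf_client.triggers.begin_start("my-rg", "my-data-factory", "DailyTrigger")
```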
Tumbling Window Trigger
Tumbling window triggers run on a periodic interval from a specified start time, with each run covering a fixed, non-overlapping time window. They are far more useful than schedule triggers when you are copying or moving historical data, because the start time can lie in the past. For instance, if you want to copy historical data from a SQL database, the trigger can run the pipeline for past windows and copy each slice of data. It works by breaking the overall period into windows (one per hour, if you have defined a 1-hour interval) and passing the start and end time of each window into the SQL query; the data between that start and end time is then returned and saved to a destination of your choice. A dependency can be added so that a second trigger is initiated only on successful completion of the first. Another advantage is the 'Max Concurrency' setting, which defines how many windows can be processed in parallel, letting tumbling windows for historical data run side by side without waiting for the previous one to complete. An example sketch follows below.
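Here is a minimal sketch of a tumbling window trigger that backfills hourly slices, again using azure-mgmt-datafactory. The names and dates are placeholders, and the pipeline is assumed to accept 'windowStart' and 'windowEnd' parameters that it plugs into its source SQL query.

```python
# Sketch: hourly tumbling window trigger that backfills one past day of data.
from datetime import datetime

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    TriggerResource,
    TumblingWindowTrigger,
    TriggerPipelineReference,
    PipelineReference,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

tumbling_trigger = TriggerResource(
    properties=TumblingWindowTrigger(
        pipeline=TriggerPipelineReference(
            pipeline_reference=PipelineReference(reference_name="CopyHistoricalData"),
            parameters={
                # Each run receives its own window boundaries via system variables.
                "windowStart": "@trigger().outputs.windowStartTime",
                "windowEnd": "@trigger().outputs.windowEndTime",
            },
        ),
        frequency="Hour",                 # one window per hour
        interval=1,
        start_time=datetime(2023, 1, 1),  # a past start time enables backfill
        end_time=datetime(2023, 1, 2),
        max_concurrency=4,                # process up to 4 windows in parallel
    )
)

adf_client.triggers.create_or_update(
    "my-rg", "my-data-factory", "HourlyBackfillTrigger", tumbling_trigger
)
```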
Event Triggers
Using event-based triggers, we can run pipelines in response to events from Azure Blob Storage. The most common use case is reacting to a file arriving in, or being deleted from, blob storage. Azure Data Factory relies on its integration with Azure Event Grid for this, which works along similar lines but with a slightly different methodology. Event-based triggers work not only with Blob Storage but with ADLS too. Event triggers also have a many-to-many relationship with pipelines: a single trigger can start multiple pipelines, and multiple triggers can start a single pipeline.
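A minimal sketch of a blob event trigger with azure-mgmt-datafactory is shown below. The storage account scope, container path, and pipeline name are hypothetical; only the event types are fixed values defined by Azure.

```python
# Sketch: run a pipeline whenever a .csv file lands in the 'incoming' container.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    TriggerResource,
    BlobEventsTrigger,
    TriggerPipelineReference,
    PipelineReference,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

event_trigger = TriggerResource(
    properties=BlobEventsTrigger(
        blob_path_begins_with="/incoming/blobs/",
        blob_path_ends_with=".csv",
        events=["Microsoft.Storage.BlobCreated"],   # or Microsoft.Storage.BlobDeleted
        scope=(
            "/subscriptions/<subscription-id>/resourceGroups/my-rg"
            "/providers/Microsoft.Storage/storageAccounts/mystorageaccount"
        ),
        pipelines=[
            TriggerPipelineReference(
                pipeline_reference=PipelineReference(reference_name="ProcessNewFile"),
                # Pass the triggering file name into the pipeline.
                parameters={"fileName": "@trigger().outputs.body.fileName"},
            )
        ],
    )
)

adf_client.triggers.create_or_update(
    "my-rg", "my-data-factory", "BlobArrivalTrigger", event_trigger
)
```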
Summary
This is a brief introduction to the types of triggers available in Azure Data Factory. If you want to get started with Azure Data Factory, please refer to my previous post, which covers detailed and practical scenarios.