Problem Statement
As of 21 September 2023, pipelines in Azure Synapse / Azure Data Factory (ADF) have no pipeline-level timeout setting, so there is no out-of-the-box way to automatically cancel long-running pipelines or to alert on them.
Is it possible to cancel running Pipelines within Synapse / Azure Data Factory?
Prerequisites
- Azure Data Factory / Synapse
Solution
To achieve this functionality, we have to build our own custom logic, as shown below.
GitHub Code
The activities highlighted in yellow are the additional custom logic that needs to be added to the existing pipeline.
Pipeline Parameters
The pipeline parameters are:
- TimeOutInSec: Timeout value of the pipeline, in seconds
- QueueTimeInSec: Interval, in seconds, at which the pipeline's execution time and status are checked
- SubscriptionID: ID of the subscription hosting the Azure Data Factory / Synapse
- ResourceGroupName: Name of the resource group hosting the Azure Data Factory / Synapse
Pipeline Variables
1. In the flow above, the Wait1, Wait2 and Fail1 activities stand in for the pipeline's normal activities.
The 'Set Pipeline Status' Set variable activity records the outcome of the normal flow when no timeout occurs (whether the pipeline succeeded or failed).
Value :
@if(empty(replace(concat(activity('Wait1')?.error?.message,'^|',
activity('Wait2')?.error?.message,'^|',activity('Fail1')?.error?.message),'^|','')),'Success','Failure')
The expression above, and the way the activities are wired together in the sequential flow, are explained in the following blog post:
Error Logging and the Art of Avoiding Redundant Activities in Azure Data Factory
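The logic of the 'Set Pipeline Status' expression can be mirrored outside ADF to make it easier to follow. The following is a minimal Python sketch, not ADF code: the function name pipeline_status is hypothetical, and it assumes that concat() treats a missing error message as an empty string, which is what the expression relies on.

```python
from typing import Optional, Sequence

SEPARATOR = "^|"  # same separator as in the ADF expression


def pipeline_status(error_messages: Sequence[Optional[str]]) -> str:
    """Mirror the 'Set Pipeline Status' expression.

    Each entry is the error message of one activity, or None when the
    activity produced no error (the null-safe `?.` access in ADF).
    """
    # Assumption: a null message contributes an empty string to concat().
    joined = SEPARATOR.join(msg or "" for msg in error_messages)
    # After removing the separators, text remains only if an activity failed.
    remainder = joined.replace(SEPARATOR, "")
    return "Success" if remainder == "" else "Failure"


no_errors = pipeline_status([None, None, None])
one_error = pipeline_status([None, None, "Fail1 raised an error"])
```

With no error messages the joined string consists only of separators, so the result is 'Success'; a single non-empty message is enough to make it 'Failure'.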
2. In parallel with the actual flow, we use an Until activity and an If Condition activity to check whether the pipeline has exceeded its allocated timeout and to take the necessary action.
a) Until Activity
Expression :
@or(not(empty(variables('PipelineStatus'))),greaterOrEquals(variables('ExecutionTime'),pipeline().parameters.TimeOutInSec))
The Until activity iterates until either the main flow of the pipeline has completed (success or failure), or the pipeline is still running and has exceeded its allocated timeout.
Within the Until activity:
i) The Wait3 activity waits for some time before proceeding to the next iteration.
ii) The 'Set TempExecutionTime' Set variable activity computes the overall execution time up to that point.
Value :
@add(variables('ExecutionTime'),pipeline().parameters.QueueTimeInSec)
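iii) The 'Set ExecutionTime' Set variable activity overwrites the ExecutionTime variable with that value. The intermediate variable is needed because a Set variable activity cannot reference the variable it is setting.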
Value :
@variables('TempExecutionTime')
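To see how the Until loop behaves over time, it can be simulated outside ADF. The following Python sketch is illustrative only. It assumes that ExecutionTime starts at 0 and that Wait3 pauses for QueueTimeInSec seconds. The names watch_pipeline and read_status are hypothetical; read_status stands in for reading the PipelineStatus variable.

```python
from typing import Callable, Tuple


def watch_pipeline(
    timeout_in_sec: int,
    queue_time_in_sec: int,
    read_status: Callable[[int], str],
) -> Tuple[str, int]:
    """Simulate the Until loop; returns (PipelineStatus, ExecutionTime) at exit.

    read_status receives the counted seconds and returns "" while the
    main flow is still running (hypothetical stand-in for the variable).
    """
    execution_time = 0  # assumption: the variable's default value is 0
    while True:  # Until runs its body first, then evaluates the expression
        # Wait3: assumed to pause for QueueTimeInSec seconds (not simulated).
        temp_execution_time = execution_time + queue_time_in_sec  # Set TempExecutionTime
        execution_time = temp_execution_time  # Set ExecutionTime
        status = read_status(execution_time)
        # @or(not(empty(PipelineStatus)), greaterOrEquals(ExecutionTime, TimeOutInSec))
        if status != "" or execution_time >= timeout_in_sec:
            return status, execution_time


# Main flow finishes after 25 s: the loop notices at the 30 s check.
finished = watch_pipeline(60, 10, lambda t: "Success" if t >= 25 else "")
# Main flow never finishes: the loop exits once 60 s have been counted.
timed_out = watch_pipeline(60, 10, lambda t: "")
```

Note that ExecutionTime only sums the wait intervals, so it approximates the elapsed time rather than measuring it, and a timeout is detected with a granularity of QueueTimeInSec seconds.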
3. The If Condition activity checks whether the Until activity exited because the pipeline completed normally (success or failure), in which case no notification or cancellation is needed, or because of a timeout.
Expression :
@not(empty(variables('PipelineStatus')))
The expression checks whether the PipelineStatus variable is non-empty.
If it is empty, the normal flow of the pipeline is still in progress and has not reached the 'Set Pipeline Status' activity, which means the pipeline has exceeded its allocated timeout.
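The branching can be summarised in a few lines of Python. This is a sketch only; the function name action_after_until and the returned labels are hypothetical.

```python
def action_after_until(pipeline_status: str) -> str:
    """Decide what to do once the Until activity has exited."""
    # Mirrors @not(empty(variables('PipelineStatus')))
    completed_normally = pipeline_status != ""
    # A non-empty status means the main flow finished on its own, so there
    # is nothing to do; an empty status means the Until exited on timeout.
    return "none" if completed_normally else "cancel"


after_success = action_after_until("Success")
after_timeout = action_after_until("")
```

A status of 'Failure' is also treated as normal completion here, because the main flow ended within the allocated time.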
In our use case, to cancel the pipeline run, we use a Web activity that calls the pipeline run cancel REST API.
The setup and details for cancelling the pipeline run are described in the following article: Cancel Azure Data Factory Pipeline Runs via Synapse / Data Factory
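For orientation, the following Python sketch shows how the cancel URL for an Azure Data Factory pipeline run is assembled from the SubscriptionID and ResourceGroupName parameters. It is not the Web activity configuration itself, which is covered in the linked article. The function name cancel_run_url is hypothetical, the sketch assumes the ADF management endpoint with api-version 2018-06-01, and Synapse workspaces use a different endpoint that is not shown here. Inside a pipeline, the factory name and run ID would typically come from the system variables pipeline().DataFactory and pipeline().RunId.

```python
from urllib.parse import quote

API_VERSION = "2018-06-01"  # assumption: ADF management API version


def cancel_run_url(
    subscription_id: str,
    resource_group_name: str,
    factory_name: str,
    run_id: str,
) -> str:
    """Build the URL for a POST request that cancels an ADF pipeline run."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group_name, safe='')}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{quote(factory_name, safe='')}"
        f"/pipelineruns/{quote(run_id, safe='')}/cancel"
        f"?api-version={API_VERSION}"
    )


# Placeholder values for illustration only.
example_url = cancel_run_url("sub-id", "my-rg", "my-adf", "run-123")
```

The request must be authenticated; the sketch only builds the URL and does not send anything.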
To send a notification on timeout instead, replace the pipeline cancellation Web activity with a Web activity that triggers a Logic App.
Output
Scenario 1)
Normal success within the allocated time (disable the Fail1 activity in the example and update the expression in the 'Set Pipeline Status' activity accordingly).
Scenario 2)
Failure within the allocated time.
Scenario 3)
Pipeline execution exceeding the timeout.