Problem Statement
One of the challenges in Data Engineering is the presence of Empty / Blank files or Files with just headers at the Source.
So is there any automated way to Skip / Notify processing of those files via Azure Data Factory / Synapse.
Prerequisites
- Azure Data Factory / Synapse
Solution
CASE 1. Empty / Blank File
As seen above, the File size for blank file is 0 B.
Resolution
We can leverage Get Meta Data activity to get the file size and in case if it is '0', we can conclude that it is an Empty / Blank file.
Output
CASE 2. File with just Header
Note: The size of file with only headers might change depending on the number of columns present in the header.
So one cannot rely solely on the file size to identify whether the file contains only header or not.
Resolution
We can leverage Lookup activity to get the count of records (excluding the header) within the file, and in case if it is '0', we can conclude that it is a file with just header (and no data).
Dataset :
With the first row as header enabled.
ADF flow
Lookup activity
IF Activity
Expression :
@equals(activity('File LookUp').output.count,0 )
In case the count of rows within the file is zero, throw error via Fail activity (in current example) or send Email notification or skip the iteration for that file.
Output
Note: The Lookup activity methodology can be used even for case #1. So it is a better and more efficient solution to validate both Empty/Blank files and/or files with only headers.