Problem Statement
A file uploaded to Azure Blob Storage / Azure Data Lake Storage has multiple properties associated with it.
The Get Metadata Activity within pipelines returns only the following subset of properties:
Is it possible to get other properties of the file, such as Creation Time and Content-Type, in Synapse / Data Factory pipelines?
Prerequisites
- Azure Data Factory / Synapse
- Azure Blob Storage / Azure Data Lake Storage
Solution
1. We will use the Azure Blob Storage REST API operation Get Blob, which returns the blob file properties as response headers.
2. Grant the Synapse / Data Factory managed identity the Storage Blob Data Reader role on the Azure Blob Storage account so that the pipeline can authenticate via Managed Identity.
a) Go to Access Control (IAM) of the Azure Blob Storage account, click Add, and select Add Role Assignment
b) Search for the Storage Blob Data Reader role and assign it to the Synapse / Data Factory managed identity
3. Create a pipeline within Synapse / Data Factory that uses a Web Activity to call the REST API.
URL
In case of Azure Blob Storage
https://<<StorageAccountName>>.blob.core.windows.net/<<ContainerName>>/<<FileName>>
In case of Azure Data Lake Storage
https://<<DataLakeStorageName>>.dfs.core.windows.net/<<ContainerName>>/<<FileName/DirectoryName>>
Method: GET
Authentication: System Assigned Managed Identity
Resource: https://storage.azure.com/
Headers:
x-ms-version: 2017-11-09
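To try the same call outside the pipeline, the Web Activity settings above can be sketched in Python. This is a minimal sketch, not the pipeline's own implementation: the function name `build_get_blob_request` and its parameters are hypothetical, and the bearer token is assumed to have been obtained separately for the resource https://storage.azure.com/ (for example through a managed identity).

```python
from urllib.parse import quote
from urllib.request import Request


def build_get_blob_request(account, container, path, token,
                           endpoint="blob", api_version="2017-11-09"):
    """Build (but do not send) a GET request mirroring the Web Activity settings.

    account, container, path: placeholders for the storage account,
        container and file (or directory) names.
    token: an access token for https://storage.azure.com/ (assumed to be
        acquired elsewhere; this sketch does not fetch one).
    endpoint: "blob" for Azure Blob Storage, "dfs" for Azure Data Lake Storage.
    """
    if endpoint not in ("blob", "dfs"):
        raise ValueError("endpoint must be 'blob' or 'dfs'")
    # quote() percent-encodes spaces and similar characters but keeps "/".
    url = f"https://{account}.{endpoint}.core.windows.net/{container}/{quote(path)}"
    headers = {
        "x-ms-version": api_version,         # same header as in the Web Activity
        "Authorization": f"Bearer {token}",  # stands in for Managed Identity auth
    }
    return Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    req = build_get_blob_request("myaccount", "mycontainer", "folder/file.csv", "<token>")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request; the file properties would then be read from the response headers rather than from the body.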
Output
Get Metadata Activity output
Web Activity Output (Azure Blob Storage)
where [x-ms-creation-time] represents the file creation time.
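The `x-ms-creation-time` value arrives as an HTTP date string, so it usually needs parsing before it can be compared or stored. The sketch below shows one way to do that in Python; the helper name `blob_creation_time` and the sample header values are illustrative assumptions, not real output from the pipeline. In the pipeline itself, the response headers are typically surfaced under the Web Activity output's `ADFWebActivityResponseHeaders` property.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def blob_creation_time(headers):
    """Return the creation time from a response-header mapping, or None if absent.

    headers: a dict-like mapping of header names to values. Names are
        matched case-insensitively.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("x-ms-creation-time")
    if raw is None:
        return None
    # The value is an RFC 1123 date such as "Tue, 21 Sep 2021 10:00:00 GMT".
    return parsedate_to_datetime(raw)


if __name__ == "__main__":
    # Illustrative sample values only.
    sample_headers = {
        "Content-Type": "text/csv",
        "x-ms-creation-time": "Tue, 21 Sep 2021 10:00:00 GMT",
    }
    created = blob_creation_time(sample_headers)
    print(created.isoformat())
```

Other properties, such as Content-Type, can be read from the same header mapping without parsing.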
Web Activity Output (Azure Data Lake Storage)
Directory Property
Web Activity