While learning and practicing Databricks we may keep our files in DBFS, but in practice the source data usually comes from on-premises or cloud storage. If the data lives in Azure Blob Storage and needs to be processed in Databricks, we first have to mount it to DBFS.
DBFS stands for Databricks File System. It provides for cloud storage the same file-system-style interface that the Hadoop Distributed File System (HDFS) provides for local storage.
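As a quick illustration, once a notebook is attached to a cluster, DBFS can be browsed and written to with dbutils. This is only a minimal sketch; the paths used here are examples, not part of the mounting steps that follow.

# List the DBFS root to see the built-in folders (output varies per workspace)
display(dbutils.fs.ls("dbfs:/"))

# Files written to DBFS persist independently of the cluster's lifetime
dbutils.fs.put("dbfs:/tmp/hello.txt", "hello from DBFS", overwrite=True)
print(dbutils.fs.head("dbfs:/tmp/hello.txt"))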
The steps below walk through mounting our Azure Blob Storage data to DBFS.
1. Create a Key Vault and generate a secret to mount Azure Blob Storage in Databricks
In the storage account's Access keys blade, copy either key1 or key2.
2. Go to Azure Key Vaults -> Secrets -> Generate/Import
Give your secret a name, paste the access key you copied from the storage account, and click Create.
The secret has now been created.
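The portal steps above are all that is needed, but the same secret could also be created from code. The sketch below uses the azure-keyvault-secrets SDK and assumes the azure-identity and azure-keyvault-secrets packages are installed and that the signed-in identity has permission to set secrets; the vault URL, secret name, and key value are placeholders.

# Script-based alternative to the portal steps above (placeholders throughout)
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

vault_url = "https://<key-vault-name>.vault.azure.net"
client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())

# Store the storage account access key (key1/key2) as a Key Vault secret
client.set_secret("<secret-name>", "<storage-account-access-key>")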
3. Create a secret scope in Databricks
Go to https://<databricks-instance>#secrets/createScope. This URL is case sensitive; scope in createScope must be uppercase.
The properties the form asks for (DNS Name & Resource ID) are available in the Properties tab of the Azure Key Vault in the Azure portal.
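Once the Key Vault-backed scope has been created, it can be verified from a notebook before attempting the mount; the scope and key names below are whatever was entered in the createScope form and in the Key Vault.

# List the secret scopes visible to this workspace and the secrets in our scope
print(dbutils.secrets.listScopes())
print(dbutils.secrets.list("<scope-name>"))

# Secret values are redacted when displayed in notebook output
storage_key = dbutils.secrets.get(scope="<scope-name>", key="<key-name>")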
After setting up the above, we can now create a new Databricks notebook and mount our blob container to DBFS using the code below.
dbutils.fs.mount(
    source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
    mount_point = "/mnt/<mount-name>",
    extra_configs = {"<conf-key>": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")}
)
- <conf-key> can be either fs.azure.account.key.<storage-account-name>.blob.core.windows.net or fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net
- dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>") gets the key that has been stored as a secret in a secret scope.
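Once the mount succeeds, the container behaves like any other DBFS folder, so it can be browsed and read directly, and it can be unmounted when no longer needed. The file name and format below are only examples.

# Browse the mounted container
display(dbutils.fs.ls("/mnt/<mount-name>"))

# Read a sample file from the mount into a Spark DataFrame (path is illustrative)
df = spark.read.option("header", "true").csv("/mnt/<mount-name>/<file-name>.csv")
display(df)

# List everything currently mounted to DBFS, and unmount when finished or when keys are rotated
display(dbutils.fs.mounts())
dbutils.fs.unmount("/mnt/<mount-name>")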