Query CSV File Saved In ADLS Through SQL Query - Azure Synapse Analytics

We are all aware that SQL is commonly used to query structured data but in Synapse Analytics we can use SQL to query unstructured data saved in files like CSV, parquet, etc using OPENROWSET function and it is one of the many features that can be done using synapse analytics. In this week’s article, we will look at how it can be done in a few steps.

Open the Synapse studio and click the ‘Data’ tab available on the left corner as shown below. This is where you can see all your linked data lakes and containers. In my case I have a container already created to my linked ADLS storage hence we will be using that. Currently, there are no files available in that so we will upload a CSV file from my local storage for this demo.

After loading your sample file to the ADLS storage which in-turn connected to your workspace, move on to the ‘Develop’ tab on the left corner below the data tab that we worked on within the previous section. Now you can click the ‘add’ symbol and create a SQL Script since our task is to query the files in ADLS through SQL query.

Now it’s all set, we can go ahead and start using SQL. Like our conventional SQL, you can query with an additional keyword call OPENROWSET which will help to fetch details from file and display in a tabular format.

SELECT * FROM OPENROWSET(
 BULK 'https://synapeadls11.dfs.core.windows.net/adl2fssyn/SampleCSVFile_50kRows.csv',
 FORMAT = 'CSV',
 HEADER_ROW = TRUE,
 PARSER_VERSION = '2.0'
)
AS RESULT

The ‘BULK’ field is where you must provide the file path for the file you have stored in ADLS. Use the below method to find that out from your attached storage.

I have queried the entire table with 50 thousand rows and since it was first time execution, it took a while. You can also limit it to few hundred rows by modifying the script to suit your requirement and it is simple as you do for any other SQL query.

Query CSV file saved in ADLS through SQL query - Azure Synapse Analytics

But if you are too lazy to type in the script or are GUI savvy? Microsoft azure has option for you too! Just go to the ADLS storage and right click where you will get options to select only top 100 rows. Not only that you can create the file straight into an external table, new spark table or even a spark dataframe.

One of the best parts of synapse studio is that you have option to access the in-built visualization feature available in the results pane through which different chart types with different columns can be created. This will be particularly useful and time saving when you need to create a chart and don’t want to save the file and move data to a separate application just for this purpose.

Query CSV file saved in ADLS through SQL query - Azure Synapse Analytics

Summary

We saw how to query unstructured file saved in data lake through SQL in azure synapse analytics. There is an important concept when working with data in azure synapse analytics to select between Serverless and Dedicated SQL Pools based on the requirement. Though my previous article explain it theoretically, I will write about it in upcoming article with a real time scenario.