Introduction
In this article, we will learn how to upload CSV and JSON files into a Snowflake stage using the SnowSQL client. Once a file is uploaded into the stage, we can perform a bulk copy operation to load the data from the file into a Snowflake table. Before working on the problem statement, we should have some knowledge of SnowSQL and the Snowflake stage. SnowSQL is just a client connector, whereas a Snowflake stage is a location where we load our files. We can create internal and external stages in Snowflake. An external stage for AWS, Azure, or GCP can be created in Snowflake, provided we have the required keys for the respective cloud provider. In that case, if we have stored our CSV/JSON/XML or other supported semi-structured files in an S3 bucket or Azure Blob, we can simply stage them, and they will be available in Snowflake for loading into a table.
That is why this process is sometimes called ELT instead of ETL. In the ELT (Extract, Load, and Transform) process, we first extract data from a source, then load it into a storage location to stage it in Snowflake. Once we have the files in a stage, we can transform the data as per our requirements. We are going to discuss the following things in detail:
- SnowSQL
- Snowflake Stage
- Problem statement (uploading files to the stage and loading the data from the stage to the Snowflake table)
What Is SnowSQL?
SnowSQL is a command-line client that is used to connect to Snowflake. Instead of the Snowflake Web Interface, we can use the SnowSQL tool to execute SQL queries and DDL and DML commands, including loading and unloading data. To install SnowSQL, please download the executable from the following link.
Let us set up SnowSQL in our system to work on our problem statement. Perform the following steps to install and configure it.
Step 1. Download the executable from the given link and double-click once it is downloaded (refer to the below screen).
Step 2. Click on Next until you reach the below screen.
If you look closely at the screen, you can see some instructions. We need to configure the account details in the ~/.snowsql/config file.
Step 3. Click on the Finish button.
To check whether SnowSQL is installed, use the Run dialog (Windows key + R).
Step 4. Open the Run command. You will get the below screen:
Step 5. Type %USERPROFILE%\.snowsql\ and then click on OK
Step 6. You can see the below screen. Now open the config file in Notepad or Notepad++.
Step 7. Add the Snowflake account information based on the below screen.
You need to provide the accountname, region, username, password, dbname, schemaname, warehousename, and rolename. You can get this information from your Snowflake web interface.
Step 8. You can see [connections.my_example_connection] in the above screen. Here, my_example_connection is the connection name we will use to connect to Snowflake from the command prompt. You can give it any name.
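For reference, a config entry might look something like the following sketch. All of the values are placeholders; replace them with your own account details (the database, schema, warehouse, and role names shown here match the ones used later in this article).
[connections.my_example_connection]
accountname = xy12345
region = us-east-1
username = your_username
password = your_password
dbname = TEST_DB
schemaname = MANAGEMENT
warehousename = COMPUTE_WH
rolename = SYSADMIN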
Step 9. Open the command prompt and type snowsql -v, as shown below.
Step 10. Type snowsql -c my_example_connection and press the Enter key.
This command connects SnowSQL with Snowflake as per the given configuration. In this case, the prompt will appear as nitesh#COMPUTE_WH@TEST_DB.MANAGEMENT; refer to the below screen. Now you can perform any SQL query or DDL or DML command.
Now SnowSQL is installed and configured in your system properly. Let us discuss the Snowflake Stage.
Snowflake Stage
A stage specifies the location where data files are stored so that the data in the files can be loaded into a table. You can explore more in the Snowflake documentation.
The following types of Stages can be created in Snowflake:
- User stage
- Table Stage
- Internal Named Stage
While uploading a file into an internal stage, we must specify the internal stage name in the PUT command, and while loading data into a table from staged files, we must specify the stage name in the COPY INTO <table> command.
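In general form, the two commands look like this; everything in angle brackets is a placeholder:
PUT file://<local_path>/<file_name> @<stage_name>;
COPY INTO <table_name> FROM @<stage_name>/<file_name> FILE_FORMAT = (FORMAT_NAME = '<file_format_name>');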
User Stages
By default, each user has a Snowflake stage allocated for storing files. It is a convenient option if your files will only be accessed by a single user but need to be copied into multiple tables.
Characteristics and Limitations
- It is referenced using @~; e.g. use LIST @~ to list the files in a user stage.
- It cannot be altered or dropped.
- It does not support setting file format options. Instead, you must specify the file format and copy options as part of the COPY INTO <table> command.
Table Stages
By default, each table has a Snowflake stage allocated for storing files. It is a convenient option if your files need to be accessible to multiple users and only need to be copied into a single table.
Characteristics and Limitations
- It has the same name as the table, e.g. a table named mytable has a stage referenced as @%mytable.
- It cannot be altered or dropped.
- It does not support setting file format options. Instead, you must specify the file format and copy options as part of the COPY INTO <table> command.
- It does not support transforming data while loading it (i.e. using a query as the source for the COPY command).
We do not go with this option if we need to copy the data in the files into multiple tables.
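As a quick reference, this is how the user stage and a table stage described above are addressed in LIST and PUT commands; the table name mytable comes from the example above, and the file path is illustrative:
LIST @~;
PUT file://c:\testdata\employeedata.csv @~;
LIST @%mytable;
PUT file://c:\testdata\employeedata.csv @%mytable;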
Internal Named Stages
These are named database objects that provide more flexibility for data loading. Because they are database objects, the security/access rules that apply to all objects also apply to stages:
- Users who have appropriate privileges on the stage can load data into any table.
- Named internal stages are optional but recommended when you plan regular data loads that could involve multiple users and/or tables.
Creating Internal Named Stage
To achieve the solution for the given problem, we need to create an internal named stage so we can upload the files into it. Apart from creating a Snowflake-managed stage, we can also create a stage for AWS, Azure, and GCP. We must provide the appropriate keys related to the cloud provider. For example, if we create a stage with an AWS S3 bucket key, then all the files that were loaded into the S3 bucket become visible in the Snowflake stage. This only creates a reference, so we can load the data into a table as long as the file is listed in our stage. In our scenario, we are creating a named stage in Snowflake, so instead of uploading files into S3, we need to upload the files into the Snowflake stage. You can create a stage through the GUI or with a SQL query. Ultimately, whatever we pass to the GUI, it creates the SQL for it and runs it behind the scenes. I am creating the stage through the GUI (Graphical User Interface).
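If you prefer SQL, the equivalent statements would look something like the following sketch. The internal stage name matches the MYDATA stage used later in this article, while the external-stage URL and credentials are placeholders:
-- internal (Snowflake managed) named stage
CREATE OR REPLACE STAGE TEST_DB.MANAGEMENT.MYDATA;
-- external stage referencing an S3 bucket (URL and keys are placeholders)
CREATE OR REPLACE STAGE TEST_DB.MANAGEMENT.MY_S3_STAGE
  URL = 's3://<your-bucket>/<path>/'
  CREDENTIALS = (AWS_KEY_ID = '<your_key_id>' AWS_SECRET_KEY = '<your_secret_key>');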
Perform The Following Steps To Create Stage
Step 1. Log in to your Snowflake account. You will land on the worksheet.
Step 2. Click on the Database from the Header (beside the Share Icon). Refer to the below screen.
Step 3. Click on the database where you want to create the stage. In this case, I have selected TEST_DB. Once you click it, you will get the below screen.
Step 4. Click on Stages.
Step 5. Click on Create. You will get the below screen. Now you can choose whether to create a stage for Snowflake Managed, Amazon S3, Azure, or GCP.
Step 6. Select Snowflake Managed in the above screen and click on Next.
Step 7. Enter the stage name and select the schema. Refer to the below screen.
Step 8. Click on Finish.
Step 9. In the below screen, you can see the stage which we have created under the MANAGEMENT schema.
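You can also verify the new stage with SQL from a worksheet or SnowSQL, for example:
SHOW STAGES IN SCHEMA TEST_DB.MANAGEMENT;
DESC STAGE MYDATA;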
Now we have configured SnowSQL and created the stage as well. Let me give you brief information on file formats, as a file format is required while loading data from a stage to a table.
File Format
A file format describes the structure of the files we plan to load. It can be CSV, JSON, XML, Avro, etc. It helps us while loading data of different formats into Snowflake tables.
Here, I am creating file formats for the CSV and JSON files, and we will use these formats while loading data from the stage to the Snowflake table.
Creating File Format Inside The Database
Like a stage, we can create a file format inside the database. Perform the following steps:
Step 1. Click on the File Format from the menu
Step 2. Click on Create
Step 3. Specify the format name and schema name with the required settings. Refer to the below screen.
Step 4. Click on the Finish button.
Perform the above steps once again to create the JSON file format. Refer to the below screen.
Click on the Finish button.
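The same file formats can also be created with SQL. The names CSV_FORMAT and JSON_FORMAT match the formats used later in this article, while the individual options shown here (field delimiter, header skipping, outer-array stripping) are assumptions you may need to adjust for your own files:
CREATE OR REPLACE FILE FORMAT CSV_FORMAT
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;
CREATE OR REPLACE FILE FORMAT JSON_FORMAT
  TYPE = 'JSON'
  STRIP_OUTER_ARRAY = TRUE;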
Now we have everything ready to work on our problem statement.
Problem Statement
- Uploading CSV and JSON files from the local system to Snowflake Stage
- Load data from stage to Snowflake table
Solution
As we have already set up and configured SnowSQL and the Snowflake stage, the solution part will now be very easy for us. I have created two files. One is employee_list.json and the other is employeedata.csv. These files are available on the C drive inside the testdata folder.
Please have a look at the screen below for JSON file data and CSV file data. I have also attached the testdata.zip here.
JSON File Data
CSV file Data in the below screen
Now, we must perform the following steps to upload the files into the stage and load the data from the stage into the Snowflake tables.
Step 1. Open the command prompt.
Step 2. Type snowsql -v and press the Enter key.
It will show you the SnowSQL version installed in your system.
Step 3. Type snowsql -c my_example_connection and press the Enter key.
This command connects SnowSQL with Snowflake as per the given configuration. In this case, the prompt will appear as Nitesh#COMPUTE_WH@TEST_DB.MANAGEMENT.
Step 4. Type PUT file://c:\testdata\employeedata.csv @MYDATA; and press the Enter key. Refer to the below screen.
Step 5. Once the file is successfully loaded into the stage, you will see the below screen. The status will be UPLOADED, and a compressed file employeedata.csv.gz will be created on the Snowflake stage.
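The .gz extension appears because the PUT command compresses files by default (AUTO_COMPRESS defaults to TRUE). If you prefer to upload the file uncompressed, you could run:
PUT file://c:\testdata\employeedata.csv @MYDATA AUTO_COMPRESS=FALSE;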
Step 6. Type PUT file://c:\testdata\employee_list.json @MYDATA; and press the Enter key. Refer to the below screen.
The JSON file will be loaded into the Snowflake Stage called “MYDATA”
Step 7. Open your Snowflake Web Interface and set the context:
Use role sysadmin;
Use warehouse compute_wh;
Use database test_db;
Use schema management;
Step 8. Open the worksheet and type the below query
List @MYDATA;
It will show the list of the files available in the stage.
Step 9. To see the CSV file data, run the below query.
Select $1,$2,$3,$4,$5 from @MYDATA/employeedata.csv.gz;
It will show the CSV data in the result set. Refer to the below screen.
You can use the file format to see the proper data.
Select $1,$2,$3,$4,$5 from @MYDATA/employeedata.csv.gz (FILE_FORMAT=>'CSV_FORMAT');
Refer to the below screen.
Step 10. Perform the bulk insert operation. We have the user_data table in the TEST_DB database under the MANAGEMENT schema.
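If this table does not already exist in your environment, you can create it first. This is only a minimal sketch; the column data types are assumptions based on the sample data:
CREATE OR REPLACE TABLE user_data (
  user_id NUMBER,
  user_name VARCHAR,
  address VARCHAR
);
With the table in place, run the bulk copy: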
copy into user_data (user_id,user_name,address)
from (select $1,$2,$3 from @MYDATA/employeedata.csv.gz)
FILE_FORMAT = (FORMAT_NAME = 'CSV_FORMAT');
Refer to the below screen.
Step 11. The data has been loaded into the user_data table. To see the loaded data, run the below query.
select * from user_data;
Refer to the below screen:
Step 12. We have loaded the CSV file data into the user_data table. Now we have to load the JSON file data into a table. First, to view the JSON data in the stage, run the below query.
select parse_json($1) from @mydata/employee_list.json.gz (file_format => 'JSON_FORMAT');
Refer to the below screen.
Step 13. Click on the result row so you can see the JSON format data. Refer to the below screen.
Step 14. Create a table where we can store the JSON data as-is. Here, I am creating a table that has one column whose data type is VARIANT. This is a data type in Snowflake that can store semi-structured data. Run the below query to create the table.
create or replace table test_table(user_data variant);
Step 15. Run the below query to load the JSON file data.
copy into test_table (user_data)
from (select parse_json($1) from @mydata/employee_list.json.gz)
FILE_FORMAT = (FORMAT_NAME = 'JSON_FORMAT');
Step 16. To view the data loaded into test_table, run the below query.
select * from test_table;
Refer to the below screen.
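Once the JSON is in the VARIANT column, individual attributes can be read with Snowflake's colon and cast notation. The field names used below are hypothetical, since the actual keys depend on the JSON in your file:
SELECT user_data:user_name::STRING AS user_name,
       user_data:address::STRING AS address
FROM test_table;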
Conclusion
Uploading files to a Snowflake stage can be done with any Snowflake connector client. We can put files into the stage from the local system, and then the data can be loaded from the stage into a Snowflake table. If you already upload your data files to an S3 bucket or Azure Blob, you can simply create a stage referencing the keys of the AWS S3 bucket or Azure Blob and start loading the data into the table.
I am happy to answer any questions that you may have on this functionality.
Happy Learning!