Introduction
In this article, you will learn about Snowflake and its features.
What is Snowflake?
Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS). It provides a data warehouse that is faster, easier to use, and far more flexible than traditional data warehouse offerings. It is a cloud data warehouse that runs on cloud infrastructures like AWS, Azure, and GCP (Google Cloud Provider). It can not be run on a private cloud or hosted infrastructure. When you create an account in Snowflake, it asks whether you want to create your account on AWS, Azure, or GCP.
As it is a warehouse, we can create a database, and perform SQL commands like DDL/DML. Not only this, but we can also create SQL function, Views, and stored procedure (in JavaScript). It is not a relational database, so it does not enforce the primary key and foreign key relationship. But you can use it if you want. The purpose of Snowflake is that it provides the data faster. You can pull the millions of records with a minute and visualize the data by BI tools.
Snowflake's Features
Data Warehouse as a Cloud service
As this is a cloud-based data warehouse, there is no hardware and software to configure, install, and manage it. Snowflake handles the maintenance, management, and tuning from their end.
Scalability (Storage and Compute)
Snowflake implements a separation of storage and computation. A database is used to store the data in Snowflake and computation can be done by a virtual warehouse. Snowflake stores data into S3 bucket because of the following reasons,
- High availability
- Durability
- API to read data parts
User Experience (Performance)
- No Indexing
- No Performance Tuning
- No Partitioning
- No Physical Storage Design
In Snowflake, there is no indexing, no performance tuning, no portioning, no physical storage design. It simply works.
Supports Semi-structured data
Snowflake supports semi-structured data like Json, XML, Parquet, Avro, etc. You can load data as it is in the table.
Tuneable Pay per Use
You can configure it based on your requirement.
Please go through the Snowflake documentation on the Snowflake website for more details.
Here is the link.
It has very good documentation, where they explain each and everything in a descriptive manner. However, I am sure you will need the help of Snowflake documentation for implementing and exploring their functionality.
We have different tools and connectors to work with Snowflake.
Integration Support by Snowflake
Data Integration
You can have ETL (Extraction, Transformation, and Loading) process in Snowflake using Informatica, Talend, Alooma, Kafka, etc.
- Extraction: Extracting data from different source like csv, excel, Oracle etc
- Transformation: Transforming or modifying the extracted data as per need using look up and rules.
- Loading: Loading the transformed data into the destination source.
BI Tools
It enables analyzing, discovering, and reporting on the data using Power BI, Tableau, QuickSight, Qlik, Adobe, etc. A BI tool provides a facility the ability to visualize data through dashboards, charts, and other graphical outputs.
Machine Learning and Data Science
We can also refer to the advance analytical and AI (Artificial Intelligence) tool, Machine learning, Big Data tools for statistical and predictive modeling. BI tools can be focused on past data for reporting, but machine learning tools can be used to focus on large data set to discover a pattern and predict future trends. The tools are Databricks, DataRobot, Spark, etc.
Programmatic Interface
You can connect the snowflake using .Net, python, Node.Js, JDBC/ODBC driver. When you want to automate the DBA task in snowflake, it can be done by Python or other language connectors.
SQL Interface and Client
Snowflake has the following SQL interface:
- Snowflake Web Interface: For an ad-hoc query, you can use the Web Interface. You need to log in with the Snowflake account and execute the SQL query
- SnowSQL: You can perform a SQL query by installing the Snow SQL clients.
- DBeaver: You can connect to Snowflake using the DBeaver client as well.
Snowflake Architecture
It is a combination of shared disk and shares nothing database architecture. Snowflake architecture is divided into three layers.
- Service Layer
- Query Processing Layer
- Database Storage Layer
Cloud Services /Service Layer
This layer is responsible for authentication and access control management. It also manages the metadata information like if we create the warehouse then actually it creates the metadata only.
Query Processing
This layer is responsible for query processing. The SQL query can be processed using a Virtual warehouse, which is a parallel processing (MPP) compute cluster. You need to create the virtual warehouse as it is the same as EC2 instance of AWS, but instead of naming the virtual machine, it is called a virtual warehouse in Snowflake. Creating a warehouse is very easy in Snowflake, and it comes with different sizes like XS (extra small), S (small), M (medium), L (large), XL, XXL, XXXL. The SQL query will be processed using these virtual warehouses. By default, Warehouse will be suspended, but when we process query it will be activated. That is why Snowflake says you pay for storage and computation. If the warehouse was not in use, then you would pay only for storage.
Database Storage
Snowflake stores the data in the database. When we load the data in snowflake it automatically identifies the data for optimizing, compressing for internal purpose. The optimized data will be stored in cloud storage. Snowflake uses the AWS S3 bucket as cloud storage. As we know that S3 API can read the data parts based on the range. Whatever table we create inside the database, Snowflake splits that table into multiple partitions. Each partition called a micro partition, so it will be faster to access.
Snowflake Data Life cycle
To work with the data in Snowflake, Snowflake came up with the following mechanism is used to manage the data lifecycle.
- Organizing Data
- Storing Data
- Querying Data
- Working with Data
- Removing Data
Organizing Data
You can create Database, schema, and table to organize data in Snowflake
- Database and Scheme
There is a SQL command to create database and schema
Create database <database_name>
create schema <schema_name>
- Table and View : - Create table <table_name> (parameter_name with data type)
Storing Data
Storing data in a table can be done with the below SQL query:
- Insert: INSERT INTO TABLE ………
- Bulk Insert: Copy INTO TABLE query
Querying Data
To query the data, we use the ‘Select’ query.
Select * from table_name
Working with Data
Once the data is loaded into the table, then you can perform the following activities.
- Update, Merge, Delete
- Cloning Database, schema and table by Create <object> Clone….command
Removing Data
To remove the data from the table, you can perform the below command:
Almost all commands are the same as the SQL command in SQL server, or just a bit different. I will try to cover the main flow in my upcoming articles where all the above SQL command will be used.
Conclusion
I have covered the introduction of Snowflake with Features and architecture. You can use Snowflake if you are dealing with lots of data. You can visualize those data and perform machine learning on that to predict future trends. It will be faster in all aspects.
Happy Learning!