In this article, I will walk you through how to perform a scalable data integration from a Microsoft Fabric Lakehouse to a Snowflake Data Warehouse using a Data Pipeline.
In the screenshot below, I have data in the salesdatatocopytable table in the Fabric Lakehouse. The data is, of course, stored as a Delta table.
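If you want a quick look at the source data before building the pipeline, you can query the Lakehouse's SQL analytics endpoint. The sketch below assumes the salesdatatocopytable table and the same columns we will create in Snowflake later; adjust the names if your Lakehouse differs.

-- Preview the first few rows of the source Delta table
-- via the Lakehouse SQL analytics endpoint (T-SQL, read-only)
SELECT TOP 10
    OrderDate,
    Products,
    PaymentType,
    Units,
    Price,
    SalesAmount
FROM salesdatatocopytable;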
Create a Snowflake Account and Server Name
To create a free 30-day trial account, go to this link: https://www.snowflake.com/, follow the prompts to create an account, and specify the cloud host: Microsoft Azure, AWS, or Google Cloud Platform.
You will be required to provide a username and password during sign-up. You will also be asked to confirm your email address, after which you will receive an email with the details of your server.
Create Warehouse, Database, Schema, Table, and Insert Records
We want to ingest the data into the Snowflake warehouse. Before we can achieve that, we need a warehouse, database, schema, and table created in the Snowflake account.
In the screenshot below, I run the following SQL script.
-- Create a virtual warehouse to provide compute for loading and querying
CREATE WAREHOUSE FabricWH;

-- Create the database and schema that will hold the ingested data
CREATE DATABASE FabricDB;
USE FabricDB;
CREATE SCHEMA fabric_schema;
USE SCHEMA fabric_schema;

-- Create the target table; its columns mirror the Fabric Lakehouse source table
CREATE TABLE DataFromFabric (
    OrderDate STRING,
    Products VARCHAR(20),
    PaymentType VARCHAR(15),
    Units INT,
    Price INT,
    SalesAmount DECIMAL(10,2)
);
Proceed to run the SQL statements. After they run successfully, as shown in the screenshot below, the DataFromFabric table has been created with no records in it.
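If you would like to double-check that the table exists and is still empty before running the pipeline, a quick verification in the same worksheet is enough. This is a minimal sketch using standard Snowflake SQL and the object names created above.

-- Confirm the table exists and currently holds no rows
USE WAREHOUSE FabricWH;
SHOW TABLES LIKE 'DataFromFabric' IN SCHEMA FabricDB.fabric_schema;
SELECT COUNT(*) AS row_count FROM FabricDB.fabric_schema.DataFromFabric;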
We are going to head back to Fabric Lakehouse to initiate the process of ingesting the data.
Switch to Fabric Data Engineering Experience
In the screenshot below, at the bottom left, switch to the Data Engineering experience. Note that the copy_to_s3 lakehouse is in the A to Z of the Warehouse workspace.
In the Data Engineering home page as seen in the screenshot below, select Data Pipeline.
Provide a name for the pipeline. In this case, DataIngestionToSnowflake is provided.
Click on Create.
In the Data Pipeline Home, as seen below, we can start building our data pipeline.
Select Copy data.
In the Choose data source window, scroll down and select Lakehouse.
Select Next.
In the next window, select the Lakehouse to copy data from. In this article, copy_to_s3 lakehouse is selected.
Click on Next.
In the Connect to data source window, select the table that matches the table created earlier in the Snowflake database. The salesdatatocopytos3 table is selected.
Click on Next.
In the Choose Data Destination, select Snowflake as seen below.
Click on Next.
In the next window, provide the Server name, which you can copy from the Welcome to Snowflake email you received.
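For orientation, the server value is the account host name from that email, without the https:// prefix. A hypothetical example for an account hosted on Azure (the account identifier below is made up):

xy12345.east-us-2.azure.snowflakecomputing.com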
Next, provide the Warehouse name created earlier. In this article, FabricWH is provided.
Under Connection credentials, you can optionally provide a connection name or proceed with the one generated automatically.
Next, scroll down and provide the username and password you supplied during account registration on the Snowflake website.
Proceed by clicking on Next.
In the intermediate window, click on Test connection to confirm that a connection to the Snowflake Data Warehouse can be established. In this article, the connection is successful, as seen in the screenshot below. Then select the target database created earlier. The FabricDB database is selected.
Click on Next.
In the next window, from the Table dropdown, select the target table. The Fabric_Schema.DataFromFabric table is selected. Also review the source and destination column data types and adjust the column mappings as required.
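If you are unsure which data types the destination columns expect, you can describe the target table on the Snowflake side before finalizing the mapping. A minimal sketch in Snowflake SQL:

-- List the destination columns and their data types
DESCRIBE TABLE FabricDB.fabric_schema.DataFromFabric;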
Click on Next.
In Settings, Enable staging is checked automatically and the Data store type is set to Workspace. This is fine.
Click on Next.
In the Review + Save stage, click on Save + Run to execute the data transfer.
In the screenshot below, the data transfer using the pipeline was successful with the activity status showing a green checkmark.
To verify the data, head to Snowflake and run a SELECT * FROM DataFromFabric query in the worksheet, as seen below.
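Beyond the simple select, a slightly fuller check compares the row count with the Fabric source table and inspects a sample of the rows. A minimal sketch, assuming the same context created earlier:

-- Set the context and verify the ingested data
USE WAREHOUSE FabricWH;
USE DATABASE FabricDB;
USE SCHEMA fabric_schema;

SELECT COUNT(*) AS ingested_rows FROM DataFromFabric;
SELECT * FROM DataFromFabric LIMIT 10;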
There we go! The data ingestion worked as expected. If you enjoyed this data engineering tutorial, share the article with your connections, leave a comment, and give it a thumbs up. See you in the next article.