Introduction
Delta tables are the building blocks of Microsoft Fabric. Every table we create in Fabric is stored as a Delta table: Parquet data files together with a transaction log (the delta log), which makes storage efficient and guarantees ACID transactions. This makes data management easier and more reliable. In this tutorial, we will explore the aspects of Delta tables that apply to Microsoft Fabric; everything explained below is in that context.
- Delta tables: Each Delta table consists of two parts: Parquet data files and the delta log.
- Parquet files: Parquet is a column-oriented, open-source data format optimized for fast analytical operations. Parquet files are immutable, so new files are written on every update or delete. Tables can be partitioned, in which case multiple files are generated.
- Delta log: The transaction log is stored in the '_delta_log' folder, with each transaction recorded as a JSON file. It is what makes ACID transactions possible.
- Types of delta tables: There are two types of delta tables. Irrespective of their type, their delta logs are maintained in Fabric.
- Internal: Tables created in Fabric are internal tables.
- External: Tables that reference data stored at an external location are external tables.
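As an illustration of the delta log described above, a commit file such as `_delta_log/00000000000000000000.json` typically contains one JSON action per line. The file name, path, and values below are hypothetical, but the shape follows the Delta transaction log format:

```json
{"commitInfo": {"timestamp": 1700000000000, "operation": "WRITE"}}
{"add": {"path": "part-00000-abc123.snappy.parquet", "partitionValues": {}, "size": 1024, "modificationTime": 1700000000000, "dataChange": true}}
```

Each `add` action registers a new Parquet file as part of the table; readers replay these actions to determine which files make up the current version.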
For all create, read, update, and delete operations, we will use Spark SQL in a Fabric notebook.
Creating a delta table: Below is the command to create a Delta table. The USING DELTA clause is optional, since Delta is the default table format in Fabric.
%%sql
CREATE TABLE employee(
id INT NOT NULL,
name STRING NOT NULL
)
USING DELTA
Creating a delta table with partitions: Here we need to ensure that the column selected for partitioning does not have very high cardinality, and that it divides the data evenly; otherwise we end up with many small files.
%%sql
CREATE TABLE employee_two (
id INT NOT NULL,
name STRING NOT NULL,
department STRING NOT NULL
)
PARTITIONED BY (department);
Create external table
%%sql
CREATE TABLE employee_three
USING DELTA
LOCATION '<file_location>'
Insert Data
%%sql
INSERT INTO employee (id, name)
VALUES (1, 'Varun');
INSERT INTO employee_two (id, name, department)
VALUES (1, 'Varun', 'Software Delivery');
INSERT INTO employee_two (id, name, department)
VALUES (2, 'Varun', 'Software Innovation');
On executing the inserts into employee_two, two sets of files are generated, one per partition key value; below is a screenshot.
Select data: Running a SELECT returns a unified view of the table, regardless of how many partition files back it.
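For example, to read back the partitioned table created earlier:

```sql
%%sql
SELECT id, name, department
FROM employee_two
ORDER BY id;
```
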
We can also chart the results directly in the platform, without needing any additional library.
Delete data: The command below drops the table. For an internal (managed) table, Fabric deletes both the Parquet files and the delta logs; for an external table, only the table metadata is removed and the underlying files remain at their location.
%%sql
DROP TABLE employee
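To remove or change individual rows rather than dropping the whole table, Delta also supports row-level DELETE and UPDATE. Each command writes new Parquet files and appends a new entry to the delta log, for example:

```sql
%%sql
DELETE FROM employee_two WHERE id = 2;
UPDATE employee_two SET department = 'Software Delivery' WHERE id = 1;
```
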
View transaction logs
%%sql
DESCRIBE HISTORY employee_two
Optimize and clean up files
Every update or delete command creates new Parquet files, and the older files are retained over time for time travel. To clean these up, we execute the VACUUM command, which removes files no longer referenced by the table. Refer to this article for more details: https://learn.microsoft.com/en-us/fabric/data-engineering/lakehouse-table-maintenance.
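A typical maintenance pass first compacts small files with OPTIMIZE and then removes unreferenced files older than the retention period with VACUUM (168 hours, i.e. seven days, is the default retention in Delta):

```sql
%%sql
OPTIMIZE employee_two;
VACUUM employee_two RETAIN 168 HOURS;
```
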
Thanks for reading till the end.
I hope this was helpful.
In the comments, mention how you feel about Microsoft Fabric.