Database And Tables In Azure Databricks

Database and Tables in Azure Databricks

An Azure Databricks Database is a collection of tables.

An Azure Databricks Table is a collection of structured data. We can cache, filter, and perform any operations supported by Apache Spark DataFrames on Azure Databricks tables and query tables with Spark APIs and Spark SQL.

Types of Tables

There are two types of tables: Global and Local

Global Table

A global table is available across all clusters. Azure Databricks registers global tables to either the Azure Databricks Hive metastore or an external Hive metastore.

Local Table

A local table is not accessible from other clusters and is not registered in the Hive metastore. This is also known as a temporary view.

When we create a table using the UI, we cannot:

Upload a file if we are using a High Concurrency cluster. Instead, use the Databricks File System (DBFS) to load the data into Azure Databricks.

Update the table. Instead, create a table programmatically.

Create a table using the UI

  1. Click Data in the sidebar. The Databases and Tables folders will display.
  2. In the Databases folder, select a database.
  3. Above the Tables folder, click Create Table.
  4. Choose a data source and follow the steps in the corresponding section to configure the table

  • Drag files to the Files dropzone or click the dropzone to browse and choose files. After upload, a path displays for each file. The path will be something like /FileStore/tables/<filename>-<integer>.<file-type>. You can use this path in a notebook to read data.

  • Click Create Table with UI.

Create a Table Programmatically
 

Create a Global Table

To create a Global table in SQL

CREATE TABLE <table-name>

To create a Global table from a DataFrame in Python or Scala

dataFrame.write.saveAsTable(“<table-name>”)

To create a Local table from a DataFrame in Python or Scala

dataFrame.createOrReplaceTempView(“<table-name>”)

Access a Table

We can view table details and read, update, and delete a table.

We can view the table in UI as well as in Notebooks.

To Access/View the Table from the UI

  • Click Data Icon Data in the sidebar.
  • In the Databases folder, click a database.
  • In the Tables folder, click the table name.
  • In the Cluster drop-down, optionally select another cluster to render the table preview.

To Access/View the Table from Notebooks

Query the table using SQL

SELECT * FROM <Table>

Query the table using Spark SQL

df = spark.sql(“select * from table_name”)
display(table_name.select(“*”))

Query the table using Python

diamonds = spark.table(“table_name”)
display(table_name.select(“*”))

Update a Table

The table schema is immutable. However, we can update table data by changing the underlying files.

For example, for tables created from a storage directory, adding or removing files in that directory changes the table's contents.

After updating the files underlying a table, refresh the table using the SQL command: REFRESH TABLE <table-name>

Delete a table

We can delete the table from UI (By clicking the delete option) and through coding (drop table <table_name>)