Azure Databricks Lakehouse Platform Overview

Introduction

In this article, we are going to explore the origin of the lakehouse platform and the challenges with data lake implementation. Then, we will explore Azure Databricks Platform components and their use cases.

Azure Databricks

  • Software-as-a-service (SaaS) data and AI company
  • Makes big data and AI easier for enterprise organizations
  • Enables data-driven innovation
  • Brings together all your data, analytics, and AI into one Lakehouse platform
  • Powered by open-source Delta Lake
  • Combines the best of warehouses and data lakes

Features

  • Ingest and transform massive quantities and many types of data
  • Explore data using data science techniques, including machine learning
  • Guarantee data availability for business queries
  • Provide a simplified experience for data engineers, data scientists, and data analysts to do their work
  • Overcome traditional challenges associated with data science and machine learning workflows

Figure: Databricks workspace UI

Challenges with Data Lake Implementation

  • Complexity of working with big data
  • Decreased organizational efficiency due to silos
  • Data security

Azure Databricks Platform Components

Azure Databricks is a collaborative, software-as-a-service data and AI platform built on open-source technologies. It provides three types of environments:

  • Data Science & Data Engineering
  • Databricks SQL
  • Databricks Machine Learning

There are multiple ways to create an Azure Databricks workspace.

Azure Portal

PowerShell

New-AzDatabricksWorkspace -Name {databricksname} -ResourceGroupName {resourcegroupname} -Location {region} -ManagedResourceGroupName {createneworexisting}
Here, {databricksname} = Databricks workspace name
{resourcegroupname} = Azure resource group name
{region} = Azure region name
{createneworexisting} = Name of the managed resource group, either a new one to create or an existing one

ARM Template
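
As a rough illustration of the ARM template option, the sketch below builds a minimal template for a single workspace as a Python dictionary and prints it as JSON. The parameter names (workspaceName, location, pricingTier, managedResourceGroupName), the default pricing tier, and the API version shown are assumptions chosen for this example, not values taken from this article; check them against the current Microsoft.Databricks/workspaces template reference before use.

```python
import json


def build_workspace_template() -> dict:
    """Return a minimal ARM template (as a dict) for one Databricks workspace.

    Parameter names and the apiVersion are illustrative assumptions.
    """
    managed_rg_id = (
        "[subscriptionResourceId('Microsoft.Resources/resourceGroups', "
        "parameters('managedResourceGroupName'))]"
    )
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "workspaceName": {"type": "string"},
            "location": {"type": "string", "defaultValue": "[resourceGroup().location]"},
            "pricingTier": {"type": "string", "defaultValue": "premium"},
            "managedResourceGroupName": {"type": "string"},
        },
        "resources": [
            {
                "type": "Microsoft.Databricks/workspaces",
                "apiVersion": "2018-04-01",
                "name": "[parameters('workspaceName')]",
                "location": "[parameters('location')]",
                "sku": {"name": "[parameters('pricingTier')]"},
                # The managed resource group holds the resources Databricks creates.
                "properties": {"managedResourceGroupId": managed_rg_id},
            }
        ],
    }


if __name__ == "__main__":
    # Print the template so it can be saved to a .json file.
    print(json.dumps(build_workspace_template(), indent=2))
```

The printed JSON could be saved to a file and deployed with a resource group deployment, for example through the New-AzResourceGroupDeployment cmdlet and its -TemplateFile parameter.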

Azure VNET

You can deploy Azure Databricks into a custom isolated ("bubble") network or into an existing custom VNet.
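
Deploying into your own VNet is usually expressed through extra workspace parameters in the ARM template. The sketch below assembles that parameter block. It assumes the workspace template accepts customVirtualNetworkId, customPublicSubnetName, and customPrivateSubnetName under properties.parameters, and the helper name, VNet resource ID, and subnet names are hypothetical placeholders.

```python
import json


def vnet_injection_parameters(vnet_id: str, public_subnet: str, private_subnet: str) -> dict:
    """Build the 'properties.parameters' block for a VNet-injected workspace.

    All three values are required; an empty value raises ValueError.
    """
    values = {
        "customVirtualNetworkId": vnet_id,
        "customPublicSubnetName": public_subnet,
        "customPrivateSubnetName": private_subnet,
    }
    for key, value in values.items():
        if not value:
            raise ValueError(f"{key} must not be empty")
    # ARM expects each workspace parameter wrapped as {"value": ...}.
    return {key: {"value": value} for key, value in values.items()}


if __name__ == "__main__":
    # Hypothetical identifiers, used only to show the shape of the output.
    params = vnet_injection_parameters(
        vnet_id="/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Network/virtualNetworks/<vnet>",
        public_subnet="databricks-public",
        private_subnet="databricks-private",
    )
    print(json.dumps(params, indent=2))
```

The returned dictionary would be placed under the workspace resource's properties as "parameters". The workspace uses two dedicated subnets in the VNet, which is why both a public and a private subnet name are supplied.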

The platform consists of the following components:

  • Workspace
    • Notebook
    • Dashboard
    • Library
    • Repo
    • Experiment
  • Interface
  • Data Management
    • DBFS
    • Database
    • Metastore
    • Table
  • Computation Management
    • Cluster
    • Pool
    • Databricks Runtime
    • Jobs
  • Machine Learning
    • Experiment
    • Feature Store
    • Model
  • Authentication and Authorization
    • User
    • Group
    • ACL
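
To make the computation management components more concrete, the sketch below builds the kind of JSON request body used to define a cluster, tying together a cluster name, a Databricks Runtime version, and a node type. It assumes the field names of the Databricks Clusters REST API (cluster_name, spark_version, node_type_id, num_workers, autotermination_minutes). The helper name and the example values are hypothetical, and no request is sent.

```python
import json


def build_cluster_spec(
    name: str,
    spark_version: str,
    node_type_id: str,
    num_workers: int,
    autotermination_minutes: int = 30,
) -> dict:
    """Return a cluster definition dict; raises ValueError on a negative worker count."""
    if num_workers < 0:
        raise ValueError("num_workers must be zero or greater")
    return {
        "cluster_name": name,
        # Databricks Runtime version string for the cluster.
        "spark_version": spark_version,
        # Azure VM size used for the cluster nodes.
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Idle minutes before the cluster shuts itself down.
        "autotermination_minutes": autotermination_minutes,
    }


if __name__ == "__main__":
    # Example values are illustrative; valid versions and node types vary by workspace.
    spec = build_cluster_spec("demo-cluster", "13.3.x-scala2.12", "Standard_DS3_v2", 2)
    print(json.dumps(spec, indent=2))
```

A body like this would be sent to the workspace's cluster creation endpoint with an authenticated HTTP POST; a job definition can reference a similar cluster block to run on a new cluster.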

Azure Databricks Use Cases

  • Accelerate machine learning for complex data and AI solutions
  • Faster, modern data warehousing capabilities
  • Boost productivity with a collaborative workspace and familiar languages
  • Easy integration with the Microsoft stack
  • Suitable for smaller jobs

Conclusion

In this article, we explored the fundamentals of the Azure Databricks Lakehouse platform, which can be used to create data and AI solutions.
