Introduction
Delta Sharing is the secure data-sharing platform built into Databricks. Depending on the protocol you use, it lets you share data, notebooks, and AI models with users outside your organization.
Understanding Delta Sharing
Delta Sharing is an open protocol developed by Databricks for secure data sharing across computing platforms and organizations. Data can be shared in three main ways.
- Databricks-to-Databricks Sharing Protocol: Shares data between Unity Catalog-enabled workspaces. Beyond tabular data, it supports notebook sharing, Unity Catalog data governance, and AI model sharing.
- Databricks Open Sharing Protocol: Shares tabular data managed in a Unity Catalog-enabled workspace with users on any computing platform.
- Customer-Managed Implementation: An open-source Delta Sharing project that you deploy and operate yourself, for sharing data from any platform to any platform.
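Because the protocol is open, any client that can make authenticated HTTPS requests can talk to a sharing server. The following is a minimal sketch of how such requests might be built, assuming the REST paths published in the open-source Delta Sharing protocol (`/shares` and `/shares/{share}/schemas/{schema}/tables`) and bearer-token authentication. The endpoint, token, share, and schema values are hypothetical placeholders, and the requests are only constructed, not sent.

```python
import urllib.request
from urllib.parse import quote


def build_request(endpoint: str, bearer_token: str, path: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request to a sharing server."""
    url = endpoint.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {bearer_token}"},
        method="GET",
    )


def list_shares_request(endpoint: str, bearer_token: str) -> urllib.request.Request:
    """Request that lists the shares visible to this recipient."""
    return build_request(endpoint, bearer_token, "shares")


def list_tables_request(
    endpoint: str, bearer_token: str, share: str, schema: str
) -> urllib.request.Request:
    """Request that lists the tables in one schema of one share."""
    path = f"shares/{quote(share, safe='')}/schemas/{quote(schema, safe='')}/tables"
    return build_request(endpoint, bearer_token, path)


# Hypothetical values for illustration only.
ENDPOINT = "https://sharing.example.com/delta-sharing/"
TOKEN = "<bearer-token>"

req = list_tables_request(ENDPOINT, TOKEN, "sales_share", "default")
print(req.get_method(), req.full_url)
```

With a real endpoint and token, passing the request to `urllib.request.urlopen` would return a JSON response. In practice most recipients use an existing connector rather than calling the REST API directly.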
Key Concepts: Shares, Providers, and Recipients
Delta Sharing revolves around three fundamental concepts.
- Share: A collection of tables and other assets that a provider makes available to recipients. A share can include tables, views, notebook files, and more.
- Provider: The entity that shares data. A provider creates and manages shares and recipients.
- Recipient: The entity that receives shared data. A recipient can access only the assets that the provider has granted to it.
Open Sharing vs. Databricks-to-Databricks Sharing
- Open Delta Sharing: For sharing data with users outside the Databricks ecosystem, on any computing platform. The provider issues a token that the recipient uses to authenticate and access the shared data.
- Databricks-to-Databricks Delta Sharing: For sharing data with recipients who use Unity Catalog-enabled Databricks workspaces. Access is managed through Databricks itself, so recipients do not need tokens.
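In open sharing, the token usually reaches the recipient as a small JSON credential file (often called a profile file). The sketch below shows how a recipient might sanity-check such a file before using it, assuming the field names from the open-source Delta Sharing profile format (`shareCredentialsVersion`, `endpoint`, `bearerToken`, and an optional `expirationTime`). The example values are invented, and the helper names are hypothetical.

```python
import json
from datetime import datetime, timezone

# Fields a profile file is assumed to require; expirationTime is optional.
REQUIRED_FIELDS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def parse_profile(text: str) -> dict:
    """Parse a profile file and fail early if a required field is missing."""
    profile = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in profile]
    if missing:
        raise ValueError(f"profile is missing fields: {', '.join(missing)}")
    return profile


def is_expired(profile: dict, now: datetime = None) -> bool:
    """Return True if the token's expirationTime has passed.

    Assumes a timezone-qualified ISO 8601 timestamp that
    datetime.fromisoformat can parse once a trailing 'Z' is replaced.
    """
    expiration = profile.get("expirationTime")
    if expiration is None:
        return False  # no expiry recorded in the file
    now = now or datetime.now(timezone.utc)
    expires_at = datetime.fromisoformat(expiration.replace("Z", "+00:00"))
    return now >= expires_at


# Invented example content; a real file comes from the provider.
EXAMPLE_PROFILE = """
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing/",
  "bearerToken": "<bearer-token>",
  "expirationTime": "2030-01-01T00:00:00Z"
}
"""

profile = parse_profile(EXAMPLE_PROFILE)
print(profile["endpoint"], "expired:", is_expired(profile))
```

Databricks-to-Databricks recipients never handle a file like this, which is the practical meaning of "no recipient tokens" in the comparison above.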
Setting Up Delta Sharing: A Provider's Perspective
Providers set up Delta Sharing in four steps.
- Enable Delta Sharing: Activate Delta Sharing for the Unity Catalog metastore managing the data to be shared.
- Create Shares: Define shares encompassing data assets within the Unity Catalog metastore.
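- Establish Recipients: Create a recipient object for each entity that will access your shares. For Databricks-to-Databricks sharing, the recipient is identified by a sharing identifier that they send to you.
- Grant Access: Grant each recipient access to the relevant shares. For open sharing, also send the recipient the credentials they need to authenticate.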
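The create, establish, and grant steps above can be sketched as Databricks SQL statements. The helper below only builds the statement strings, assuming the Unity Catalog commands `CREATE SHARE`, `ALTER SHARE ... ADD TABLE`, `CREATE RECIPIENT`, and `GRANT SELECT ON SHARE`. The function name and every share, table, and recipient name are hypothetical, and enabling Delta Sharing on the metastore (the first step) is an administrative setting not covered here.

```python
from typing import List, Optional


def provider_setup_statements(
    share: str,
    tables: List[str],
    recipient: str,
    sharing_identifier: Optional[str] = None,
) -> List[str]:
    """Build the SQL a provider might run to share tables with one recipient.

    Assumes all names are simple identifiers that need no quoting.
    If sharing_identifier is given, the recipient is created for
    Databricks-to-Databricks sharing; otherwise for open sharing.
    """
    statements = [f"CREATE SHARE IF NOT EXISTS {share}"]

    # Add each table (catalog.schema.table) to the share.
    for table in tables:
        statements.append(f"ALTER SHARE {share} ADD TABLE {table}")

    # Create the recipient for the chosen sharing model.
    if sharing_identifier is None:
        statements.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient}")
    else:
        statements.append(
            f"CREATE RECIPIENT IF NOT EXISTS {recipient} USING ID '{sharing_identifier}'"
        )

    # Give the recipient read access to the share.
    statements.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return statements


# Hypothetical names for illustration only.
statements = provider_setup_statements(
    share="sales_share",
    tables=["main.sales.orders", "main.sales.customers"],
    recipient="partner_co",
)
for statement in statements:
    print(statement + ";")
```

In a Databricks notebook, each string could be passed to `spark.sql(...)` by a user with the required privileges. Generating the statements separately keeps the sketch runnable outside Databricks.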
Accessing Shared Data: A Recipient's Guide
How recipients access shared data depends on the sharing model.
- Open Sharing: Recipients authenticate with the credentials supplied by the provider and read shared data with tools such as Apache Spark, pandas, and Power BI.
- Databricks-to-Databricks Sharing: Recipients access shared data directly in their Databricks workspace, where Unity Catalog governs access.
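Both access paths can be sketched in a few lines. For open sharing, this assumes the open-source `delta-sharing` Python connector, whose `load_as_pandas` function takes a URL of the form `<profile-file>#<share>.<schema>.<table>`. For Databricks-to-Databricks sharing, it assumes the recipient mounts the share as a catalog with `CREATE CATALOG ... USING SHARE`. The helper names and all share, schema, table, provider, and catalog names are hypothetical.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the table URL the delta-sharing connector expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_open_share_as_pandas(profile_path, share, schema, table, limit=None):
    """Open sharing: read a shared table into a pandas DataFrame.

    Requires the third-party connector (pip install delta-sharing),
    imported lazily so the rest of this sketch runs without it.
    """
    import delta_sharing

    url = table_url(profile_path, share, schema, table)
    return delta_sharing.load_as_pandas(url, limit=limit)


def mount_share_statement(catalog: str, provider: str, share: str) -> str:
    """Databricks-to-Databricks: SQL that exposes a share as a catalog.

    Assumes simple identifiers that need no quoting.
    """
    return f"CREATE CATALOG IF NOT EXISTS {catalog} USING SHARE {provider}.{share}"


# Hypothetical names for illustration only.
print(table_url("config.share", "sales_share", "default", "orders"))
print(mount_share_statement("partner_sales", "acme_provider", "sales_share"))
```

Once the catalog exists, shared tables can be queried like any other Unity Catalog table, for example `SELECT * FROM partner_sales.default.orders`.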
Tracking and Governance
Both providers and recipients can monitor sharing activity and data access through the auditing mechanisms in Azure Databricks, which record who accessed which shared assets.
Conclusion
Delta Sharing gives organizations a flexible way to exchange data across platforms and organizational boundaries. Its open and Databricks-to-Databricks protocols, token-based or Unity Catalog-managed access, and auditing within Azure Databricks let providers share data securely with recipients inside and outside the Databricks ecosystem.