Overview
In this article, we look at the differences between an Azure Storage Account and Azure Data Lake Storage (ADLS) Gen2.
1. Overview of Storage Account and When to Use It
Azure Storage Account is a foundational service that provides scalable, durable, and secure storage for a wide range of data types. It offers services such as:
- Blob Storage: For unstructured data like images, videos, and documents.
- File Storage: Azure Files for file-sharing solutions.
- Table Storage: For NoSQL key-value storage.
- Queue Storage: For message queuing between application components.
When to Use a Storage Account
- Data Sharing: The most common use case is storing application data and sharing it with other components.
- Backup and Disaster Recovery: Store application backups or archives.
- Static Website Hosting: Ideal for hosting static HTML, CSS, and JavaScript files.
2. Overview of Azure Data Lake Storage (ADLS) Gen2 and When to Use It
Azure Data Lake Storage Gen2 builds on the capabilities of Azure Blob Storage but is specifically optimized for analytics workloads. Its key feature is the hierarchical namespace, which organizes data into real directories and files, much like the nested folders on a computer. Directory-level operations such as rename and delete then run as single metadata operations instead of touching every object, which speeds up big data processing.
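To illustrate why a hierarchical namespace helps with directory-level operations, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Azure SDK: the functions flat_rename and hierarchical_rename and the Directory class are hypothetical names introduced only for this example. It counts how many entries each approach has to touch when a folder is renamed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def flat_rename(blobs: dict[str, bytes], old_prefix: str, new_prefix: str) -> int:
    """Rename a virtual folder in a flat namespace.

    Every blob whose name starts with old_prefix must be rewritten under the
    new name, so the cost grows with the number of blobs. Returns that count.
    """
    touched = 0
    # Snapshot the matching names first so the dict is not mutated mid-iteration.
    for name in [n for n in blobs if n.startswith(old_prefix)]:
        blobs[new_prefix + name[len(old_prefix):]] = blobs.pop(name)
        touched += 1
    return touched


@dataclass
class Directory:
    """A real directory node: it exists independently of the files inside it."""
    files: dict[str, bytes] = field(default_factory=dict)
    children: dict[str, Directory] = field(default_factory=dict)


def hierarchical_rename(parent: Directory, old: str, new: str) -> int:
    """Rename a directory by re-linking one entry in its parent.

    The files below the directory are not touched. Returns the number of
    entries modified, which is always 1 in this model.
    """
    parent.children[new] = parent.children.pop(old)
    return 1
```

In this model, renaming raw/ to landing/ in the flat layout touches every blob under the prefix, while the hierarchical layout changes a single parent entry no matter how many files the directory holds.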
When to Use ADLS Gen2
- Big Data Analytics: Store and process large-scale datasets.
- Machine Learning Pipelines: Manage and analyze structured and unstructured data.
- Data Lakehouse Architectures: Consolidate data for both analytics and operational systems.
- SFTP: ADLS Gen2 supports Secure File Transfer Protocol (SFTP), allowing seamless file uploads and downloads via standard SFTP clients.
3. Creating a Storage Account and ADLS Gen2
Both a Storage Account and an ADLS Gen2 account are created in the Azure Portal. The process is the same for both, except for one property: Hierarchical Namespace.
Steps to Create a Storage Account
- Log in to the Azure Portal.
- Navigate to Storage Accounts and click Create.
- Fill in the required fields:
- Resource Group: Choose or create one.
- Storage Account Name: Enter a globally unique name.
- Primary Service: Select Azure Blob Storage or Azure Data Lake Storage Gen2.
- Region: Select the closest region.
- Performance/Replication: Choose the desired performance and replication strategy.
- Click Review + Create, then Create.
Steps to Create an ADLS Gen2 Account
- Follow the same steps as above, then open the Advanced settings:
- Enable Hierarchical Namespace.
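The point that the two account types differ by a single property can be sketched in code. The example below builds a settings dictionary shaped like the resource definition used in ARM templates, where the hierarchical namespace flag is the isHnsEnabled property. The function names, the default SKU, and the sample account names are assumptions made for illustration; nothing here is sent to Azure. The name check reflects the rule that storage account names are 3 to 24 characters of lowercase letters and digits.

```python
import re


def is_valid_account_name(name: str) -> bool:
    """Storage account names: 3-24 characters, lowercase letters and digits only."""
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None


def build_account_settings(
    name: str,
    location: str,
    hierarchical_namespace: bool,
    sku: str = "Standard_LRS",  # assumed default for this sketch
) -> dict:
    """Return an ARM-style description of a general-purpose v2 storage account.

    The only difference between a plain Storage Account and an ADLS Gen2
    account in this sketch is the isHnsEnabled flag.
    """
    if not is_valid_account_name(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return {
        "name": name,
        "location": location,
        "kind": "StorageV2",
        "sku": {"name": sku},
        "properties": {"isHnsEnabled": hierarchical_namespace},
    }
```

Calling build_account_settings("examplelake01", "westeurope", True) and the same call with False produces two dictionaries that are identical apart from isHnsEnabled, mirroring the single checkbox in the portal.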
4. Creating Folders in Storage Accounts and ADLS Gen2
- For ADLS Gen2
- Use the Azure Portal to navigate to the ADLS Gen2 account and create a Container.
- Inside the Container, create a Directory using the Add Directory button; once created, you can upload files.
- For Storage Account
- Create a container in the Blob service.
- Inside the container, use prefixes in blob names to simulate a folder structure (e.g., folder1/file.txt). Uploading this blob makes a folder named folder1 appear with file.txt inside it. However, folder1 is only a virtual folder, not a real directory: once you delete every blob under it, the folder disappears as well.
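The virtual-folder behavior described above can be reproduced with a small sketch. It models a container as a plain set of blob names and derives the top-level folders from the name prefixes, so a folder exists only while at least one blob carries its prefix. The function name list_virtual_folders and the sample blob names are hypothetical; this is not an Azure SDK call.

```python
def list_virtual_folders(blob_names: set[str], delimiter: str = "/") -> set[str]:
    """Derive top-level virtual folders from blob names in a flat namespace.

    A folder is reported only if some blob name contains the delimiter,
    because the folder is nothing more than a shared name prefix.
    """
    folders = set()
    for name in blob_names:
        if delimiter in name:
            folders.add(name.split(delimiter, 1)[0])
    return folders


# A container holding two blobs under the folder1/ prefix and one at the root.
container = {"folder1/file.txt", "folder1/notes.txt", "readme.md"}
folders_before = list_virtual_folders(container)

# Deleting every blob under the prefix leaves nothing that defines the folder.
container -= {"folder1/file.txt", "folder1/notes.txt"}
folders_after = list_virtual_folders(container)
```

Here folders_before contains folder1 and folders_after is empty, even though no explicit "delete folder" step was performed.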
What is ACL in ADLS Gen2?
Access Control Lists (ACLs) provide fine-grained access permissions to directories and files in ADLS Gen2. Unlike Azure Role-Based Access Control (RBAC), whose role assignments apply at a broader scope such as the storage account or a container, ACLs let you define access for specific users, service principals, or Microsoft Entra groups on individual directories and files.
Setting ACLs in ADLS Gen2
- Navigate to the ADLS Gen2 account in the portal.
- Go to the Containers section, open a container, right-click a directory or file, and select Manage ACL.
- Set Read, Write or Execute permissions for users/groups.
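ACLs in ADLS Gen2 follow a POSIX-like model, and outside the portal they are commonly written in a short text form such as user::rwx,group::r-x,other::---. The sketch below assembles such a string from individual entries. The helper names permission_string, acl_entry, and build_acl are hypothetical, and the all-zero object ID in the usage note is a placeholder, not a real principal.

```python
def permission_string(read: bool = False, write: bool = False, execute: bool = False) -> str:
    """Render three permission flags as a POSIX-style triplet, e.g. 'r-x'."""
    flags = (("r", read), ("w", write), ("x", execute))
    return "".join(letter if enabled else "-" for letter, enabled in flags)


def acl_entry(kind: str, principal_id: str = "", permissions: str = "---", default: bool = False) -> str:
    """Format one ACL entry as [default:]kind:principal_id:permissions.

    An empty principal_id refers to the owning user or owning group.
    default=True marks an entry that new child items of a directory inherit.
    """
    if kind not in {"user", "group", "mask", "other"}:
        raise ValueError(f"unknown ACL entry type: {kind!r}")
    entry = f"{kind}:{principal_id}:{permissions}"
    return f"default:{entry}" if default else entry


def build_acl(entries: list[str]) -> str:
    """Join individual entries into the comma-separated short form."""
    return ",".join(entries)
```

For example, combining acl_entry("user", "", "rwx"), acl_entry("group", "", "r-x"), acl_entry("other", "", "---"), and a named-user entry for 00000000-0000-0000-0000-000000000000 with permission_string(read=True, execute=True) grants that principal read and execute without changing the owner's rights. To the best of my knowledge, the azure-storage-file-datalake package accepts a string of this shape through the acl argument of set_access_control on directory and file clients, but treat that as something to verify against the SDK documentation. Note that reading a file also requires Execute on each directory along its path.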
Summary
Azure Storage Accounts and ADLS Gen2 are powerful tools for managing data in the cloud. While Storage Accounts cater to general-purpose storage needs, ADLS Gen2 is designed for analytics-heavy workloads with its hierarchical namespace and fine-grained ACL capabilities. Selecting the right service depends on your workload requirements and data processing needs.