In the previous article, we saw how to create a Storage Account and Storage Access Keys. In this article, we will discuss one of the low-cost but useful offerings: “Blob Storage”. We will first go through Blob storage, its components, the types of blobs, and the naming rules, and then work through an example. The example mostly focuses on how to organize files in Blob storage.
Introduction
Blob storage is used for raw, unstructured data, such as a text file, an audio file, a video file, or a VHD. We can store any binary file, and we can also stream data from Blob storage. We can access Blob storage over HTTP or HTTPS (secure channel). We can store public-facing data as well as private data secured with a Shared Access Signature (SAS).
The most common uses of Blob storage are as follows.
- Storing images, such as the application logo or other static images the application needs, and documents, such as EULAs and templates that we can process.
- Storing files for distributed access.
- Streaming video and audio.
- Performing secure backup and disaster recovery.
- Storing data for analysis by an on-premises or Azure-hosted service.
Components
Blob storage requires the following components to organize the files/bytes in Azure. We will discuss each component, but first, let’s look at the diagram to understand the concept.
- Storage Account
- Container
- Blob
In the above diagram, the first layer is a storage account, which can have multiple containers (second layer). Each container, in turn, contains a collection of blobs/files (last layer). This image gives us an idea of the hierarchy of Blob storage. Let’s look into each component.
- Storage Account
We access Blob storage through a Storage account, which is the first step and the top layer. A standard Storage account is required for Blob storage. You can refer to my previous article for more information about storage and the creation of a Storage account.
- Container
A container can hold one or more blobs. It is like a bucket where Azure stores all its files/blobs. We can also imagine it as a folder in the Windows folder structure, which contains files. A container is therefore a collection of blobs: it can hold an unlimited number of blobs, and every blob must be in a container.
- Blob
A blob is a file of any type, up to the size limits described below. Blobs fall into three types: block blobs, page blobs, and append blobs.
- Block blobs
Block blobs are recommended for text files, such as Word documents and txt files, and for binary files, such as audio, video, and setup packages. We cannot edit a file stored as a block blob in place. A block blob is divided into multiple blocks, and each block has its own block ID. Each block can be up to 100 MB in size.
Before 31-5-2016, a block blob could hold up to 195 GB: 50,000 blocks of up to 4 MB each. The maximum block size has since been raised from 4 MB to 100 MB, which increases the maximum blob size to about 4.7 TB (50,000 blocks * 100 MB). Another interesting fact about Blob storage is that we can upload up to 256 MB in one go (with a single write request), up from 64 MB previously.
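The size limits above are simple products of block count and block size. The sketch below reproduces that arithmetic; it assumes that "MB" in the figures means 1024 * 1024 bytes, and the helper names (max_blob_bytes, blocks_needed) are made up for illustration.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_BLOCKS = 50_000  # block count limit quoted in the article


def max_blob_bytes(block_size_mib: int, max_blocks: int = MAX_BLOCKS) -> int:
    """Largest blob that fits in max_blocks blocks of block_size_mib each."""
    return block_size_mib * MIB * max_blocks


def blocks_needed(file_bytes: int, block_size_mib: int) -> int:
    """Number of blocks a file occupies when every block is filled to the limit."""
    return math.ceil(file_bytes / (block_size_mib * MIB))


old_limit = max_blob_bytes(4)    # limit with 4 MB blocks
new_limit = max_blob_bytes(100)  # limit with 100 MB blocks

print(f"{old_limit / GIB:.1f} GiB")  # 195.3 GiB
print(f"{new_limit / TIB:.2f} TiB")  # 4.77 TiB
print(blocks_needed(1 * GIB, 100))   # 11
```

With 100 MB blocks, a 1 GB file needs only 11 blocks, far below the 50,000-block ceiling.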
- Append blobs
As the name suggests, an append blob lets us modify a file in Blob storage by appending data to the end of it. Other than this feature, it is similar to a block blob. Append blobs are very helpful in scenarios where we need to keep adding to a file, such as logging. The size of an append blob can be up to 195 GB (4 MB * 50,000 blocks).
- Page blobs
Page blobs are efficient for frequent read and write operations. A page blob is a collection of 512-byte pages. When we create a page blob, we specify its maximum size, much like choosing the capacity of a hard disk at the time of buying it. The maximum size of a page blob is 1 TB.
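Because a page blob is organized in 512-byte pages, sizes and write ranges have to line up with page boundaries. The following is a minimal sketch of that alignment arithmetic only; the function names are hypothetical and no Azure API is called.

```python
PAGE_SIZE = 512  # bytes per page in a page blob


def align_to_page(size_bytes: int) -> int:
    """Round a size up to the next multiple of the page size."""
    if size_bytes < 0:
        raise ValueError("size must not be negative")
    return -(-size_bytes // PAGE_SIZE) * PAGE_SIZE


def covering_page_range(offset: int, length: int) -> tuple[int, int]:
    """Return the inclusive, page-aligned byte range that covers
    `length` bytes starting at `offset`."""
    start = (offset // PAGE_SIZE) * PAGE_SIZE
    end = align_to_page(offset + length) - 1
    return start, end


print(align_to_page(1000))            # 1024
print(covering_page_range(600, 100))  # (512, 1023)
```

Writing 100 bytes at offset 600 therefore touches exactly one page, the second one.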
Note - Blobs in Azure storage emulator are limited to 2 GB.
Since Azure is evolving very fast, the above figures are true as of the date the article is published.
Naming and referencing containers and blobs
Blob storage follows RFC standards (RFC 2616, Section 2.2 and RFC 3987) when naming and addressing containers and blobs. Each resource has a unique URI. This works because every account name is unique, each account contains containers, and each container contains blobs. Two storage accounts can have containers with the same name, but there cannot be two containers with the same name in one account. Likewise, blobs within a container must have unique names. The URL format for Blob storage is as follows.
http://<storage-account-name>.blob.core.windows.net/<container-name>/<blob-name>
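As a rough illustration of this format, the sketch below assembles a blob URL from its three parts. The function name blob_url and the sample account, container, and blob names are assumptions made for this example; percent-encoding the blob name with urllib.parse.quote is one reasonable way to keep the URL valid.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str,
             scheme: str = "https") -> str:
    """Build <scheme>://<account>.blob.core.windows.net/<container>/<blob>."""
    # Keep "/" unescaped so virtual directory separators survive.
    path = quote(blob_name, safe="/")
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{path}"


# Hypothetical names, used only to show the shape of the result.
print(blob_url("myaccount", "photos", "2017/team photo.jpg"))
# https://myaccount.blob.core.windows.net/photos/2017/team%20photo.jpg
```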
Container naming rule
A container name must be a valid DNS name and follow these rules:
- The name must be in lowercase.
- It must start with a letter or number, and it can contain only letters, numbers, and the dash (-) character.
- Every dash (-) must be immediately preceded and followed by a letter or number.
- The name must be between 3 and 63 characters long.
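These rules fit in a single regular expression. The sketch below is one possible encoding, not an official validator: it reads the dash rule as requiring a letter or number on both sides of every dash, and the function name is_valid_container_name is hypothetical.

```python
import re

# First character: lowercase letter or digit.
# Then 2 to 62 more characters, each a lowercase letter, a digit, or a dash
# that is immediately followed by a letter or digit (so no "--" and no
# trailing dash). Total length: 3 to 63 characters.
_CONTAINER_NAME = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def is_valid_container_name(name: str) -> bool:
    """Check a candidate container name against the rules listed above."""
    return _CONTAINER_NAME.fullmatch(name) is not None


for candidate in ["videos", "my-videos", "Videos", "my--videos", "-videos", "ab"]:
    print(candidate, is_valid_container_name(candidate))
```

Only the first two candidates pass; the others fail on case, consecutive dashes, a leading dash, and length respectively.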
Blob naming rules
A blob name must follow these rules:
- Blob names are case-sensitive and can contain any combination of characters.
- A blob name must be 1 to 1024 characters long, and it can have up to 254 path segments (a path segment is the string between two consecutive delimiters (/), which can be thought of as one level of a virtual directory path).
- Reserved URL characters must be escaped.
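A small check for these rules might look like the sketch below. The helper names are hypothetical, counting segments by splitting on "/" is an assumption about how segments are counted, and urllib.parse.quote stands in for whatever escaping a real client library performs.

```python
from urllib.parse import quote

MAX_NAME_LENGTH = 1024
MAX_PATH_SEGMENTS = 254


def blob_name_problems(name: str) -> list[str]:
    """Return the rule violations found in a blob name (empty list if none)."""
    problems = []
    if not 1 <= len(name) <= MAX_NAME_LENGTH:
        problems.append(f"length {len(name)} is outside 1..{MAX_NAME_LENGTH}")
    # Assumption: a path segment is a non-empty string between "/" delimiters.
    segments = [part for part in name.split("/") if part]
    if len(segments) > MAX_PATH_SEGMENTS:
        problems.append(f"{len(segments)} path segments exceed {MAX_PATH_SEGMENTS}")
    return problems


def escape_blob_name(name: str) -> str:
    """Percent-encode reserved URL characters, keeping "/" as the delimiter."""
    return quote(name, safe="/")


print(blob_name_problems("reports/Q1 summary #2.pdf"))  # []
print(escape_blob_name("reports/Q1 summary #2.pdf"))
# reports/Q1%20summary%20%232.pdf
```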
Example
Let’s understand the naming rules with an example. Suppose we are working on a video library project like YouTube. In this type of project, the raw data (video files) is very important. Thus, it is very important to store the files in an organized way, so that by looking at the URL, we can get some information about the file. Let’s see how we can categorize files in Azure storage.
A URL in Blob storage contains the 3 components we discussed above. The first component is the storage account, which usually has a name similar to the application. So, our URL so far is:
- http://youtube.blob.core.windows.net/
Suppose, in this application, we will store images and videos (it may contain other assets as well). Then, we need to create 2 containers.
- http://youtube.blob.core.windows.net/images
- http://youtube.blob.core.windows.net/videos
So far, we have categorized our assets on the basis of file type. We can further categorize our assets by using delimiters in the blob name to form a virtual directory path.
Suppose we want to further categorize our videos into movies, songs, series, etc. We can easily do that by adding one more delimiter, as below.
- http://youtube.blob.core.windows.net/videos/movies
- http://youtube.blob.core.windows.net/videos/songs
- http://youtube.blob.core.windows.net/videos/series
We can further categorize our movies by language (English, Hindi), as below.
- http://youtube.blob.core.windows.net/videos/movies/English
- http://youtube.blob.core.windows.net/videos/movies/Hindi
Finally, we can add our file name to the URL, as shown below.
- http://youtube.blob.core.windows.net/videos/movies/English/StarWars.mp4
- http://youtube.blob.core.windows.net/videos/movies/Hindi/Don.mp4
In this way, we can add up to 254 path segments, but the blob name must not exceed 1024 characters. So, as per the above rules, our blob name is movies/Hindi/Don.mp4, inside the container videos, under the storage account named youtube. The URL is very easy to understand and provides basic metadata about the file.
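To see how much information such a URL carries, we can take it apart again. The sketch below uses only Python's standard library; the function name split_blob_url and the keys of the returned dictionary are choices made for this illustration.

```python
from urllib.parse import urlsplit, unquote


def split_blob_url(url: str) -> dict:
    """Split a blob URL into account, container, blob name,
    virtual directories, and file name."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"not an absolute URL: {url!r}")
    account = parts.hostname.split(".")[0]
    # The first path segment is the container; the rest is the blob name.
    container, _, blob_name = parts.path.lstrip("/").partition("/")
    blob_name = unquote(blob_name)
    *directories, file_name = blob_name.split("/")
    return {
        "account": account,
        "container": container,
        "blob_name": blob_name,
        "directories": directories,
        "file_name": file_name,
    }


info = split_blob_url(
    "http://youtube.blob.core.windows.net/videos/movies/Hindi/Don.mp4"
)
print(info["account"], info["container"], info["blob_name"])
# youtube videos movies/Hindi/Don.mp4
print(info["directories"], info["file_name"])
# ['movies', 'Hindi'] Don.mp4
```

The virtual directories (movies, Hindi) are not real folders; they are simply the parts of the blob name between delimiters.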
References
- https://docs.microsoft.com/en-us/azure/storage/storage-dotnet-how-to-use-blobs
- https://docs.microsoft.com/en-us/azure/guidance/guidance-naming-conventions
- https://docs.microsoft.com/en-us/rest/api/storageservices/fileservices/blob-service-concepts