Problem
I have a database table with 15 columns. One of them is a Blob/Binary column that we use to store PDF binary data. Day by day the database grew, and the Oracle/SQL Server file size is now close to 1.5 TB with 10 million records. It grows by roughly 200 GB and 1 million records each year.
Solution
I decided to move the Blob column (PDF File Stream) to Azure Blob Storage.
How
Fetch the blob column from Oracle/SQL Server and upload it record by record, using a GUID as the reference. The steps below make this more specific.
The concept is this: the blob/binary column arrives as a byte[]. When we upload a file to Blob Storage, we need to save that binary/file-stream data under a unique name. For that, we use an additional GUID column, described below, which acts as the key linking each row in the Oracle/SQL Server database to its file in Blob Storage.
Results
After moving the Blob/Binary column to Azure Blob Storage, I deleted that column from the table. It saved us 1.4 TB of space.
How did I do that?
Step 1
My focus here is how to upload, download, and delete file streams in Blob Storage.
Finalize the database activity before moving to C#
Create a Varchar(50) column in your table. I created a 16th column and named it BlobStorageRefKey. Now populate the column with SYS_GUID() values, which we will use as the reference between Blob Storage and the Oracle/SQL Server table.
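If the column does not exist yet, it can be added first; a small sketch, reusing this article's table and column names (the VARCHAR2 variant is Oracle syntax):

ALTER TABLE TableName ADD BlobStorageRefKey VARCHAR2(50);      -- Oracle
-- On SQL Server: ALTER TABLE TableName ADD BlobStorageRefKey VARCHAR(50);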
Here is the SQL query.
UPDATE TableName SET BlobStorageRefKey = SYS_GUID();   -- Oracle; on SQL Server use NEWID()
C#.NET Project.
Step 2
Create a web application using Visual Studio.
Step 3
Create a connection or app setting for Azure Blob Storage connection.
Ex,
<appSettings>
  <add key="AzureConnection" value="DefaultEndpointsProtocol=https;AccountName= etc….. />
</appSettings>
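For reference, an Azure Storage connection string generally has this shape (placeholders, not real values):

DefaultEndpointsProtocol=https;AccountName=<account-name>;AccountKey=<account-key>;EndpointSuffix=core.windows.net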
Step 4
We are going to read the blob column from the database and upload each record to the Blob Storage account, using the BlobStorageRefKey column as the reference.
FYI
When you have a large number of records in the database, fetching them all at once may end in an "Out of Memory" exception.
I recommend fetching 5,000 records at a time, processing them, then fetching the next 5,000, and so on.
Statistics
Reading 5,000 records (the blob column) from the database and uploading them to Azure Blob Storage took 1.8 minutes for me, running from a server such as an Azure Virtual Machine or a client's remote machine. From a local machine, it may take up to 10 minutes per 5,000 records.
First of all, how do you identify the first 5,000, the next 5,000, and so on, when you have 1 million records? Here is the solution.
Fetch all of the BlobStorageRefKey values from the table into a list, pass that list as the input to the SplitList method below (parameter: locations), and let it split the records into batches of 5,000.
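To build that input list, here is a minimal sketch of a hypothetical helper (not from the original article) that loads every key once, assuming the same Oracle ADO.NET provider used by the methods below; the keys are small strings, so even 10 million of them fit in memory comfortably, unlike the PDFs:

// Requires the Oracle ADO.NET provider (e.g. Oracle.ManagedDataAccess.Client).
public static List<string> LoadAllKeys(string strConnection)
{
    var keys = new List<string>();
    using (var con = new OracleConnection(strConnection))
    using (var cmd = new OracleCommand("SELECT BlobStorageRefKey FROM MainTable", con))
    {
        con.Open();
        using (var dr = cmd.ExecuteReader())
        {
            while (dr.Read())
                keys.Add(dr.GetString(0)); // one GUID reference per row
        }
    }
    return keys;
}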
public static IEnumerable<List<T>> SplitList<T>(List<T> locations, int nSize = 5000)
{
    for (var i = 0; i < locations.Count; i += nSize)
    {
        yield return locations.GetRange(i, Math.Min(nSize, locations.Count - i));
    }
}
The method above hands you the list in batches, so each batch gives you a starting index (its first position) and an ending index (its last position). Use these two indexes to fetch the corresponding records from the main table with the query used in the method below.
Ex
If you have 100K records in the database, the method above produces a List<List<ColumnName>>, so you end up with two foreach() loops: the outer loop gives you one List<T> of 5,000 items from the List<List<T>>, and once you finish it, the next iteration gives you the following 5,000, and so on (see the sketch below).
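A minimal sketch of that batching loop, tying together the hypothetical LoadAllKeys helper above with the RetrieveBlobFromDatabase and UploadBinaryAsync methods shown in the following steps. It assumes the table's ID values line up with the list positions, as the query in the next method does, and that log is an ILog instance:

var allKeys = LoadAllKeys(strConnection);
var startIndex = 0;
foreach (var batch in SplitList(allKeys))            // outer loop: one batch of up to 5000 keys
{
    var endIndex = startIndex + batch.Count - 1;     // last position of this batch
    var records = RetrieveBlobFromDatabase(strConnection, startIndex, endIndex, log);
    foreach (var objResult in records)               // inner loop: upload each record
    {
        UploadBinaryAsync(objResult.PDF, objResult.BLOBSTORAGEKEY).Wait();
    }
    startIndex = endIndex + 1;                       // advance to the next batch
}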
The method below pulls the Blob/Binary column, together with BlobReferenceKey, into a list of a small class.
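The original article refers to this class only as one "which has Binary and String properties"; a minimal sketch of what it could look like:

// Hypothetical DTO: one database row, i.e. the PDF bytes plus the GUID
// that will become the blob name in Azure.
public class ClassName
{
    public byte[] PDF { get; set; }
    public string BLOBSTORAGEKEY { get; set; }
}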
// Requires the Oracle ADO.NET provider (e.g. Oracle.ManagedDataAccess.Client) and log4net's ILog.
public static List<ClassName> RetrieveBlobFromDatabase(string strConnection, int startIndex, int endIndex, ILog log)
{
    try
    {
        log.Info("Database records fetching start Time: " + DateTime.Now);
        Console.WriteLine("Database records fetching start Time: " + DateTime.Now);
        var lstClassName = new List<ClassName>(); // ClassName holds the binary and string properties
        using (var con = new OracleConnection(strConnection))
        {
            con.Open();
            // Bind the batch boundaries as parameters (in the order they appear in the SQL)
            // instead of embedding them in the command text.
            var commandText = "SELECT PDFColumn, BlobReferenceKey FROM MainTable WHERE ID BETWEEN :startIndex AND :endIndex";
            var cmd = new OracleCommand
            {
                CommandText = commandText,
                Connection = con
            };
            cmd.Parameters.Add(new OracleParameter("startIndex", startIndex));
            cmd.Parameters.Add(new OracleParameter("endIndex", endIndex));
            using (var dr = cmd.ExecuteReader())
            {
                while (dr.Read())
                {
                    var objInfo = new ClassName
                    {
                        PDF = (byte[])dr["PDFColumn"],
                        BLOBSTORAGEKEY = dr["BlobReferenceKey"].ToString()
                    };
                    lstClassName.Add(objInfo);
                }
            }
        }
        log.Info("Database records fetching End Time: " + DateTime.Now);
        Console.WriteLine("Database records fetching End Time: " + DateTime.Now);
        return lstClassName;
    }
    catch (Exception e)
    {
        log.Error(e);
        throw;
    }
}
The lstClassName now contains your blob/binary column data, and BLOBSTORAGEKEY is the GUID associated with each row. Next, we upload this binary information to Blob Storage in Azure.
UPLOAD
Step 5
The below method will help you upload your binary data to the Azure Blob Storage account.
Loop over the lstClassName from above with a foreach, passing the first property as the byte[] parameter and the second property as the file name:
foreach (var objResult in lstClassName)
{
    UploadBinaryAsync(objResult.PDF, objResult.BLOBSTORAGEKEY).Wait();
}
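A design note: the .Wait() call uploads one file at a time. If the surrounding method can be made async, a batch could be uploaded concurrently instead; a sketch, assuming System.Linq is available and that this level of concurrency is acceptable for your storage account:

// Alternative sketch: start all uploads for the batch, then await them together.
var uploadTasks = lstClassName.Select(r => UploadBinaryAsync(r.PDF, r.BLOBSTORAGEKEY));
await Task.WhenAll(uploadTasks);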
// Requires the legacy Azure Storage SDK (Microsoft.WindowsAzure.Storage), System.Configuration, and System.IO.
private static async Task UploadBinaryAsync(byte[] bytes, string fileName)
{
    try
    {
        var cloudStorageAccount = CloudStorageAccount.Parse(ConfigurationManager.AppSettings["AzureConnection"]);
        var cloudBlobClient = cloudStorageAccount.CreateCloudBlobClient();
        // Container names must be lowercase.
        var cloudBlobContainer = cloudBlobClient.GetContainerReference("your-container-name");
        var cloudBlockBlob = cloudBlobContainer.GetBlockBlobReference(fileName);
        cloudBlockBlob.Properties.ContentType = System.Net.Mime.MediaTypeNames.Application.Pdf;
        if (!cloudBlockBlob.Exists())
        {
            using (Stream stream = new MemoryStream(bytes))
            {
                await cloudBlockBlob.UploadFromStreamAsync(stream);
            }
        }
    }
    catch (Exception e)
    {
        Console.WriteLine("Exception on UploadBinaryAsync : " + fileName);
        log.Error(e); // assumes a static ILog field named log in this class
    }
}
Download a file from Blob Storage
To download a blob from Azure Blob Storage, use the blobReferenceKey column in your Oracle/SQL Server table as the reference to the blob in Azure.
public byte[] DownloadFileFromBlob(string blobReferenceKey)
{
    var storageAccount = CloudStorageAccount.Parse(ConfigurationManager.AppSettings["AzureConnection"]);
    var blobClient = storageAccount.CreateCloudBlobClient();
    var container = blobClient.GetContainerReference("your-container-name");
    var blockBlob = container.GetBlockBlobReference(blobReferenceKey);
    using (var ms = new MemoryStream())
    {
        if (blockBlob.Exists())
        {
            blockBlob.DownloadToStream(ms);
        }
        return ms.ToArray();
    }
}
If you are using an MVC application, calling this method from a FileResult action looks like this.

[HttpGet]
public FileResult DownloadSds(string blobStorageReference)
{
    var resultInfo = DownloadFileFromBlob(blobStorageReference);
    return File(resultInfo, System.Net.Mime.MediaTypeNames.Application.Pdf, blobStorageReference + ".pdf");
}
Delete a Blob from Blob Storage
private static void DeleteBlob(string blobReferenceKey, ILog log)
{
    try
    {
        var cloudStorageAccount = CloudStorageAccount.Parse(ConfigurationManager.AppSettings["AzureConnection"]);
        var cloudBlobClient = cloudStorageAccount.CreateCloudBlobClient();
        var cloudBlobContainer = cloudBlobClient.GetContainerReference("your-container-name");
        var blobReference = cloudBlobContainer.GetBlobReference(blobReferenceKey);
        if (blobReference.Exists())
        {
            log.Info(blobReferenceKey);
            blobReference.DeleteIfExists(); // synchronous delete; the async variant would need to be awaited
        }
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
        throw;
    }
}
Finally, this is how it looks in the Azure Blob Storage.
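Once every record has been verified in Blob Storage, the blob column itself can be dropped, as mentioned in the Results section. A hedged sketch, reusing this article's table and column names; note that fully reclaiming the freed space usually needs follow-up maintenance that varies by engine (for example, a table move/shrink on Oracle, or an index rebuild/DBCC CLEANTABLE on SQL Server):

ALTER TABLE MainTable DROP COLUMN PDFColumn;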