Write And Read Pandas DataFrame And CSV To And From Azure Storage Table

Introduction

This article covers how to read and write Pandas DataFrames and CSV files to and from Azure Storage Tables. Pandas DataFrames are used in many data analytics applications, so persisting them in the cloud is a common task. First we will see how to save data from a CSV file to Azure Table Storage, and then we will handle the same situation with a Pandas DataFrame.

Prerequisites

  • Azure Account: If you don't have one, you can get a free account with ₹13,300 worth of credits from here. If you are a student, you can verify your student status to get it without entering credit card details; otherwise, credit card details are mandatory.
  • Azure Storage Account: To learn how to create a storage account, follow the official documentation.

Saving a CSV file to the Azure Storage table

We need Python's built-in csv module and the azure-cosmosdb-table package. The csv module ships with Python, so only the Azure package has to be installed; you may use pip install azure-cosmosdb-table for that.

Read CSV File

The below function uses the csv module to read the CSV file at a specified location and return its rows as a list of lists.

import csv

def readCSV(location):
    # Read the CSV file at the given location and return its rows
    # as a list of lists.
    with open(location) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        tab = []
        for row in csv_reader:
            tab.append(row)
        return tab
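
For example, assuming a local file named data.csv (a placeholder name), the helper returns the file contents as a list of rows:

rows = readCSV("data.csv")  # read the hypothetical local CSV file
print(rows[0])  # first row of the CSV as a list of strings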

Initialize Table Service

To initialize the table service we need the credentials of an active Azure Storage Account. If you don't know how to create the account and get its credentials, follow the official documentation. Replace the values of account_name and account_key with your own.

from azure.cosmosdb.table.tableservice import TableService
from azure.cosmosdb.table.models import Entity
table_service = TableService(account_name='xxxx', account_key='xxxxxxxx')
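
If you prefer, the same service can be initialized from the storage account's connection string (shown alongside the keys in the portal). A minimal equivalent sketch, with a placeholder connection string:

table_service = TableService(connection_string='<your-connection-string>')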

Create an Azure Storage Table

You may create tables in different ways, for example via the Azure Web Portal or the Azure CLI. Below is a way to create a table from a Python script. Replace the table name with your preferred table name.

table_service.create_table('table-name')
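
If the script may run more than once, you can guard the call by checking whether the table already exists. A minimal sketch, assuming the 'table-name' table used throughout this article:

if not table_service.exists('table-name'):
    table_service.create_table('table-name')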

Save the table to Azure

To save a table (a list of lists) to Azure Table Storage, every entity needs a PartitionKey and a RowKey. Their actual values do not matter here, so we simply derive them from an incrementing counter. Here is the snippet which does this.

def setTable(tablename, table):
    table_service = TableService(account_name='xxxx', account_key='xxxx')
    for index, row in enumerate(table):
        # Every entity needs a PartitionKey and a RowKey; here they are
        # simply derived from the row counter.
        task = {'PartitionKey': "P" + str(index), 'RowKey': "R" + str(index + 1)}
        for col, ele in enumerate(row):
            task["Row" + str(col)] = ele
        table_service.insert_entity(tablename, task)
    return True
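
For instance, for a made-up CSV row ['Alan', '25', 'NY'] at index 0, the entity passed to insert_entity would look roughly like this (all values arrive as strings from csv.reader):

{'PartitionKey': 'P0', 'RowKey': 'R1', 'Row0': 'Alan', 'Row1': '25', 'Row2': 'NY'}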

Final Code

import csv
from azure.cosmosdb.table.tableservice import TableService
from azure.cosmosdb.table.models import Entity

def readCSV(location):
    with open(location) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        tab = []
        for row in csv_reader:
            tab.append(row)
        return tab

def setTable(tablename, table, table_service):
    for index, row in enumerate(table):
        task = {'PartitionKey': "P" + str(index), 'RowKey': "R" + str(index + 1)}
        for col, ele in enumerate(row):
            task["Row" + str(col)] = ele
        table_service.insert_entity(tablename, task)
    return True

table_service = TableService(account_name='xxxx', account_key='xxxx')
table_service.create_table('table-name')
tab = readCSV("<location>")
res = setTable('table-name', tab, table_service)

Retrieving data from Azure Storage Table and saving to CSV File
 

Retrieving Data from Azure Storage Table

The below code snippet retrieves the entities from the table and rebuilds the original rows, keeping only the Row0, Row1, ... columns and skipping entity metadata such as PartitionKey, RowKey and Timestamp.

def getTab(tableName):
    table_service = TableService(account_name='xxx', account_key='xxxx')
    tasks = table_service.query_entities(tableName)
    tab = []
    for entity in tasks:
        # Keep only the "Row<N>" columns, in column order, skipping
        # metadata such as PartitionKey, RowKey and Timestamp.
        cols = sorted((k for k in entity if k.startswith("Row") and k[3:].isdigit()),
                      key=lambda k: int(k[3:]))
        tab.append([entity[key] for key in cols])
    return tab
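
Assuming the table populated in the previous section, the helper can be called as below; each inner list is one row of the original CSV:

rows = getTab('table-name')
print(rows)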

Saving the table as a CSV File

To save the table (a list of lists) as a CSV file we can use the csv module. The below snippet helps you to achieve this.

def saveToCSV(tab, fileName):
    with open(fileName + "Output.csv", 'w+', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(tab)

Final Code

import csv
from azure.cosmosdb.table.tableservice import TableService
from azure.cosmosdb.table.models import Entity

def getTab(tableName, table_service):
    tasks = table_service.query_entities(tableName)
    tab = []
    for entity in tasks:
        # Keep only the "Row<N>" columns, in column order, skipping
        # metadata such as PartitionKey, RowKey and Timestamp.
        cols = sorted((k for k in entity if k.startswith("Row") and k[3:].isdigit()),
                      key=lambda k: int(k[3:]))
        tab.append([entity[key] for key in cols])
    return tab

def saveToCSV(tab, fileName):
    with open(fileName + "Output.csv", 'w+', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(tab)

table_service = TableService(account_name='xxxx', account_key='xxxx')
tab = getTab('table-name', table_service)
saveToCSV(tab, "<fileName>")

Pandas DataFrame to and from Azure Storage Table

We know Pandas DataFrames can be converted to a table (a list of lists) directly with df.values.tolist(). We have already discussed how to store a list of lists in an Azure Storage Table. A sample of the main function is given below.

import pandas as pd
from azure.cosmosdb.table.tableservice import TableService
from azure.cosmosdb.table.models import Entity

l = [[1, 2, 3], [4, 5, 6], [8, 7, 9]]
df = pd.DataFrame(l)
list_of_lists = df.values.tolist()

table_service = TableService(account_name='xxxx', account_key='xxxxx')
table_service.create_table('table-name')  # creating the table
res = setTable('table-name', list_of_lists, table_service)  # inserting the DataFrame rows into the table
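
Going the other way, the rows stored in the table can be pulled back into a DataFrame by reusing the final-code version of getTab from the previous section (the one that takes a TableService). A minimal sketch, assuming the table created above:

rows_back = getTab('table-name', table_service)  # retrieve the stored rows
df_back = pd.DataFrame(rows_back)  # rebuild a DataFrame from them
print(df_back)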

Conclusion

Saving data to Azure Table Storage from CSV files and Pandas DataFrames was discussed in this article. This is one of many ways to achieve it; you can also save the CSV file as such to an Azure Blob. I hope you found this article useful.
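
As a pointer for the Blob alternative, here is a minimal sketch using the older azure-storage-blob package (version 2.x, which exposes BlockBlobService); the container, blob and file names are placeholders:

from azure.storage.blob import BlockBlobService

blob_service = BlockBlobService(account_name='xxxx', account_key='xxxx')
blob_service.create_container('csv-files')  # hypothetical container name
blob_service.create_blob_from_path('csv-files', 'data.csv', '<location>')  # upload the local CSV file as a blob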