Introduction
This article shows how to read and write Pandas DataFrames and CSV files to and from Azure Storage Tables. Pandas DataFrames are used in many data-analytics applications, so persisting them in the cloud is a common task. First we will see how to save data from a CSV file to Azure Table Storage, and then we will do the same with a Pandas DataFrame.
Prerequisites
- Azure Account: If you do not have one, you can get a free account with ₹13,300 worth of credits from here. If you are a student, you can verify your student status to get it without entering credit card details; otherwise, credit card details are mandatory.
- Azure Storage Account: To learn how to create a storage account, follow the official documentation.
Saving a CSV file to the Azure Storage table
We need the csv module and the azure-cosmosdb-table package. The csv module ships with Python's standard library, so only the Azure package has to be installed; you can do that with pip install azure-cosmosdb-table.
Read CSV File
The function below uses the csv module to read a CSV file from a specified location.
import csv

def readCSV(location):
    with open(location) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        tab = []
        for row in csv_reader:
            tab.append(row)
    return tab
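A quick usage sketch follows; the path data/input.csv is only a placeholder and assumes the file exists locally.

# Hypothetical usage: read a local CSV into a list of lists.
rows = readCSV("data/input.csv")   # placeholder path
print(rows[0])                     # first row of the file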
Initialize Table Service
To initialize the table service we need the credentials of an active Azure Storage Account. If you don't know how to create the account and get the credentials, follow the official documentation. Replace the values of account_name and account_key with your own.
from azure.cosmosdb.table.tableservice import TableService
from azure.cosmosdb.table.models import Entity
table_service = TableService(account_name='xxxx', account_key='xxxxxxxx')
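Hard-coding credentials is fine for a quick test, but a safer sketch is to read them from environment variables; the variable names AZURE_ACCOUNT_NAME and AZURE_ACCOUNT_KEY below are just assumptions for illustration.

import os
from azure.cosmosdb.table.tableservice import TableService

# Assumed environment variable names; set them to your own storage credentials.
table_service = TableService(
    account_name=os.environ["AZURE_ACCOUNT_NAME"],
    account_key=os.environ["AZURE_ACCOUNT_KEY"],
)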
Create an Azure Storage Table
You can create tables in several ways, for example through the Azure Web Portal or the Azure CLI. Below is a way to create a table using a Python script. Replace table_name with your preferred table name.
table_service.create_table(table_name)
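If the script may run more than once, you can avoid surprises with an existing table. A minimal sketch, assuming the table name 'mytable':

# exists() and create_table() are part of the TableService API. With the default
# fail_on_exist=False, create_table simply returns False if the table is already
# there, so the exists() check is optional but makes the intent explicit.
if not table_service.exists('mytable'):
    table_service.create_table('mytable')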
Save the table to Azure
Every entity stored in an Azure table needs a PartitionKey and a RowKey. Their exact values do not matter for this example, so we simply generate them from an incrementing counter. Here is the snippet which does this.
def setTable(tablename, table):
    table_service = TableService(account_name='xxxx', account_key='xxxx')
    index = 0
    for row in table:
        # Every entity needs a PartitionKey and a RowKey; generate them from a counter.
        task = {'PartitionKey': "P" + str(index), 'RowKey': "R" + str(index + 1)}
        index = index + 1
        # Use enumerate so duplicate values in a row do not collide on the same column.
        for col, ele in enumerate(row):
            task["Row" + str(col)] = ele
        table_service.insert_entity(tablename, task)
    return True
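For larger files the per-row round-trips add up, and the table service also supports batched inserts. Below is a sketch of a hypothetical helper, setTableBatched; note that all entities in one batch must share the same PartitionKey and a batch holds at most 100 entities, so this variant keys every row to a single partition and commits in chunks.

def setTableBatched(tablename, table, table_service, chunk=100):
    # A single batch accepts at most 100 entities, all with the same PartitionKey.
    for start in range(0, len(table), chunk):
        with table_service.batch(tablename) as batch:
            for index, row in enumerate(table[start:start + chunk], start=start):
                task = {'PartitionKey': "P0", 'RowKey': "R" + str(index)}
                for col, ele in enumerate(row):
                    task["Row" + str(col)] = ele
                batch.insert_entity(task)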
Final Code
import csv
from azure.cosmosdb.table.tableservice import TableService
from azure.cosmosdb.table.models import Entity

def readCSV(location):
    with open(location) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        tab = []
        for row in csv_reader:
            tab.append(row)
    return tab

def setTable(tablename, table, table_service):
    index = 0
    for row in table:
        task = {'PartitionKey': "P" + str(index), 'RowKey': "R" + str(index + 1)}
        index = index + 1
        for col, ele in enumerate(row):
            task["Row" + str(col)] = ele
        table_service.insert_entity(tablename, task)
    return True

table_service = TableService(account_name='xxxx', account_key='xxxx')
table_service.create_table('table-name')
tab = readCSV("<location>")
res = setTable('table-name', tab, table_service)
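One caveat with the key scheme above: Azure Table storage returns entities sorted lexicographically by PartitionKey and RowKey, so keys like R2 and R10 will not come back in numeric order. A minimal sketch of a fix is to zero-pad the counter when building the keys; makeKeys is a hypothetical helper and the padding width of 6 is an arbitrary assumption.

# Zero-padded keys keep lexicographic order in line with insertion order.
def makeKeys(index, width=6):
    padded = str(index).zfill(width)   # e.g. 5 -> "000005"
    return {'PartitionKey': "P" + padded, 'RowKey': "R" + padded}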
Retrieving data from Azure Storage Table and saving to CSV File
Retrieving Data from Azure Storage Table
The code snippet below retrieves the table.
def getTab(tableName):
    table_service = TableService(account_name='xxx', account_key='xxxx')
    tasks = table_service.query_entities(tableName)
    tab = []
    for row in tasks:
        newrow = []
        for ele in row:
            # Skip the system properties so only the RowN columns end up in the output.
            if ele in ('PartitionKey', 'RowKey', 'Timestamp', 'etag'):
                continue
            newrow.append(row[ele])
        tab.append(newrow)
    return tab
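query_entities also accepts OData filter and select strings if you only want part of the table. A small sketch follows; the partition value 'P0' and the column names are just assumptions matching the keys used earlier.

from azure.cosmosdb.table.tableservice import TableService

table_service = TableService(account_name='xxxx', account_key='xxxx')
# Fetch only the entities in partition "P0", and only two of their columns.
subset = table_service.query_entities(
    'table-name',
    filter="PartitionKey eq 'P0'",
    select="Row0,Row1",
)
for entity in subset:
    print(entity.Row0, entity.Row1)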
Saving the table as a CSV File
To save the table (list of lists) as a CSV file we can use the csv module. The snippet below does this.
def saveToCSV(tab, fileName):
    with open(fileName + "Output.csv", 'w+', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(tab)
Final Code
import csv
from azure.cosmosdb.table.tableservice import TableService
from azure.cosmosdb.table.models import Entity

def getTab(tableName):
    table_service = TableService(account_name='xxx', account_key='xxxx')
    tasks = table_service.query_entities(tableName)
    tab = []
    for row in tasks:
        newrow = []
        for ele in row:
            # Skip the system properties so only the RowN columns end up in the output.
            if ele in ('PartitionKey', 'RowKey', 'Timestamp', 'etag'):
                continue
            newrow.append(row[ele])
        tab.append(newrow)
    return tab

def saveToCSV(tab, fileName):
    with open(fileName + "Output.csv", 'w+', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(tab)

tab = getTab('table-name')
saveToCSV(tab, '<file-name>')
Pandas DataFrame to and from Azure Storage Table
Pandas DataFrames can be converted to a table (list of lists) directly with df.values.tolist(). We have already discussed how to store a list of lists in an Azure Storage Table. A sample of the main function is given below.
import pandas as pd
from azure.cosmosdb.table.tableservice import TableService
from azure.cosmosdb.table.models import Entity

l = [[1, 2, 3], [4, 5, 6], [8, 7, 9]]
df = pd.DataFrame(l)
list_of_lists = df.values.tolist()

table_service = TableService(account_name='xxxx', account_key='xxxxx')
table_service.create_table('table-name')  # creating the table
# setTable is the helper defined in the "Save the table to Azure" section above.
res = setTable('table-name', list_of_lists, table_service)  # inserting the DataFrame rows
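Going in the other direction is just as short: the list of lists returned by getTab (defined in the retrieval section above) can be fed straight into the DataFrame constructor. A minimal sketch, assuming the same table name:

import pandas as pd

# getTab is the retrieval helper from the previous section.
rows = getTab('table-name')
df_restored = pd.DataFrame(rows)
print(df_restored.head())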
Conclusion
Saving data to the Azure cloud from CSV files and Pandas DataFrames was discussed in this article. This is one of many methods to achieve it; you can also upload the CSV file as-is to Azure Blob storage. I hope you found this article useful.
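For completeness, here is a rough sketch of that last option using the azure-storage-blob package; the connection string, container name, blob name, and local path are placeholders, and the v12-style BlobServiceClient shown here is a separate package from the table SDK used above.

from azure.storage.blob import BlobServiceClient

# Placeholder connection string, container, blob and file names.
blob_service_client = BlobServiceClient.from_connection_string("<connection-string>")
blob_client = blob_service_client.get_blob_client(container="<container-name>", blob="data.csv")

# Upload the local CSV file as a single blob.
with open("<location>", "rb") as data:
    blob_client.upload_blob(data, overwrite=True)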