Extracting Data from Large Spreadsheets Using Python and openpyxl

Introduction 

 
Almost every computer user has to handle a spreadsheet at some point - whether it's to write formulas for accounting, or to store responses from Google Forms and other surveys. But what do you do when the spreadsheet contains only survey data and no numbers to apply formulas to at all? I know - you'll get your hands dirty and do it manually. That's fine if there's data from roughly a hundred people.
 
But what will you do if there's data from a thousand sources or more? I'd run away in such cases, because I'm far too lazy to scroll through all of that. Thankfully, I know how to do magic tricks with Python, and I'm going to show you one specific trick today.
 
Let's make a problem first
 
I have a spreadsheet in my hand that holds the Name, Address, and E-mail of 10,000 (ten thousand, read that aloud) people. What I need to do is create a .txt file for each of them, using the person's name as the file name, containing their Name, Address, and E-mail. More like a business card.
 
[ I generated the data in the file using the Faker module; all of the data is fake. ]
 
A screenshot of a portion of the file for your reference.
 
What we need
  • Python3
  • openpyxl - Install it using the following command in your command prompt/shell:
pip install openpyxl
  • A text editor of your choice: Atom, VS Code, Sublime, Emacs, Vim - whatever you like.
So, let's get to the code
 
Our spreadsheet file name is - TestBook.xlsx
 
import openpyxl as opx

fileName = '../excelFile/TestBook.xlsx'
sheetName = 'Sheet1'
Now, we're going to load the file, or workbook, using the load_workbook() method of openpyxl.
workBook = opx.load_workbook(fileName)
We all know the structure of a spreadsheet: data is always stored in sheets, so we need to get the sheet where our data is. MS Excel has generously kept the default name Sheet1, so we don't have to hunt around for it. So, let's launch the shuttles and load the sheet!
# get_sheet_by_name() is deprecated (and removed in openpyxl 3.x);
# index the workbook by sheet name instead
sheet = workBook[sheetName]
Now we get the row and column counts from the sheet. (Let's see whether Python can load all those records or not - a brute-force test! ;) )
maxRows = sheet.max_row
print(maxRows)
maxCol = sheet.max_column
print(maxCol)
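By the way, when a workbook is genuinely huge, openpyxl's read-only mode streams the file instead of loading it fully into memory. This isn't part of the original script, just a sketch: it builds a tiny throwaway workbook (fake people, made-up paths) so the example is self-contained, then reopens it with read_only=True.

```python
import os
import tempfile
from openpyxl import Workbook, load_workbook

# Build a tiny throwaway workbook so this sketch is self-contained.
tmpDir = tempfile.mkdtemp()
path = os.path.join(tmpDir, 'demo.xlsx')

wb = Workbook()
ws = wb.active
ws.append(['Name', 'Address', 'E-mail'])
for i in range(3):
    ws.append(['Person {}'.format(i), 'Street {}'.format(i),
               'p{}@example.com'.format(i)])
wb.save(path)

# read_only=True streams the sheet instead of loading it all at once,
# which keeps memory usage low for very large spreadsheets.
roBook = load_workbook(path, read_only=True)
roSheet = roBook[roBook.sheetnames[0]]
rowCount = roSheet.max_row       # header + 3 data rows -> 4
colCount = roSheet.max_column    # 3 columns
print(rowCount, colCount)
roBook.close()
```

Read-only worksheets don't support writing, but for a read-and-dump job like ours that's exactly the trade-off you want.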
Output
 
10001
3
 
Aw yes!
 
So it does load all those records. Fascinating! Now, to the real part of the thing, where we read people's private data (this is where you get to feel like Google - although this data is fake) and write it to files.
for i in range(1, maxRows + 1):
    name = sheet.cell(row=i, column=1).value
    outputFile = open('../dump/{}.txt'.format(name), 'w')

    for j in range(1, maxCol + 1):
        outputFile.write(sheet.cell(row=i, column=j).value + '\n')
    outputFile.close()

    # just checking the count of the write ops, actually!
    print('Written file number : {}'.format(i))
Let's just add a nice message at the end so people can be assured that we're done!
print('Done writing your business card hooman!')
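If you'd rather not worry about forgetting close() (say, when a write fails halfway), a with statement closes the file for you. Here's a sketch of the same loop using with; the rows list is a made-up stand-in for the real sheet, so the snippet runs on its own - in the actual script you'd pull each row from sheet.cell() as above.

```python
import os
import tempfile

# Hypothetical stand-in for the spreadsheet rows: (Name, Address, E-mail).
rows = [
    ('Alice Example', '1 Fake St', 'alice@example.com'),
    ('Bob Sample', '2 Mock Ave', 'bob@example.com'),
    ('Bob Sample', '3 Other Rd', 'bob2@example.com'),  # duplicate name
]

outDir = tempfile.mkdtemp()

for row in rows:
    name = row[0]
    # 'with' closes the file even if a write raises; duplicate names
    # silently overwrite earlier files, just like in the post.
    with open(os.path.join(outDir, '{}.txt'.format(name)), 'w') as f:
        for value in row:
            f.write('{}\n'.format(value))

written = sorted(os.listdir(outDir))
print(written)  # two files: the duplicate 'Bob Sample' collapsed into one
```

Note how the duplicate name produces only one file - that detail will come back to bite us in a moment.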
Are there actually 10,000 text files written? OMG!
 
Why don't we just find it out with another Python script?
import os

path = '../dump/'
fileList = os.listdir(path)

fileNo = len(fileList)
print(fileNo)
Output
 
9382
 
Come on! I can't see 10,000 files on my computer! [ Neither can I ]
 
Our very friendly Faker generator did some witty stuff and produced duplicate names, which is why we're actually looking at only 9,382 files: since we name each file after the person, duplicates simply overwrite one another.
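If you want to confirm that duplicates really explain the gap, you can count distinct names before writing a single file. A small sketch with made-up names (in the real script you'd collect them from column 1 with sheet.cell(row=i, column=1).value):

```python
from collections import Counter

# Hypothetical name column standing in for the real spreadsheet data.
names = ['Amy Pond', 'Rory Williams', 'Amy Pond', 'Clara Oswald', 'Amy Pond']

counts = Counter(names)
uniqueFiles = len(counts)  # one file per distinct name
repeated = [(n, c) for n, c in counts.items() if c > 1]

print('rows          :', len(names))    # 5 rows in the sheet
print('files expected:', uniqueFiles)   # only 3 files on disk
print('repeated names:', repeated)      # [('Amy Pond', 3)]
```

Run that against the real name column and the difference between the row count and len(counts) tells you exactly how many files get swallowed by overwriting.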
 
Attaching a screenshot for ye!
 
 
So, where can I find such large Excel files?
 
Well, don't ask me. Just keep your scripts ready for when you face those nasty, huge, ugly spreadsheets! Till then, I'll go incognito to chill out, leaving you to scratch your head over what just happened. Sayonara!
 
[ This post was originally published at my blog ]

