In this article, I’ll walk you through all the steps required to query your CSV data and get a response out of it using Azure OpenAI.
Let’s get started by importing the required packages.
Import Required Packages
Here are the packages which we need to import to get started:
import json
import pandas as pd
from dotenv import dotenv_values
Read Configuration
First of all, we need to set a few variables with information from the Azure portal and Azure OpenAI Studio:
openai_api_version = "API_VERSION"
openai_api_key = "API_KEY"
openai_api_base = "ENDPOINT"
If you are not sure how to grab the above values, I would recommend you watch my video here.
Rather than hard-coding these values, we can read them from a .env file using dotenv_values:
config = dotenv_values(env_name)  # env_name holds the path to the .env file
openai_api_key = config['openai_api_key']
openai_api_base = config['openai_api_base']
openai_api_version = config['openai_api_version']
In my case, I’ve stored all the configuration values in the .env file that env_name points to, and I’m reading them from there. Feel free to change the above lines to suit your setup.
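A missing or empty entry in the .env file would otherwise only surface later, when the client is created or the API call fails. Here is a minimal sketch of an early check. The helper name `require_keys` is hypothetical, and a plain dict stands in for the mapping that `dotenv_values` returns:

```python
from typing import Mapping, Optional

# The three settings this walkthrough reads from the .env file.
REQUIRED_KEYS = ("openai_api_key", "openai_api_base", "openai_api_version")


def require_keys(config: Mapping[str, Optional[str]]) -> dict:
    """Return the required settings, or raise if any is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError("Missing configuration values: " + ", ".join(missing))
    return {key: config[key] for key in REQUIRED_KEYS}


# A plain dict stands in for the result of dotenv_values(env_name).
example_config = {
    "openai_api_key": "API_KEY",
    "openai_api_base": "ENDPOINT",
    "openai_api_version": "API_VERSION",
}
settings = require_keys(example_config)
```

In the real flow, you would pass `config` instead of `example_config` and read the three values from `settings`.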
Preparing the Data
Next, we need to read a CSV (comma-separated values) file and load the data into a pandas DataFrame. This CSV file contains data about movies, which I grabbed from Kaggle.
df = pd.read_csv('MovieData.csv')
df.head()
Here is the gist of the data:
In order to query such data, we first need to relate the values in each row to one another. That can be done by combining the required columns into a single labeled text field. Here is how you can do this:
df['combined'] = 'Movie: ' + df['name'] + ' ' + 'year: ' + df['year'].astype(str) + ' ' + 'duration: ' + df['duration'] + ' ' + 'certificate: ' + df['certificate']
df['combined'].head()
And this is what the combined column looks like:
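One caveat with the string concatenation above: when a cell is missing (NaN), pandas typically turns the whole combined value for that row into NaN as well. That matters for a question about uncertified movies. Below is a minimal sketch of a per-row formatter that keeps the row and marks the gap explicitly. The function name `combine_row`, the "unknown" placeholder, and the sample record are my assumptions, not part of the original dataset:

```python
import math


def combine_row(row: dict) -> str:
    """Build the combined text for one movie, marking missing values explicitly."""

    def text(value) -> str:
        # pandas represents missing cells as NaN (a float); None is handled too.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return "unknown"
        return str(value)

    return (
        "Movie: " + text(row.get("name"))
        + " year: " + text(row.get("year"))
        + " duration: " + text(row.get("duration"))
        + " certificate: " + text(row.get("certificate"))
    )


# Made-up record with a missing certificate.
sample = {
    "name": "Example Movie",
    "year": 1999,
    "duration": "120 min",
    "certificate": float("nan"),
}
combined = combine_row(sample)
```

With the data frame from above, this could be applied as `df['combined'] = df.apply(lambda r: combine_row(r.to_dict()), axis=1)`.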
Making a Call to Azure OpenAI Endpoint
At this point, we have our data ready. So, let’s go ahead and create an Azure OpenAI client.
from openai import AzureOpenAI
client = AzureOpenAI(
    api_key=openai_api_key,
    api_version="2023-10-01-preview",  # I'm using this version
    azure_endpoint=openai_api_base
)
context = df.head().to_json(orient="records")
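Note that `df.head()` passes only the first five rows as context, so the model can only answer about those rows. You can send more rows, but the whole prompt has to fit in the model's context window. Here is a rough sketch of one way to cap the size. It assumes a character budget as a crude stand-in for the token limit. The helper `build_context`, the `max_chars` parameter, and the sample records are hypothetical:

```python
import json


def build_context(records: list, max_chars: int = 4000) -> str:
    """Serialize as many leading records as fit within max_chars of JSON."""
    kept = []
    for record in records:
        # Check the size of the JSON text if this record were included.
        candidate = json.dumps(kept + [record])
        if len(candidate) > max_chars:
            break
        kept.append(record)
    return json.dumps(kept)


# Made-up records; with pandas these could come from df.to_dict(orient="records").
records = [
    {"name": "Movie A", "year": 2001, "certificate": "PG"},
    {"name": "Movie B", "year": 2002, "certificate": None},
    {"name": "Movie C", "year": 2003, "certificate": "R"},
]
context = build_context(records, max_chars=120)
```

A character count is only an approximation. If you need an exact limit, you would count tokens with a tokenizer instead.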
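Once the client object is created successfully, we are good to go ahead and make an API call. Note that with Azure OpenAI, the model argument is the name of your deployment, so use whatever name you gave your deployment in Azure OpenAI Studio.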
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant who answers only from the given Context."},
        {"role": "user", "content": "Context: " + context + "\n\n Query: " + query}
    ]
)
Before executing the above lines, make sure to set the query variable with the appropriate question. Here is mine:
query = "How many movies were not certified?"
response.choices[0].message.content
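Since the call depends on `query` being defined first, it can help to assemble the messages in one small function so the order is explicit. This is just a sketch. The helper name `build_messages` is hypothetical, and the context string is a made-up stand-in for the JSON produced earlier:

```python
SYSTEM_PROMPT = "You are a helpful assistant who answers only from the given Context."


def build_messages(context: str, query: str) -> list:
    """Assemble the system and user messages in the shape used by the call above."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Context: " + context + "\n\n Query: " + query},
    ]


query = "How many movies were not certified?"
# Made-up context; in the walkthrough this is the JSON built from the data frame.
messages = build_messages(
    context='[{"name": "Movie A", "certificate": null}]',
    query=query,
)
```

The result can then be passed as `messages=messages` to `client.chat.completions.create(...)`.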
If everything goes well, the last line returns the model's answer as text. Here is the response that I received:
Conclusion
I hope you find this walkthrough useful.
If you find anything that is not clear, I recommend you watch my video recording, which demonstrates this flow from end to end.
Happy learning!