Transform Weak Models into Powerhouses with Model Distillation

In this article, we'll explore how to take an underperforming AI model and supercharge it using OpenAI's model distillation. In other words, we will see how to make GPT-3.5-Turbo perform like GPT-4.

What Is Model Distillation?

Model distillation works on a teacher-student analogy: it is like having a really smart mentor (the "teacher" model) guide a learner (the "student" model). The teacher model is powerful and has been trained on a lot of data, but it is also quite resource-intensive. In contrast, the student model is more streamlined and efficient, making it ideal for practical use. Here's a more detailed breakdown:

  • Teacher Model: Think of this as an expert who knows a lot because they’ve studied extensively. This model is highly accurate but demanding in terms of computational power.
  • Student Model: This is the learner who needs to be trained. It's designed to be simpler and faster, perfect for situations where resources are limited.

Knowledge Transfer: During the training process, the student learns by observing and mimicking the teacher’s behavior. Instead of just focusing on the right answers, the student learns the patterns and techniques used by the teacher. This helps the student model become almost as effective as the teacher model but much more efficient.
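For intuition, here is a minimal sketch of the classic distillation objective (a generic illustration, not OpenAI's internal procedure, which is not public). The student is trained to match the teacher's softened output distribution in addition to the hard label; the temperature T and blend weight alpha are illustrative choices:

import numpy as np

def softmax(logits, temperature=1.0):
    # Higher temperature flattens the distribution, exposing the teacher's
    # "dark knowledge" about how similar the classes are to each other
    z = logits / temperature
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, true_label, T=2.0, alpha=0.5):
    # Hard loss: cross-entropy against the ground-truth label
    hard_loss = -np.log(softmax(student_logits)[true_label])
    # Soft loss: KL divergence between softened teacher and student outputs
    # (the classic recipe also scales this term by T**2)
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft_loss = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
    return alpha * hard_loss + (1 - alpha) * soft_loss

# Toy 3-class example: student logits, teacher logits, true class
print(distillation_loss(np.array([2.5, 1.5, 0.8]), np.array([4.0, 1.0, 0.5]), true_label=0))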


Why Model Distillation Rocks

  • Cost-Effective: Smaller models are less expensive to run than the largest, most recent models.
  • Efficiency: Smaller models require less computational power and memory, making them more suitable for deployment on edge devices.
  • Performance: The student model can perform almost as well as the teacher model while being much faster.

For the student and teacher, you can pick any models from the list available on OpenAI. Once the models are selected, we need to create an instance of the OpenAI class:

from openai import OpenAI

client = OpenAI(api_key=openai_key)
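Here, openai_key holds your API key; a common setup (assumed, not shown in the original snippet) is to read it from an environment variable so it never lands in source control:

import os

openai_key = os.environ["OPENAI_API_KEY"]  # set this in your shell or .env file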

Here are a few lines of code that call the Chat Completions API using the SDK:

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": "PROVIDE_SYSTEM_PROMPT_HERE"},
        {"role": "user", "content": "QUESTION_ASKED_BY_USER"}
    ],
    temperature=0.01,  # near-deterministic output
    max_tokens=300,    # cap on the response length
    top_p=0.9          # nucleus sampling threshold
)
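The generated text is nested under choices in the response object; you can print the model's answer directly:

# The assistant's reply is the message content of the first choice
print(response.choices[0].message.content)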

For a sample question, we need to call both the teacher and the student model so we can compare their performance. In my sample scenario, GPT-3.5 Turbo is the student model and GPT-4 is the teacher; a helper for making both calls is sketched below, followed by the responses I received.
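To keep the comparison tidy, here is a small helper (a sketch; the system prompt and question are placeholders to replace with your own) that sends the same question to any model:

def ask(model_name, question):
    # Send one question to the given model and return the text of its answer
    response = client.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "system", "content": "PROVIDE_SYSTEM_PROMPT_HERE"},
            {"role": "user", "content": question}
        ],
        temperature=0.01,
        max_tokens=300,
        top_p=0.9
    )
    return response.choices[0].message.content

question = "QUESTION_ASKED_BY_USER"
print("Student (gpt-3.5-turbo):", ask("gpt-3.5-turbo", question))
print("Teacher (gpt-4):", ask("gpt-4", question))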

Call using GPT-3.5 Turbo

[Image: GPT-3.5 Turbo response]

Call using GPT-4

[Image: GPT-4 response]

The two responses clearly show the difference in output quality for the same input.

Now, to improve the student model's performance, we need to generate synthetic data from the teacher model's output and store that dataset with OpenAI, where it will feed the distillation process.

Here is the code to store the completions on OpenAI's servers:

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": "PROVIDE_SYSTEM_PROMPT_HERE"},
        {"role": "user", "content": "QUESTION_ASKED_BY_USER"}
    ],
    temperature=0.01,
    max_tokens=300,
    top_p=0.9,
    metadata={
        "distillation": "teams_dataset"  # tag used to filter completions later
    },
    store=True  # persist this completion on OpenAI's servers
)
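In practice, you would run this stored call over a whole set of inputs to build the training dataset. Here is a sketch, assuming a hypothetical list of your own training questions and reusing the teams_dataset tag from above:

training_questions = [
    "QUESTION_1",
    "QUESTION_2",
    # ...more prompts covering your use case
]

# Generate teacher (GPT-4) completions and store each one on OpenAI's
# servers, tagged so they can be selected later for distillation
for question in training_questions:
    client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "PROVIDE_SYSTEM_PROMPT_HERE"},
            {"role": "user", "content": question}
        ],
        temperature=0.01,
        max_tokens=300,
        top_p=0.9,
        metadata={"distillation": "teams_dataset"},
        store=True
    )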

You can verify the stored completions by navigating to the Metadata tab:

[Image: Metadata tab]

Now, to start the distillation process, click the Distill button on the far right of the Chat Completions tab; it will open the fine-tuning UI:

[Image: Fine-tuning UI]

Fill in all the details, and OpenAI will create the new distilled model, which can then be used for evaluation.

Now, the final step is to call the Chat Completions endpoint with this new model; you should get a response much closer to the teacher model's.
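The call is identical to the earlier ones; only the model name changes. The identifier below is a hypothetical placeholder; copy the real ID of your fine-tuned model from the dashboard:

# Hypothetical fine-tuned model ID; use the one shown in your dashboard
distilled_model = "ft:gpt-3.5-turbo:your-org::abc123"

response = client.chat.completions.create(
    model=distilled_model,
    messages=[
        {"role": "system", "content": "PROVIDE_SYSTEM_PROMPT_HERE"},
        {"role": "user", "content": "QUESTION_ASKED_BY_USER"}
    ],
    temperature=0.01,
    max_tokens=300,
    top_p=0.9
)
print(response.choices[0].message.content)

Here is the response I got from the distilled model: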

[Image: Distilled model response]

I hope you enjoyed the entire process of making your weak model smarter.