In this article, I demonstrate how to deploy a Generative AI application using Docker and Flask. We will utilize Hugging Face's transformers library to implement a text-generation model based on GPT-2. The application will expose an endpoint /generate that accepts POST requests with a prompt in the JSON body. The model processes the prompt and generates text, which is then returned as a JSON response. This guide will walk you through setting up Flask, configuring the text-generation pipeline, and containerizing the application with Docker for seamless deployment.
Step 1. Preparing the Gen AI Model
The first step is to prepare the Gen AI application code. Let’s assume we are deploying a text generation model using the Hugging Face Transformers library.
Basic Setup (app.py)
from flask import Flask, request, jsonify
from transformers import pipeline

app = Flask(__name__)

# Initialize the text-generation pipeline once at startup
generator = pipeline("text-generation", model="gpt2")

# Define the /generate endpoint to accept POST requests
@app.route('/generate', methods=['POST'])
def generate_text():
    # Parse the JSON request body (None if the body is missing or invalid)
    data = request.get_json(silent=True)
    # Check if 'prompt' exists in the request
    if data and 'prompt' in data:
        prompt = data['prompt']
        results = generator(prompt, max_length=50, num_return_sequences=1)
        return jsonify({"generated_text": results[0]['generated_text']})
    return jsonify({"error": "No prompt provided"}), 400

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
This is a basic script to generate text using the GPT-2 model. You can adapt this depending on the Gen AI model you're working with.
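If you want to unit-test the request-handling logic without starting Flask or downloading the model, the prompt validation can be factored into a plain function. The sketch below mirrors the endpoint's error contract; the function name extract_prompt is a hypothetical helper, not part of the script above.

```python
# A standalone sketch of the prompt-validation step performed by the endpoint.
def extract_prompt(data):
    """Return (prompt, error) for a parsed JSON body.

    Mirrors the endpoint's contract: a missing or invalid body yields
    a (message, status_code) error tuple instead of a prompt.
    """
    if not isinstance(data, dict) or "prompt" not in data:
        return None, ("No prompt provided", 400)
    return data["prompt"], None


# Usage: valid and invalid bodies
print(extract_prompt({"prompt": "What is Docker?"}))  # ('What is Docker?', None)
print(extract_prompt({}))  # (None, ('No prompt provided', 400))
```

Keeping validation separate from the route handler makes it easy to cover the error paths in tests before the container is ever built.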
Step 2. Create the Dockerfile
The next step is to write a Dockerfile to containerize the Gen AI application. A Dockerfile is a text document that contains all the commands needed to build the image.
# Use a lightweight Python image
FROM python:3.9-slim
# Set the working directory inside the container
WORKDIR /app
# Copy the requirements.txt file
COPY requirements.txt .
# Install the Python dependencies with no cache
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application code
COPY app.py .
# Expose the web server port
EXPOSE 8080
# Command to run the Flask app
CMD ["python", "app.py"]
This Dockerfile pulls a lightweight Python base image, installs the necessary dependencies, and runs the app.py script when the container starts.
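To keep the build context small and avoid copying local artifacts into the image, you may also add a .dockerignore file next to the Dockerfile. The entries below are common suggestions, not requirements of the steps above:

```
# .dockerignore — exclude local artifacts from the build context
__pycache__/
*.pyc
.git/
venv/
```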
Step 3. Define the Dependencies
Create a requirements.txt file that lists the Python libraries required by your Gen AI application.
transformers==4.18.0
torch==1.10.2
numpy==1.21.0
flask==2.0.2
werkzeug==2.0.3
You can add additional dependencies based on the libraries your model requires.
Step 4. Build the Docker Image
To build the Docker image, navigate to the directory containing your Dockerfile and run the following command:
docker build -t gen-ai-app .
Here, gen-ai-app is the name of the Docker image you are building. Docker will step through the Dockerfile, install the dependencies, and prepare the containerized environment.
Step 5. Run the Docker Container
Once the image is built, run the container, publishing port 8080 so the API is reachable from the host:
docker run -p 8080:8080 gen-ai-app
Testing with curl
Test the Flask route using curl. Ensure that you're making a POST request with a JSON payload.
curl -X POST http://localhost:8080/generate -H "Content-Type: application/json" -d '{"prompt": "What is Docker?"}'
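The same request can also be made from Python using only the standard library. This is a sketch assuming the container is running locally with port 8080 published; build_request and generate are illustrative helper names, not part of the application code.

```python
import json
import urllib.request

API_URL = "http://localhost:8080/generate"  # assumes the container is running locally


def build_request(prompt: str) -> urllib.request.Request:
    """Construct the POST request the /generate endpoint expects."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["generated_text"]


# Example (requires the server to be up): print(generate("What is Docker?"))
```

This mirrors the curl command exactly: same URL, method, Content-Type header, and JSON body.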