In this article, I demonstrate how to deploy a Generative AI application using Docker and Flask. We will utilize Hugging Face's transformers library to implement a text-generation model based on GPT-2. The application will expose an endpoint /generate that accepts POST requests with a prompt in the JSON body. The model processes the prompt and generates text, which is then returned as a JSON response. This guide will walk you through setting up Flask, configuring the text-generation pipeline, and containerizing the application with Docker for seamless deployment.
Step 1. Preparing the Gen AI Model
The first step is to prepare the Gen AI application code. Let’s assume we are deploying a text generation model using the Hugging Face Transformers library.
Basic Setup (app.py)
from flask import Flask, request, jsonify
from transformers import pipeline

app = Flask(__name__)

# Initialize the text-generation pipeline once at startup
generator = pipeline("text-generation", model="gpt2")

# Define the /generate endpoint to accept POST requests
@app.route('/generate', methods=['POST'])
def generate_text():
    # Parse the JSON request body (None if the body is missing or invalid)
    data = request.get_json(silent=True)
    # Check if 'prompt' exists in the request
    if data and 'prompt' in data:
        prompt = data['prompt']
        results = generator(prompt, max_length=50, num_return_sequences=1)
        return jsonify({"generated_text": results[0]['generated_text']})
    return jsonify({"error": "No prompt provided"}), 400

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
This is a basic script to generate text using the GPT-2 model. You can adapt this depending on the Gen AI model you're working with.
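If you want to unit-test the request-handling logic without starting Flask or downloading the model, the prompt validation can be factored into a plain function. The sketch below mirrors the endpoint's error contract; the function name extract_prompt is a hypothetical helper, not part of the script above.

```python
# A standalone sketch of the prompt-validation step performed by the endpoint.
def extract_prompt(data):
    """Return (prompt, error) for a parsed JSON body.

    Mirrors the endpoint's contract: a missing or invalid body yields
    a (message, status_code) error tuple instead of a prompt.
    """
    if not isinstance(data, dict) or "prompt" not in data:
        return None, ("No prompt provided", 400)
    return data["prompt"], None


# Usage: valid and invalid bodies
print(extract_prompt({"prompt": "What is Docker?"}))  # ('What is Docker?', None)
print(extract_prompt({}))  # (None, ('No prompt provided', 400))
```

Keeping validation separate from the route handler makes it easy to cover the error paths in tests before the container is ever built.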
Step 2. Create the Dockerfile
The next step is to write a Dockerfile to containerize the Gen AI application. A Dockerfile is a text document that contains all the commands needed to build the image.
# Use a lightweight Python image
FROM python:3.9-slim
# Set the working directory inside the container
WORKDIR /app
# Copy the requirements.txt file
COPY requirements.txt .
# Install the Python dependencies with no cache
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application code
COPY app.py .
# Expose the web server port
EXPOSE 8080
# Command to run the Flask app
CMD ["python", "app.py"]
This Dockerfile pulls a lightweight Python base image, installs the necessary dependencies, and runs the app.py script when the container starts.
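To keep the build context small and avoid copying local artifacts into the image, you may also add a .dockerignore file next to the Dockerfile. The entries below are common suggestions, not requirements of the steps above:

```
# .dockerignore — exclude local artifacts from the build context
__pycache__/
*.pyc
.git/
venv/
```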
Step 3. Define the Dependencies
Create a requirements.txt file that lists the Python libraries required by your Gen AI application.
transformers==4.18.0
torch==1.10.2
numpy==1.21.0
flask==2.0.2
werkzeug==2.0.3
You can add additional dependencies based on the libraries your model requires.
Step 4. Build the Docker Image
To build the Docker image, navigate to the directory containing your Dockerfile and run the following command:
docker build -t gen-ai-app .
Here, gen-ai-app is the name of the Docker image you are building. Docker will step through the Dockerfile, install the dependencies, and prepare the containerized environment.
Step 5. Run the Docker Container
Once the image is built, run the container, publishing port 8080 so the API is reachable from the host:
docker run -p 8080:8080 gen-ai-app
Testing with curl
Test the Flask route using curl. Ensure that you're making a POST request with a JSON payload.
curl -X POST http://localhost:8080/generate -H "Content-Type: application/json" -d '{"prompt": "What is Docker?"}'
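The same request can also be made from Python using only the standard library. This is a sketch assuming the container is running locally with port 8080 published; build_request and generate are illustrative helper names, not part of the application code.

```python
import json
import urllib.request

API_URL = "http://localhost:8080/generate"  # assumes the container is running locally


def build_request(prompt: str) -> urllib.request.Request:
    """Construct the POST request the /generate endpoint expects."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["generated_text"]


# Example (requires the server to be up): print(generate("What is Docker?"))
```

This mirrors the curl command exactly: same URL, method, Content-Type header, and JSON body.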