Implementing Responsible AI using Python

AI is evolving faster than ever, and Generative AI has made it far more accessible, enabling use cases that were not practical to implement earlier. This ease of use, however, brings a number of challenges: if the requests and responses of our AI use cases are not validated properly, we take on real risk. If the requests received by our Gen AI applications are not validated, the system can respond in unexpected ways. Similarly, if the responses generated by AI are not validated, they can include embarrassing content that damages the application's reputation.

To address these challenges, it is essential to implement Responsible AI practices so that our systems behave as intended. Communities and vendors have developed a number of frameworks and quality gates that help make AI applications robust. Several scenarios come into the picture when we talk about Responsible AI; let us explore a few approaches below using different offerings from vendors and communities.

Now, let us set up the Python environment in which the application will run.

Set up a virtual environment

python -m venv responsibleaivenv

Activate it.

responsibleaivenv\Scripts\activate        # Windows
source responsibleaivenv/bin/activate     # Linux/macOS

Moderation APIs

Human language is the input to AI, so it is important to ensure it is not misused. A moderation API checks content and flags harmful language before it reaches, or leaves, our application.

By OpenAI - At the time of writing, 'omni-moderation-latest' is the latest moderation model available from OpenAI. We will use it to see how content is moderated. Below is sample code that makes the API call using plain HTTP requests.

Filename: openai-sample.py

import http.client
import json

def moderate(key, text):
    conn = http.client.HTTPSConnection("api.openai.com")  # OpenAI endpoint
    payload = json.dumps({
        "model": "omni-moderation-latest",  # Moderation model
        "input": text  # User input to moderate
    })
    headers = {  # Headers to set the API key and content type
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {key}'
    }
    conn.request("POST", "/v1/moderations", payload, headers)  # HTTP request
    res = conn.getresponse()
    data = res.read()
    conn.close()
    return json.loads(data.decode("utf-8"))  # Convert the response to JSON

key = '<api-key>'  # Key, best practice is to store in a config file
result = moderate(key, 'This is a sample moderation request')  # Invoke moderation method

if not result['results'][0]['flagged']:
    print('This is allowed')  # Print result
else:
    print('This is not allowed')
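
If you prefer the official OpenAI Python SDK over building the HTTP request by hand, the equivalent call looks roughly like the sketch below. It is a minimal sketch, assuming the openai package (v1.x) is installed via pip install openai.

# Minimal sketch using the official OpenAI SDK (assumes: pip install openai)
from openai import OpenAI

client = OpenAI(api_key='<api-key>')  # Or rely on the OPENAI_API_KEY environment variable

response = client.moderations.create(
    model="omni-moderation-latest",
    input="This is a sample moderation request"
)

if not response.results[0].flagged:
    print('This is allowed')
else:
    print('This is not allowed')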

Run openai-sample.py and, for this harmless input, it prints 'This is allowed'.

python openai-sample.py

The API response will be similar to the following. We use the flagged field to decide whether the content is acceptable, and the per-category scores can support stricter custom checks (see the sketch after the response).

{
    "id": "modr-<uuid>",
    "model": "omni-moderation-latest",
    "results": [
        {
            "flagged": false,
            "categories": {
                "sexual": false,
                "sexual/minors": false,
                "harassment": false,
                "harassment/threatening": false,
                "hate": false,
                "hate/threatening": false,
                "illicit": false,
                "illicit/violent": false,
                "self-harm": false,
                "self-harm/intent": false,
                "self-harm/instructions": false,
                "violence": true,
                "violence/graphic": false
            },
            "category_scores": {
                "sexual": 0.0000000456,
                "sexual/minors": 0.0000000312,
                "harassment": 0.000091234,
                "harassment/threatening": 0.000073145,
                "hate": 0.0000000199,
                "hate/threatening": 0.0000000158,
                "illicit": 0.000042173,
                "illicit/violent": 0.0000000138,
                "self-harm": 0.000074693,
                "self-harm/intent": 0.000052194,
                "self-harm/instructions": 0.0000000047,
                "violence": 0.00892345,
                "violence/graphic": 0.00379218
            },
            "category_applied_input_types": {
                "sexual": [],
                "sexual/minors": [],
                "harassment": [],
                "harassment/threatening": [],
                "hate": [],
                "hate/threatening": [],
                "illicit": [],
                "illicit/violent": [],
                "self-harm": [],
                "self-harm/intent": [],
                "self-harm/instructions": [],
                "violence": [],
                "violence/graphic": []
            }
        }
    ]
}
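
Besides the flagged boolean, the response also includes per-category probability scores (category_scores), so you can enforce category-specific thresholds of your own. The helper below is a small sketch built around the response structure shown above and reuses the result variable from openai-sample.py; the threshold values are arbitrary examples, not recommendations.

def is_acceptable(moderation_result, thresholds=None):
    # Example thresholds; tune these for your own application
    thresholds = thresholds or {"violence": 0.5, "harassment": 0.5, "hate": 0.5}
    result = moderation_result['results'][0]
    if result['flagged']:  # Respect the model's own verdict first
        return False
    for category, limit in thresholds.items():
        if result['category_scores'].get(category, 0) >= limit:
            return False
    return True

print('Allowed' if is_acceptable(result) else 'Not allowed')  # Reuses result from openai-sample.py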

Similar moderation offerings are provided by several other vendors, such as Microsoft, Google, IBM, and Meta.

Integrated Moderation in Generative AI models

Nowadays, moderation capabilities are often bundled with generative AI models as a single package; the Gemini 2.0 model is one example. It is worth using these built-in features to get the best results along with compliant responses. We can use such models to perform our tasks and take advantage of Responsible AI at the same time. The code below is straightforward; where something needs an explanation, it is noted in the comments.

import http.client
import json

def extract_entities(key, text):
    conn = http.client.HTTPSConnection("generativelanguage.googleapis.com")
    
    # Task is mentioned below as request body
    payload = json.dumps({
        "contents": [
            {
                "parts": [
                    {
                        "text": f"Extract entities in form of comma-separated values from the input : '''{input}'''"
                    }
                ]
            }
        ]
    })
    headers = {
        'Content-Type': 'application/json'
    }
    
    # Model and API key added here
    conn.request("POST", f"/v1beta/models/gemini-2.0-flash-exp:generateContent?key={key}", payload, headers)
    res = conn.getresponse()
    data = res.read()
    conn.close()
    return json.loads(data.decode("utf-8"))

key = '<Gemini API Key>'
result = extract_entities(key, 'I am Varun from India')

candidates = result.get("candidates", [])
for candidate in candidates:
    safety_ratings = candidate.get("safetyRatings", [])
    print("Safety Ratings:")
    for rating in safety_ratings:
        category = rating.get("category", "Unknown")
        probability = rating.get("probability", "Unknown")
        print(f"- {category}: {probability}")

Output

Safety Ratings:
- HARM_CATEGORY_HATE_SPEECH: NEGLIGIBLE
- HARM_CATEGORY_DANGEROUS_CONTENT: NEGLIGIBLE
- HARM_CATEGORY_HARASSMENT: NEGLIGIBLE
- HARM_CATEGORY_SEXUALLY_EXPLICIT: NEGLIGIBLE

The printed ratings are extracted from the raw API response shown below. Notice the safetyRatings field returned with each candidate; we can use it to filter content based on our needs (see the sketch after the response).

{
    "candidates": [
        {
            "content": {
                "parts": [
                    {
                        "text": "Varun, India\n"
                    }
                ],
                "role": "model"
            },
            "finishReason": "STOP",
            "safetyRatings": [
                {
                    "category": "HARM_CATEGORY_HATE_SPEECH",
                    "probability": "NEGLIGIBLE"
                },
                {
                    "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
                    "probability": "NEGLIGIBLE"
                },
                {
                    "category": "HARM_CATEGORY_HARASSMENT",
                    "probability": "NEGLIGIBLE"
                },
                {
                    "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
                    "probability": "NEGLIGIBLE"
                }
            ],
            "avgLogprobs": -0.00022673401981592177
        }
    ],
    "usageMetadata": {
        "promptTokenCount": 20,
        "candidatesTokenCount": 5,
        "totalTokenCount": 25
    },
    "modelVersion": "gemini-2.0-flash-exp"
}
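
As a sketch of how the safetyRatings field can drive filtering, the helper below withholds any candidate whose probability for a category exceeds a chosen level. The severity ordering mirrors the probability labels seen in the response above (NEGLIGIBLE, LOW, MEDIUM, HIGH); treat the policy itself as an illustrative assumption, not an official recommendation.

# Illustrative severity ordering based on the probability labels in the response
SEVERITY = {"NEGLIGIBLE": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3}

def is_safe(candidate, max_allowed="LOW"):
    # Reject the candidate if any category's probability exceeds the allowed level
    for rating in candidate.get("safetyRatings", []):
        level = SEVERITY.get(rating.get("probability", "HIGH"), 3)
        if level > SEVERITY[max_allowed]:
            return False
    return True

for candidate in result.get("candidates", []):  # Reuses result from the Gemini example above
    if is_safe(candidate):
        print(candidate["content"]["parts"][0]["text"])
    else:
        print("Response withheld due to safety ratings")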

GuardrailsAI for Reliable AI Development

Guardrails is a Python framework designed to enhance the reliability of AI applications by performing two essential functions. First, it implements Input/Output Guards within applications to identify, measure, and address various risks; a comprehensive overview of these risks is available on Guardrails Hub. Second, it enables the generation of structured data from large language models (LLMs), ensuring consistency and reliability in outputs. You can visit https://www.guardrailsai.com/docs/ to learn more about this framework. With it, you can implement several checks, such as input and output validation, topic filtering, and more; a minimal sketch follows below.
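
As a quick taste of the framework, the sketch below wraps input validation with a validator from Guardrails Hub. It is a minimal sketch, assuming guardrails-ai is installed (pip install guardrails-ai) and the ToxicLanguage validator has been added from the hub (guardrails hub install hub://guardrails/toxic_language); consult the official docs for the exact setup steps for your version.

# Minimal sketch, assuming: pip install guardrails-ai
# and: guardrails hub install hub://guardrails/toxic_language
from guardrails import Guard
from guardrails.hub import ToxicLanguage

# Guard that raises an exception when toxic language is detected in the text
guard = Guard().use(
    ToxicLanguage, threshold=0.5, validation_method="sentence", on_fail="exception"
)

try:
    guard.validate("This is a perfectly polite sentence.")  # Passes validation
    print("Input accepted")
except Exception as error:
    print(f"Input rejected: {error}")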

Thanks for reading to the end. This was an introductory article; do mention in the comments if you want to learn more about any of these approaches.

