Introduction
In artificial intelligence, the ability to generate images from textual prompts has long been a fascinating and elusive pursuit. The emergence of advanced deep-learning models has brought us closer than ever to achieving this seemingly magical feat. One such model is Azure OpenAI DALL-E, a powerful image generation system that can transform words into stunning visual representations. In this blog post, we will delve into the world of DALL-E and explore its approach to generating images based on prompts, offering a glimpse into the future of AI-powered creativity.
Approach
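Azure OpenAI DALL-E is built on deep learning, although not on generative adversarial networks (GANs), as is sometimes assumed. The original DALL-E combines two ideas: a discrete variational autoencoder, in the same family as VQ-VAE-2, which compresses an image into a grid of discrete tokens, and a Transformer, which treats the text tokens and image tokens as a single sequence and generates the image tokens one at a time from the prompt. DALL-E 2 and later versions generate images with diffusion models instead.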
Development
Let's go over the steps to create an Azure OpenAI account in the Azure portal.
Search for Azure OpenAI in the portal and create an account.
Choose the region and type the account name. Then click on Next until the final step and click on Create.
Once the account is created, you'll see the overview of your Azure OpenAI account. Click on Explore.
Another tab is opened; this is the Azure AI Studio. Click on Try it Now under DALL-E playground. As you can see, when this article was written, DALL-E was under PREVIEW in Azure.
In the DALL-E playground, you can generate any image you want based on the prompts. Click on Settings.
You can specify the number of images to be generated (1 - 3) and the image size.
The prompt for this example was "an armchair in the shape of a strawberry". After you write the prompt, click on Generate.
After a few seconds, the generated images are displayed below the prompt.
You can regenerate, download or delete any image using the buttons below it.
Also, above the prompt next to Settings, click on View code. You'll see two options: the Python code and API call to do the same.
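# Note: The openai-python library support for Azure OpenAI is in preview.
# This snippet uses the pre-1.0 openai-python interface (openai.Image.create).
import os

import openai

# Point the client at your Azure OpenAI account instead of api.openai.com.
openai.api_type = "azure"
openai.api_base = "https://<openai_account>.openai.azure.com/"
openai.api_version = "2023-06-01-preview"
# Read the key from the environment rather than hard-coding it.
openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Image.create(
    prompt='USER_PROMPT_GOES_HERE',
    size='1024x1024',
    n=1
)

# URL of the first generated image.
image_url = response["data"][0]["url"]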
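Since the returned URL points at a hosted copy of the image that may not stay available indefinitely, it can be convenient to store the file locally. The following is a minimal sketch using only the Python standard library; the helper name save_image and the file name in the usage note are illustrative assumptions, not part of the code shown in the playground.

```python
import shutil
import urllib.request
from pathlib import Path


def save_image(image_url: str, destination: str) -> Path:
    """Download the image at image_url and write it to destination.

    Hypothetical helper for illustration; not part of the openai library.
    """
    target = Path(destination)
    # Stream the response body straight to disk instead of holding it in memory.
    with urllib.request.urlopen(image_url) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

For example, save_image(image_url, "strawberry_armchair.png") would write the first generated image to the current directory.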
If you want to call the REST API directly, send a POST request to https://<openai_account>.openai.azure.com/openai/images/generations:submit?api-version=2023-06-01-preview, passing your key in the api-key request header, with the following JSON payload:
{
    "prompt": "USER_PROMPT_GOES_HERE",
    "n": 1,
    "size": "1024x1024"
}
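The same request can be assembled in Python without the openai library. The sketch below only builds the request, using the URL, API version and payload given above; the function name build_generation_request and its parameters are hypothetical, and it assumes the key is sent in the api-key header used for key-based authentication. The :submit suffix suggests that the service queues the job and reports the result later, so it is safer to treat the immediate response as a job reference rather than as the finished image.

```python
import json
import urllib.request

API_VERSION = "2023-06-01-preview"


def build_generation_request(account: str, api_key: str, prompt: str,
                             n: int = 1, size: str = "1024x1024") -> urllib.request.Request:
    """Build (but do not send) the POST request for the image generation endpoint.

    Hypothetical helper; account is the <openai_account> part of the host name.
    """
    url = (f"https://{account}.openai.azure.com/openai/"
           f"images/generations:submit?api-version={API_VERSION}")
    # Same three fields as the JSON payload shown above.
    payload = json.dumps({"prompt": prompt, "n": n, "size": size}).encode("utf-8")
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")
```

Passing the returned object to urllib.request.urlopen would send the request; that step needs a real account name and key, so it is left out here.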
Conclusion
The combination of deep learning, discrete image representations and Transformer-based architectures in DALL-E has paved the way for remarkable advances in image generation from textual prompts. Through training on vast datasets of images and text, a VQ-VAE-style image tokenizer and the Transformer architecture, DALL-E has unlocked new dimensions of AI-powered creativity, and Azure OpenAI makes it available through a playground, a Python library and a REST API.
Thanks for reading
Thank you very much for reading; I hope you found this article interesting and that it proves useful in the future. If you have any questions or ideas you would like to discuss, I would be glad to collaborate and exchange knowledge.