Machine learning à la carte: DALL-E 2 creates and retouches images on command

Fifteen months after releasing DALL-E, OpenAI has presented the AI system's successor. DALL-E 2 creates images from text descriptions and, unlike its predecessor, can also modify existing images. The project name is a portmanteau of the surname of Spanish artist Salvador Dalí and the title of the Pixar film "WALL-E".
OpenAI trained the system on a large number of images and their associated descriptions, similar to how the language model GPT-3 (Generative Pre-trained Transformer), also developed by OpenAI, is trained on text in order to generate new text. However, DALL-E 2 still focuses on the combination of image and description, so it is not a multimodal model like those being pursued by Aleph Alpha or by Meta, the company formerly known as Facebook.

Background acronyms: GPT-3, CLIP and GLIDE

The first version of DALL-E is essentially based on GPT-3 and uses 12 billion parameters. It also relies on CLIP (Contrastive Language-Image Pre-training), a tool released at the same time as DALL-E: an artificial neural network that maps visual concepts to categories.
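The idea of matching an image to the best-fitting description can be illustrated with a small Python sketch. It is a toy under stated assumptions, not OpenAI's implementation: the three-dimensional embeddings are hand-written placeholders (a real system would obtain them from trained image and text encoders), the function names are hypothetical, and the temperature value of 0.07 is only an illustrative choice.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_captions(image_embedding, caption_embeddings, temperature=0.07):
    """Rank candidate captions by similarity to the image embedding.

    The temperature is an assumed value for illustration only.
    """
    captions = list(caption_embeddings)
    scores = [
        cosine_similarity(image_embedding, caption_embeddings[c]) / temperature
        for c in captions
    ]
    probabilities = softmax(scores)
    return sorted(zip(captions, probabilities), key=lambda p: p[1], reverse=True)


# Hypothetical, hand-written embeddings; real ones come from trained encoders.
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "a photo of a rabbit": [0.8, 0.2, 0.1],
    "a photo of an airplane": [0.1, 0.9, 0.3],
    "a photo of a car": [0.2, 0.3, 0.9],
}

ranking = rank_captions(image_embedding, caption_embeddings)
best_caption, best_probability = ranking[0]
print(best_caption, round(best_probability, 3))
```

In this sketch the caption whose embedding points in nearly the same direction as the image embedding receives almost all of the probability mass, which is the intuition behind assigning an image to a category described in text.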
The image was created by DALL-E 2 from the description posted on Twitter, "A rabbit detective sitting on a park bench reading a newspaper – in a Victorian setting"

When the new system was released, Sam Altman invited Twitter users to post descriptions of their choice, which he then passed to DALL-E 2, with quite impressive results.

Custom retouched

When inserting content into existing images, DALL-E 2 matches the style of the original image.

In addition, the system can redesign existing images. The project page shows variations of well-known works of art such as "The Girl with a Pearl Earring" by Jan Vermeer van Delft or "The Kiss" by Gustav Klimt.

Limits due to gaps in knowledge

Like GPT-3 and its own predecessor, the system is only as good as its training data, i.e. the images used for training together with their descriptions. If the training data contains images with incorrect descriptions, DALL-E 2 adopts the incorrect information and, for example, mistakes an airplane for a car.

Limits by precautionary measures

Generative models carry some risks: they can adopt common prejudices, and users can try to manipulate them or use them to produce content that, for example, glorifies violence. For this reason, OpenAI published InstructGPT in February, an adapted variant of GPT-3 that uses human feedback to exclude certain topics such as sexual content or violence.

Not public yet – waiting list

DALL-E 2 is currently running as a research project and is not yet available via the public API. According to OpenAI, the system is in a closed test phase so that the precautionary measures can be evaluated. Interested parties can already sign up for a waiting list.


Teddy bears from DALL-E 2

"Teddy bears working on new AI research on the moon in the 1980s"

Further information and demonstrations of DALL-E 2 can be found on the project website. A paper by OpenAI contains a technical description of the system.
