This Google AI can create hyper-realistic images with just a short text description

Google has shown on many occasions what its machine learning models, such as MUM and LaMDA, are capable of, and it continues to build on those advances with a new artificial intelligence model called Imagen. According to Jeff Dean, head of the company's AI division, the model promises to "unleash joint creativity between humans and computers", and it can generate images from a short, simple text description.
Imagen is very similar to DALL-E 2, the artificial intelligence developed by OpenAI (a company co-founded by Elon Musk), which also generates images from a text description. However, there are several differences between the two models, among them the level of detail in the resulting image and the efficiency with which it is created.
Google, in particular, claims that its AI produces results with a much more precise level of detail than other systems. To test this, the company created a benchmark called DrawBench, which compares Imagen with similar models, such as VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, and shows their results side by side so that "human evaluators" can compare them and choose the most realistic. According to the company, the evaluators concluded that the images generated by Imagen are of higher quality and have better "image-text alignment" than those of the other models.
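The side-by-side comparison described above could be tallied roughly as follows. This is only a minimal sketch: the article does not explain how DrawBench scores are computed, so the function `preference_rates`, the model names `model_a` and `model_b`, and the votes themselves are hypothetical placeholders, not Google's method or its results.

```python
from collections import Counter


def preference_rates(votes):
    """Return each model's share of the votes.

    `votes` is a list of model names, one per judgment, naming the model
    whose image the rater preferred for a given prompt.
    """
    total = len(votes)
    if total == 0:
        return {}
    counts = Counter(votes)
    return {model: n / total for model, n in counts.items()}


# Hypothetical placeholder votes, NOT real DrawBench data.
votes = ["model_a", "model_b", "model_a", "model_a"]
rates = preference_rates(votes)
print(rates)
```

A real evaluation would collect separate votes for image quality and for image-text alignment, since the article mentions both criteria.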
Google's AI is faster and more efficient than others, and it also understands more complex descriptions
Imagen, Google's AI that generates images from a short text description, is also "more computationally efficient, more memory efficient, and converges faster" thanks to a new architecture that Google calls Efficient U-Net. The result, according to Google, is hyper-realistic images that match the text description more precisely than those of competing models, across a wide range of descriptions.
Google also claims that Imagen can create images from descriptions that mention specific places or use convoluted wording. For example, if the user types "A Procyon lotor (raccoon) proposing to a Phascolarctos cinereus (koala) at DisneyLand," the company's AI should create an image matching this description and understand the scientific names of both animals.
For the moment, Imagen is an internal project and is not available to the public, because it could lead to the creation of images that contain "stereotypes and harmful representations", the company says.
"Imagen is based on text encoders trained on uncurated web-scale data, and thus inherits the social biases and limitations of large language models. As such, there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision not to release Imagen for public use without further safeguards."
