Google claims its text-to-image AI delivers “unprecedented photorealism.”


Kris Holt

Google has demonstrated an artificial intelligence system that can create images based on text input. The idea is that users can enter any descriptive text and the AI will convert it into an image. The company says the model, Imagen, created by the Brain Team at Google Research, offers “an unprecedented level of photorealism and deep language understanding.”

This isn’t the first time we’ve seen such AI models. DALL-E (and its successor, DALL-E 2) generated both headlines and images because it is so adept at converting text into images. Google’s version, however, aims to create more realistic images.

To evaluate Imagen against other text-to-image models (including DALL-E 2, VQ-GAN+CLIP, and Latent Diffusion Models), the researchers created a benchmark: a list of 200 text prompts typed into each model. Human raters were then asked to rate each image. They “prefer Imagen over other models in head-to-head comparisons, both in terms of sample quality and image-text alignment,” Google said.
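The head-to-head comparison described above boils down to tallying which model human raters preferred on each prompt. Here is a minimal sketch of that kind of preference tally; the function name and vote format are hypothetical, not Google’s actual evaluation code.

```python
# Illustrative sketch of a head-to-head human preference tally.
# Each vote is "A", "B", or "tie" -- one judgment per prompt comparison.
from collections import Counter

def preference_rate(votes):
    """Return the fraction of non-tie votes that favored model A."""
    counts = Counter(votes)
    decided = counts["A"] + counts["B"]
    return counts["A"] / decided if decided else 0.0

# Hypothetical judgments from raters comparing two models on the same prompts.
votes = ["A", "A", "B", "tie", "A"]
print(preference_rate(votes))  # 0.75 -- A won 3 of the 4 decided comparisons
```

A rate above 0.5 across the 200-prompt list would indicate that raters preferred model A overall, which is the kind of result Google reports for Imagen.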

It’s worth noting that the examples Google has shown are curated. They may therefore be the best of the best images the model has created, and may not accurately reflect most of the visuals it generates.

Like DALL-E, Imagen is not publicly available. Google considers it not yet suitable for use by the general public for a number of reasons. For one, text-to-image models are typically trained on large datasets scraped from the web rather than curated, which introduces a number of problems.

“While this approach has enabled rapid algorithmic advances in recent years, datasets of this type often reflect social stereotypes, oppressive viewpoints, and derogatory or otherwise harmful associations with marginalized identity groups,” the researchers write. “While some of our training data was filtered to remove noise and unwanted content such as pornographic images and toxic language, we also used the LAION-400M dataset, which is known to contain a wide range of inappropriate content, including pornographic images and racial slurs and harmful social stereotypes.”

As a result, they said, Imagen inherited the “social biases and limitations of large language models” and can reproduce “harmful stereotypes and representations.” The team said preliminary findings suggested that the AI encodes social biases, including a tendency to create images of people with lighter skin tones and to place them in stereotypical gender roles. The researchers also note the potential for abuse if Imagen were released to the public as is.

However, the team may eventually allow the public to enter text into a version of the model to create their own images. “In future work we will explore a framework for responsible externalization that balances the value of external auditing with the risks of unrestricted open-access,” the researchers wrote.

In the meantime, you can try Imagen to a limited extent. On the project’s site, you can build a description from preselected phrases: users choose whether the image should be a photo or an oil painting, the type of animal shown, the clothes it wears, the action it performs, and the setting. So if you’ve always wanted to see an oil-painting interpretation of a fuzzy panda wearing sunglasses and a black leather jacket while skateboarding on the beach, here’s your chance.
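The demo’s fill-in-the-blanks interface can be sketched as a set of slots, each with a fixed list of options, joined into one text prompt. The slot names and word lists below are hypothetical examples, not the demo’s actual options.

```python
# Illustrative sketch of a fill-in-the-blanks prompt builder:
# each slot offers a fixed set of choices, and the selections are
# joined into a single text prompt for the model.
SLOTS = {
    "style": ["A photo of", "An oil painting of"],
    "subject": ["a fuzzy panda", "a raccoon"],
    "outfit": ["wearing sunglasses and a black leather jacket"],
    "activity": ["skateboarding"],
    "setting": ["on a beach"],
}

def build_prompt(choices):
    """choices: mapping of slot name -> index into that slot's option list."""
    return " ".join(SLOTS[slot][i] for slot, i in choices.items())

prompt = build_prompt({"style": 1, "subject": 0, "outfit": 0,
                       "activity": 0, "setting": 0})
print(prompt)
# An oil painting of a fuzzy panda wearing sunglasses and a black
# leather jacket skateboarding on a beach
```

Constraining input to preselected phrases like this is one way to offer a public demo while preventing arbitrary (and potentially abusive) prompts.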

Google Research

