In recent weeks, the DALL-E 2 AI image generator has been making waves on Twitter. Tonight, Google unveiled its own version, called “Imagen,” which it says combines a deep level of language understanding with an “unprecedented degree of photorealism.”
Jeff Dean, head of AI at Google, said AI systems like these “can unleash joint human/computing creativity,” and that Imagen is “one direction [the company is] pursuing.” The advance made by Google Research’s Brain Team with its text-to-image diffusion model is the level of realism. DALL-E 2’s output is generally quite realistic, but a closer look can reveal the artistic liberties taken. (To learn more, be sure to watch this explainer video.)
Imagen relies on the power of large transformer language models to understand text, and on the strength of diffusion models to generate high-fidelity images. Per Google: “Our key finding is that large generic language models (e.g., T5), pre-trained on text corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen improves both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model.”
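For readers curious what the “diffusion” half of that pairing looks like, here is a minimal NumPy sketch of the closed-form forward (noising) process that diffusion models are trained to reverse. This is purely illustrative; the function names, noise schedule, and image size are hypothetical and are not from Imagen itself.

```python
import numpy as np

def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I).

    A trained diffusion model learns to undo this corruption step by step,
    conditioned (in Imagen's case) on a frozen text encoder's embeddings.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

# A toy linear noise schedule over T steps (values are illustrative).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))  # stand-in for an image
x_end, _ = forward_noise(x0, T - 1, alpha_bar, rng)

# By the final step almost no signal remains: x_T is close to pure noise,
# which is why generation can start from random noise and denoise backward.
print(alpha_bar[-1] < 1e-4)  # → True
```

Sampling an image then amounts to running this process in reverse, denoising from pure noise while the text embedding steers each step.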
To demonstrate this breakthrough, Google created a benchmark for evaluating text-to-image models called DrawBench. Human raters preferred “Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.” It was compared against VQ-GAN+CLIP, latent diffusion models, and DALL-E 2.
Meanwhile, the prompt categories used to show that Imagen better understands user requests include spatial relationships, long-form text, rare words, and challenging prompts. Another breakthrough is a new Efficient U-Net architecture that is “more computationally efficient, more memory efficient, and converges faster.”
Imagen achieves a new state-of-the-art FID score of 7.27 on the COCO dataset without ever having been trained on COCO, and human raters find Imagen samples to be on par with the COCO data itself in image-text alignment.
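The FID (Fréchet Inception Distance) behind that 7.27 figure measures how close the statistics of generated images are to those of real ones; lower is better. The sketch below, which assumes SciPy is available, computes the standard Fréchet distance between two sets of feature vectors. The random vectors here are stand-ins; a real evaluation would extract features with an Inception-v3 network.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # discard tiny imaginary numerical noise
        covmean = covmean.real
    diff = mu_a - mu_b
    return diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean)

rng = np.random.default_rng(0)
a = rng.standard_normal((500, 16))        # stand-in "real" features
b = rng.standard_normal((500, 16)) + 0.5  # shifted "generated" features

print(abs(fid(a, a)) < 1e-6)  # identical sets → distance ~0 → True
print(fid(a, b) > fid(a, a))  # a shifted distribution scores worse → True
```

The score depends on both the mean and covariance terms, so it penalizes images that are individually plausible but collectively lack the variety of the real dataset.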
On the societal impact front, Google “has decided not to release any public code or demo” of Imagen at this time, given the possibility of abuse. Additionally:
Imagen relies on text encoders trained on uncurated web-scale data, and thus inherits the social biases and limitations of large language models. As such, there is a risk that Imagen has encoded harmful stereotypes and portrayals, which guides our decision not to release Imagen for public use without further safeguards in place.
That said, there is an interactive demo on the site, and the research paper is available here.
FTC: We use revenue-generating automatic affiliate links.
Check out 9to5Google on YouTube for more info: