Nvidia’s new AI model allows you to literally paint pictures with words

22 Nov, 2021

Nvidia today unveiled GauGAN2, the second version of the GauGAN artificial intelligence (AI) model it introduced in 2019. The new model lets people have an AI paint a picture simply by writing down a few words describing it. “Simply type a phrase like ‘sunset at a beach’ and AI generates the scene in real time. Add an additional adjective like ‘sunset at a rocky beach,’ or swap ‘sunset’ to ‘afternoon’ or ‘rainy day’ and the model, based on generative adversarial networks, instantly modifies the picture,” the company said in a blog post.

Generative adversarial networks, or GANs, are a type of AI algorithm in which two neural networks are pitted against each other. They are commonly used for creating deepfakes on the Internet. One network, the generator, creates images, while the other, the discriminator, is tasked with telling those generated images apart from real ones. The two are trained together: as the discriminator gets better at catching fakes, the generator is pushed to produce more convincing images. Once training is done, the generator's images are the output of the algorithm.
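To make the tug-of-war between the two networks concrete, here is a minimal sketch of the loss functions commonly used to train a GAN. It is an illustration under stated assumptions, not Nvidia's code: the function names and the example scores are made up, and the "non-saturating" generator loss shown is one common textbook choice rather than anything confirmed for GauGAN2.

```python
import math

# Each score is the discriminator's estimated probability (0 to 1)
# that an image is real. All values below are illustrative only.

def discriminator_loss(d_real, d_fake):
    """Low when real images score near 1 and generated images score near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Low when the discriminator scores generated images as real."""
    return -math.log(d_fake)

# Early in training: the discriminator easily spots the fake.
early_d = discriminator_loss(d_real=0.9, d_fake=0.1)
early_g = generator_loss(d_fake=0.1)

# Later: the discriminator can no longer tell real from generated.
late_d = discriminator_loss(d_real=0.5, d_fake=0.5)
late_g = generator_loss(d_fake=0.5)

print(f"early: discriminator={early_d:.3f} generator={early_g:.3f}")
print(f"late:  discriminator={late_d:.3f} generator={late_g:.3f}")
```

As the generator improves, its loss falls while the discriminator's rises, which is the sense in which the two networks compete.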

According to Nvidia, GauGAN2 combines segmentation mapping, inpainting and text-to-image generation within a single model to create “photorealistic art” from words. “With the press of a button, users can generate a segmentation map, a high-level outline that shows the location of objects in the scene. From there, they can switch to drawing, tweaking the scene with rough sketches using labels like sky, tree, rock and river, allowing the smart paintbrush to incorporate these doodles into stunning images,” the post said.

Segmentation mapping allows researchers to separate and understand the different objects, or parts, in an image by assigning each region a label. Inpainting, on the other hand, is a technique in which an algorithm fills in missing, damaged or deliberately masked parts of an image so that they blend in with the rest. It has been used for restoring damaged paintings in the past.
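A segmentation map can be pictured as a grid in which every cell holds the label of the object at that spot. The sketch below is a toy illustration only: the grid, the `paint` helper and the `changed_mask` helper are hypothetical and do not come from Nvidia's software. It shows a map using the labels mentioned above, a rough "doodle" that relabels part of it, and a mask of the cells that changed, which is the region a generator would need to repaint.

```python
from collections import Counter

# Toy 4x6 segmentation map; labels come from the article, the layout is made up.
seg_map = [
    ["sky",  "sky",  "sky",  "sky",   "sky",   "sky"],
    ["sky",  "sky",  "tree", "tree",  "sky",   "sky"],
    ["rock", "rock", "tree", "tree",  "river", "river"],
    ["rock", "rock", "rock", "river", "river", "river"],
]

def label_counts(grid):
    """Count how many cells each label covers."""
    return Counter(label for row in grid for label in row)

def paint(grid, label, top, left, bottom, right):
    """Return a copy of the grid with an inclusive rectangle relabelled."""
    new_grid = [row[:] for row in grid]
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            new_grid[r][c] = label
    return new_grid

def changed_mask(before, after):
    """Mark the cells whose label differs between two maps."""
    return [[b != a for b, a in zip(row_b, row_a)]
            for row_b, row_a in zip(before, after)]

# Doodle: turn two sky cells on the right into rock.
edited = paint(seg_map, "rock", top=1, left=4, bottom=1, right=5)
mask = changed_mask(seg_map, edited)

print(label_counts(seg_map))
print(label_counts(edited))
print(sum(cell for row in mask for cell in row), "cells to repaint")
```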

Nvidia says that GauGAN2 was trained on 10 million “high-quality landscape images” using the Nvidia Selene supercomputer, which is among the most powerful supercomputers in the world. The neural network in GauGAN2 learns the “connection between words and visuals” and paints a picture based on that connection.

The second-generation AI model will be faster and easier to run, according to Nvidia, and the company is making interactive demos available. The company also said that artists can use the model to generate “other worldly” images, like the fictional planet Tatooine and its two suns from the Star Wars movie franchise. “All that’s needed is the text ‘desert hills sun’ to create a starting point, after which users can quickly sketch in a second sun,” the company said.