Google’s latest large language model generates videos
Summary

Google has unveiled VideoPoet, a new generative AI system that can create and edit videos from text and other inputs.

According to Google, VideoPoet is a large language model designed for a variety of video generation tasks, including text-to-video, image-to-video, video stylization, video inpainting and outpainting, and video-to-audio. Unlike competing models, VideoPoet integrates many capabilities into a single model, rather than relying on separately trained components for each task.

Video: Google

VideoPoet uses multiple tokenizers (MAGVIT-v2 for video and images, SoundStream for audio) to train an autoregressive language model across video, image, audio, and text modalities. Once the model has generated tokens conditioned on some context, the tokenizer decoders convert them back into a viewable or audible representation.
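The pipeline above can be sketched as follows. This is a minimal toy illustration, not VideoPoet's actual code: the class names, the hash-based "encoding", and the `generate` function are all hypothetical stand-ins for the real MAGVIT-v2/SoundStream tokenizers and the trained language model.

```python
# Toy sketch of the tokenize -> generate -> decode flow described above.
# All names and logic are illustrative; VideoPoet is not a public API.

class VideoTokenizer:
    """Stand-in for MAGVIT-v2: maps frames to/from discrete tokens."""
    def encode(self, frames):
        # Real tokenizers learn a discrete codebook; here we just hash.
        return [hash(f) % 1024 for f in frames]

    def decode(self, tokens):
        # The real decoder renders pixels; we return labeled placeholders.
        return [f"frame<{t}>" for t in tokens]

def generate(context_tokens, n_new):
    """Toy autoregressive loop: each new token depends on the full prefix.
    A real LM would sample from a learned next-token distribution."""
    tokens = list(context_tokens)
    for _ in range(n_new):
        tokens.append(sum(tokens) % 1024)  # placeholder for LM sampling
    return tokens[len(context_tokens):]

video_tok = VideoTokenizer()
context = video_tok.encode(["frame_a", "frame_b"])  # condition on input video
new_tokens = generate(context, n_new=4)             # LM predicts video tokens
frames = video_tok.decode(new_tokens)               # decoder renders frames
```

The key design point the article describes is that all modalities share one discrete token space, so a single autoregressive model can handle text-to-video, image-to-video, and video-to-audio without separately trained components.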

Video: Google

VideoPoet can generate videos of variable length with a range of motions and styles, depending on the text prompt. It can also animate an input image according to a prompt, predict optical flow and depth information for video stylization, and generate audio to match a video. By default, the model produces videos in portrait orientation, tailoring its output to short-form content.

Video: Google

Camera movement can also be controlled by describing the desired motion in the text prompt.
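In practice this amounts to appending a camera-motion phrase to the prompt. A minimal sketch, with an assumed base prompt and an illustrative (not exhaustive or confirmed) list of motion phrases:

```python
# Hypothetical prompt composition for camera control; the base prompt and
# the motion phrases below are illustrative examples, not a documented API.
BASE = "A raccoon explores a glowing forest"
CAMERA_MOVES = ["zoom out", "pan left", "arc shot", "crane shot"]

# One prompt per camera move, each requesting the same scene.
prompts = [f"{BASE}, {move}" for move in CAMERA_MOVES]
```

Each resulting prompt would ask the model for the same scene rendered with a different camera motion.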

Video: Google

Recommendation

VideoPoet project page.
