Transframer AI dreams a 30-second video from an image

Deepmind’s new video AI, Transframer, can handle a wide range of image and video tasks, and create 30-second videos from a single frame.

Generative AI systems have moved from research labs to consumer and industrial applications in recent years, driven by OpenAI’s large-scale GPT-3 language model. Then, in April 2022, the company introduced the DALL-E 2 image generation system, indirectly spawning alternatives like Midjourney and Stable Diffusion.

Google’s sister company Deepmind has now unveiled Transframer, an AI model that could offer a glimpse of the next generation of generative AI models.

Deepmind Transframer: A model with many tasks

Deepmind’s Transframer is a visual prediction framework that can solve eight image processing and modeling tasks at once, such as depth estimation, instance segmentation, object recognition, or video prediction.

Transframer takes a set of context images with associated annotations, such as timestamps or camera viewpoints, and uses them to predict the target image for a given query annotation.

Transframer provides a framework for multiple imaging tasks. | Image: Deepmind

The model processes compressed images using a U-Net whose outputs are passed to a DCTransformer decoder. Specifically, images are compressed using the DCT (Discrete Cosine Transform), which is also used in the JPEG compression method. The DCTransformer is specialized in generating DCT tokens.
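
To make the DCT step more concrete, here is a minimal sketch of a JPEG-style block DCT in plain Python. It is not Deepmind's code and does not reproduce Transframer's actual token format: the 8x8 block size, the quantization step, and the helper names (dct2, idct2, sparse_tokens) are assumptions chosen for illustration. It only shows why a smooth image block collapses into a handful of nonzero coefficients that a model could treat as tokens.

```python
import math

N = 8  # JPEG-style block size (an assumption for this sketch)


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix C, so that coeffs = C * block * C^T."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """2D DCT of a square block (rows, then columns)."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2D DCT; exact up to float error because C is orthonormal."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def sparse_tokens(coeffs, step=16.0):
    """Quantize coefficients and keep only nonzero ones as (row, col, value).

    Hypothetical token format for illustration, not the DCTransformer's.
    """
    tokens = []
    for u, row in enumerate(coeffs):
        for v, x in enumerate(row):
            q = round(x / step)
            if q != 0:
                tokens.append((u, v, q))
    return tokens


# A smooth gradient block stands in for a patch of a real image.
block = [[float(8 * x + 4 * y) for x in range(N)] for y in range(N)]
coeffs = dct2(block)
tokens = sparse_tokens(coeffs)
restored = idct2(coeffs)

print(f"{len(tokens)} nonzero tokens out of {N * N} coefficients")
```

The smoother the block, the fewer coefficients survive quantization, which is the property that makes a sparse DCT representation attractive as a compact input and output format.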

Transframer generates new angles and complete videos

In addition to traditional imaging tasks such as depth estimation and object detection, Transframer is also capable of synthesizing new views of an object and predicting video trajectories.


In a short tweet, Deepmind shows off around six 30-second videos, each of which Transframer dreamed up from a single input image. Despite the low resolution, the videos show some consistency over time.

Deepmind says that the results show that a framework like Transframer is suitable for challenging image and video modeling tasks. Transframer can also act as a multitasker to solve image and video analysis problems that previously used specialized models, the researchers said.

Sources: Deepmind (project page), Arxiv (paper)
