Sora: high-quality AI-generated video, but with what prompts?


OpenAI has just announced a generative AI service capable of producing up to 60 seconds of high-quality video from simple prompts. The text-to-video model, called “Sora”, is now entering testing.

Sora is a generative AI that can currently produce videos up to one minute long. The achievement lies in the quality of the footage and in how faithfully it follows the user’s instructions.

The model draws on training data and techniques from DALL-E, OpenAI’s image-generation model, to improve how it understands and interprets natural language and translates it into video.

Generate complex scenes with multiple characters

Trained on this data, Sora is able to generate complex scenes with multiple characters, specific types of behavior, and precise details of both the foreground subject and the background. Drawing on its understanding of the real world, it naturally adds details beyond what the user requests, for a more realistic rendering.

A number of example videos generated by Sora are publicly available, along with the prompts that produced them. Here are a few of these renderings, with the prompts used as captions.

The prompt chosen for creating this video is: An elegant woman walks down a Tokyo street filled with bright neon lights and animated street signs. She wears a black leather jacket, a long red dress, black boots and a black handbag. She wears sunglasses and red lipstick. She walks confidently and casually. The street is wet and reflective, creating a mirror effect with the colored lights. Many pedestrians walk about.

Prompt: A white and orange tabby cat is seen darting happily through a dense garden, as if chasing something. His eyes are wide and happy as he trots forward, scanning the branches, flowers, and leaves as he walks. The path is narrow as it weaves its way between all the plants. The scene is captured from a ground level angle, following the cat closely, resulting in a low and intimate perspective. The image is cinematic, with warm tones and a grainy texture. Diffused daylight between the leaves and plants creates a warm contrast that accentuates the cat’s orange fur. The image is clear and sharp, with a shallow depth of field.

This prompt is surprisingly simple: A bird’s eye view of a construction site filled with workers, equipment and heavy machinery.

OpenAI clarifies that Sora is not yet a finished product. It may struggle to simulate the physics of complex scenes.

An example? If someone bites into a cookie, the AI may fail to render the bite mark on the cookie. It can also confuse spatial details, such as mixing up left and right.

Competition with Google

“We are taking significant safeguards to ensure Sora is safe to use before it is made available to the public,” OpenAI said, “and we are testing the model with a red team of experts in combating bias and hate speech.”

The company is also developing a tool to identify videos generated by Sora in order to spot misleading content. When the generative model is integrated into an OpenAI product, the company plans to include C2PA metadata providing information about a video’s provenance.

“We are exploring how to evolve the model to be as useful as possible to creative professionals and, to this end, we are providing access to the tool to a number of visual artists, designers and filmmakers to get their feedback.”

In January, Google announced Lumiere, an AI model that generates highly realistic videos from text and images.


Source: “ZDNet Japan”


