Google’s Gemini 1.5 Pro can now hear you


Mia Ogouchi

April 10, 2024 at 2:48 p.m.

Gemini 1.5 Pro AI gets new features © Igor Omilaev / Unsplash

Gemini, Google’s artificial intelligence model, keeps surprising us. Version 1.5 Pro can now process audio files and extract all kinds of information from them.

While GPT-5 is expected to be released this summer and is said to be able to generate videos, the competition is also unveiling exciting new features. Gemini 1.5 Pro, announced last February, can now hear. For the moment, this feature is only available on the Vertex AI platform.

A highly anticipated feature

Google launched the Gemini multimodal model last December. The objective? To beat OpenAI’s hugely popular ChatGPT. Available in three versions (Nano, Pro and Ultra), the technology was notably meant to expand into audio and video. That ambition is now being realized with the latest update from the Mountain View firm, which has just given its artificial intelligence ears (virtual ones, of course).

Gemini AI can now listen to audio files © Franco Antonio Giovanella / Unsplash

Available on the Vertex AI development platform, Gemini 1.5 Pro can analyze an audio recording (a call, a meeting, etc.) and extract information from it without first going through a transcription. To use it, simply upload a file to the tool. The model can then generate statistics, summarize what was said and even provide an analysis for its users.

Many other AI announcements

With this new feature, Gemini 1.5 Pro becomes more efficient and faster than the Ultra model. Google also states that it can “understand complex instructions” and “eliminates the need to fine-tune models”.

And the good news didn’t stop there: the Mountain View firm made several other announcements at the same time:

  • Imagen 2, the text-to-image technology used notably by Gemini, can now add or remove elements in an image at the user’s request. This functionality is already available in other models, such as Stability AI’s Stable Cascade or Getty Images’ Generative AI.
  • SynthID, which adds an invisible digital watermark to images, has also been integrated into Imagen 2.

With the AI market booming, tech giants keep improving their technologies. The coming months are likely to bring plenty of interesting announcements.

Sources: Vertex AI, The Verge

Mia Ogouchi

Web editor by day, Hyrule prodigy by night, I love surfing (the web), organizing fights (Mac vs. PC) and screaming (with laughter at chubby cat memes). Digital news holds no secrets for me. My favorite hobby? Like every inhabitant of Kakariko Village, of course: breaking pots… to find rupees!