Google’s Project Astra shows the future of multimodal search… while OpenAI shows its present


The star of Google I/O 2024, Project Astra gives us a glimpse of a future in which we can converse endlessly with an intelligent assistant capable of reacting to our surroundings.

Project Astra recognizes Schrödinger’s cat thought experiment // Source: Google

If you follow Google on social media, you may have seen a video appear shortly before the OpenAI conference showing a multimodal voice assistant capable of responding in real time to questions that incorporate a visual element from the surroundings. That demonstration now has a name: Project Astra.

The future is multimodal

Asking a simple question to a search engine, a voice assistant or even an AI chatbot is a thing of the past. The future lies in multimodal queries, which combine a question, asked in writing or by voice, with another element, whether audio or visual. The goal is to make search much more natural and to bring questions closer to what one might ask a human. Imagine asking “What is that?” while pointing your smartphone at an object.
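To make this concrete, here is a minimal sketch of such an image-plus-text query using Google’s google-generativeai Python SDK, which already accepts mixed prompts; the API key, model name, and image path below are placeholders, not details taken from the Astra announcement.

    import google.generativeai as genai
    from PIL import Image

    # Authenticate with a Gemini API key (placeholder value).
    genai.configure(api_key="YOUR_API_KEY")

    # A multimodal-capable Gemini model; the exact model name is an assumption.
    model = genai.GenerativeModel("gemini-1.5-flash")

    # The "point your smartphone at an object" scenario:
    # one image plus one spoken-style question in a single request.
    photo = Image.open("object_photo.jpg")
    response = model.generate_content([photo, "What is that?"])

    print(response.text)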

Multimodality is already part of Gemini, but Google wants to go even further and build what science fiction films have already imagined, like Jarvis in Iron Man or Samantha in Her. That mission is Project Astra, developed by Google DeepMind: a voice assistant capable of responding continuously and in real time, rather than only to one-off requests.

Whether through a smartphone camera or a prototype pair of connected glasses, Project Astra is able to answer questions as diverse as “What neighborhood am I in?”, “What name would you give to this duo?”, “What does this piece of code do?” or even “Where did I put my glasses?”. An impressive result.

Google lagging behind?

The feat lies not so much in Astra’s understanding of the world as in its responsiveness. “Bringing response time down to a conversational level is a difficult technical challenge,” Google specifies in its press release. This is certainly why it is still only a project, even though the Gemini app will inherit some of its capabilities over the course of the year.

This impressive demo is, however, overshadowed by yesterday’s announcement of improvements to ChatGPT Voice. OpenAI’s conversational assistant has yet to prove itself on many points, starting with the accuracy of its responses, but it remains a step ahead in one area: voice rendering. While Google has opted for a voice that is still a little robotic, similar or even identical to that of Google Assistant, ChatGPT Voice uses more human, more natural and less monotonous intonations and vocal markers. Even if some will experience an uncanny valley effect, there is no doubt that this is a strong argument for large-scale adoption.

Additionally, the alpha version of this tool will be available to ChatGPT Plus subscribers in the coming weeks. The year ahead promises to be fiercely competitive in this field…





