Google’s Project Astra shows the future of multimodal search… while OpenAI shows its present


The star of Google I/O 2024, Project Astra gives us a glimpse of a future in which we can converse endlessly with an intelligent assistant capable of reacting to our surroundings.

Project Astra recognizes Schrödinger’s cat thought experiment // Source: Google

If you follow Google on social media, you may have seen a video appear shortly before the OpenAI conference showing a multimodal voice assistant capable of responding in real time to questions that incorporate a visual element from the surroundings. That demonstration now has a name: Project Astra.

The future is multimodal

Asking a simple question to a search engine, a voice assistant or even an AI chatbot is a thing of the past. The future lies in multimodal queries, which combine a question, asked in writing or by voice, with another element, whether audio or visual. The goal is to make search much more natural and to bring questions closer to what one might ask a human. Imagine asking “What is that?” while pointing your smartphone at an object.
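To make this concrete, here is a minimal sketch of such an image-plus-text query using Google’s google-generativeai Python SDK, which already accepts mixed prompts; the API key, model name, and image path below are placeholders, not details taken from the Astra announcement.

    import google.generativeai as genai
    from PIL import Image

    # Authenticate with a Gemini API key (placeholder value).
    genai.configure(api_key="YOUR_API_KEY")

    # A multimodal-capable Gemini model; the exact model name is an assumption.
    model = genai.GenerativeModel("gemini-1.5-flash")

    # The "point your smartphone at an object" scenario:
    # one image plus one spoken-style question in a single request.
    photo = Image.open("object_photo.jpg")
    response = model.generate_content([photo, "What is that?"])

    print(response.text)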

Multimodality is already part of Gemini, but Google wants to go even further and build what science fiction films have already imagined, like Jarvis in Iron Man or Samantha in Her. That mission is Project Astra, developed by Google DeepMind: a voice assistant capable of responding continuously and in real time, rather than only to one-off requests.

Whether through a smartphone camera or a prototype pair of connected glasses, Project Astra is able to answer questions as diverse as “What neighborhood am I in?”, “What name would you give to this duo?”, “What does this piece of code do?” or even “Where did I put my glasses?”. An impressive result.

Google lagging behind?

The feat lies not so much in Astra’s understanding of the world as in its responsiveness. “Bringing response time down to a conversational level is a difficult technical challenge,” Google specifies in its press release. This is certainly why it is still only a project, even though the Gemini app will inherit some of its capabilities over the course of the year.

This impressive demo is, however, overshadowed by yesterday’s announcement of improvements to ChatGPT Voice. OpenAI’s conversational assistant has yet to prove itself on many points, starting with the accuracy of its responses, but it remains a step ahead in one area: voice rendering. While Google has opted for a voice that is still a little robotic, similar or even identical to that of Google Assistant, ChatGPT Voice uses more human, more natural and less monotonous intonations and vocal markers. Even if some will experience an uncanny valley effect, there is no doubt that this is a strong argument for large-scale adoption.

Additionally, the alpha version of this tool will be available to ChatGPT Plus subscribers in the coming weeks. The year ahead promises to be fiercely competitive in this field…





