Grok: Elon Musk’s generative AI can now understand and analyze images


While Google’s Gemini 1.5 Pro has just gained support for audio content, xAI’s Grok is tackling image understanding and analysis.


In a blog post published on April 12, 2024, xAI, Elon Musk’s generative AI company, announces that Grok-1.5V “can now process a wide variety of visual information, including documents, charts, graphs, screenshots and photographs”.

Soon available to early testers and existing users, this new capability turns Grok into a multimodal AI model, since it now handles several data types (here, text and images).

On the performance side, xAI’s developers emphasize that Grok-1.5V “outperforms competitors in our new RealWorldQA benchmark, which assesses real-world spatial understanding”. The benchmark tests the different AI models on more than 700 images, asking them questions whose answers are “easily verifiable for each image”.

For example:

  • Which object is bigger: the pizza cutter or the scissors?
    • A. The pizza cutter is larger.
    • B. The scissors are bigger.
    • C. They are approximately the same size.
  • Given the view from our sedan’s front camera, do we have enough space to get around the gray car in front of us?
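
To make this kind of evaluation concrete, here is a minimal, hypothetical sketch of how a RealWorldQA-style multiple-choice item could be scored against a vision-language model. The `Item` structure and the `query_vision_model` callable are illustrative assumptions only; xAI’s actual evaluation harness and API are not described in the blog post.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Item:
    """One multiple-choice question about a single image."""
    image_path: str        # photo, screenshot, chart, etc.
    question: str          # e.g. "Which object is bigger: the pizza cutter or the scissors?"
    choices: Dict[str, str]  # e.g. {"A": "The pizza cutter is larger.", "B": "...", "C": "..."}
    answer: str            # ground-truth letter, easily verifiable from the image


def accuracy(items: List[Item],
             query_vision_model: Callable[[str, str], str]) -> float:
    """Score a model on multiple-choice items.

    `query_vision_model(image_path, prompt)` is a placeholder for whatever
    multimodal API is being evaluated; it should return the model's reply as text.
    """
    correct = 0
    for item in items:
        # Present the question and the lettered options in a single prompt.
        options = "\n".join(f"{letter}. {text}" for letter, text in item.choices.items())
        prompt = f"{item.question}\n{options}\nAnswer with a single letter."
        reply = query_vision_model(item.image_path, prompt).strip().upper()
        # Count the item as correct if the first character matches the answer letter.
        if reply[:1] == item.answer.upper():
            correct += 1
    return correct / len(items) if items else 0.0
```

Because each answer is a single, easily checked letter, accuracy can be computed automatically without human grading, which is what makes this style of benchmark practical at the scale of several hundred images.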

The comparison table also shows results ahead of the competition on the MathVista benchmark for mathematical reasoning and on TextVQA for reading text in images.

The blog post concludes by discussing the next advancements planned by xAI regarding the Grok AI model: improving multimodal understanding and generative capabilities. “In the coming months, we plan to make significant improvements to both capabilities, across various modalities such as images, audio and video”, concludes Elon Musk’s company.



