ChatGPT now offers voice chats: here's how to use them


When OpenAI released GPT-4 this past March, one of its headline advantages was multimodality, which would allow ChatGPT to accept image inputs. However, that multimodal capability was not ready for deployment until now.

On Monday, OpenAI announced that ChatGPT can now “see, hear and speak,” referring to the chatbot’s new ability to accept both image and voice input and to respond in voice conversations.

The image input feature can be useful for getting help with things you can see, such as solving a math problem written on a sheet of paper, identifying a plant, or photographing the contents of your pantry and asking for recipes based on what's there.

Take a photo and add the question

In all of these cases, the user simply takes a photo of what they are looking at and adds the question they want answered. OpenAI says the image understanding capability is powered by GPT-3.5 and GPT-4.
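
The in-app feature requires no code, but for readers curious what an image-plus-question request looks like programmatically, here is a minimal sketch against OpenAI's public API. The model name "gpt-4-vision-preview", the placeholder image URL, and the example question are illustrative assumptions, not details from the announcement.

```python
# Hypothetical sketch: sending a photo plus a question to OpenAI's API.
# The in-app "see" feature needs none of this; it only approximates the idea.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed vision-capable model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What plant is this, and how do I care for it?"},
                {
                    "type": "image_url",
                    # Placeholder URL; a real request would point to your own photo
                    "image_url": {"url": "https://example.com/photo-of-plant.jpg"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```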

The voice input and output feature gives ChatGPT the same functionality as a voice assistant. Users simply speak their request, and once it has been processed, ChatGPT responds aloud.

In the demo shared by OpenAI, a user verbally asks ChatGPT to tell a bedtime story about a hedgehog. ChatGPT responds by telling a story, as voice assistants such as Amazon’s Alexa do.

The race for AI assistants is on

The race for AI assistants is on: just last week, Amazon announced that it would give Alexa a new LLM with capabilities similar to ChatGPT’s, turning it into a hands-free AI assistant. By integrating voice into ChatGPT, OpenAI achieves much the same result.

To support the voice feature, OpenAI uses Whisper, its speech recognition system, to transcribe a user’s spoken words into text, along with a new text-to-speech model that can generate human-like audio from text and just a few seconds of sample speech.
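
The voice feature lives entirely inside the ChatGPT app, but the described pipeline can be loosely approximated with OpenAI's public API endpoints. The sketch below is an illustration, not the app's actual implementation; the model names ("whisper-1", "gpt-4", "tts-1"), the voice "alloy", and the file names are assumptions.

```python
# Hypothetical sketch of the described pipeline: Whisper transcribes the spoken
# request, a chat model answers it, and a text-to-speech model voices the answer.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Speech to text with Whisper (assumed local recording "question.m4a")
with open("question.m4a", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2. Generate a reply with a chat model
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = reply.choices[0].message.content

# 3. Text to speech (model and voice name are assumptions)
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=answer,
)
with open("answer.mp3", "wb") as f:
    f.write(speech.content)
```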

To create the five voices that ChatGPT users can choose from, the company collaborated with professional voice actors.

Only for ChatGPT Plus and Enterprise

Voice and image features will roll out only to ChatGPT Plus and Enterprise users over the next two weeks. However, OpenAI says it will expand access to other groups, such as developers, soon after.

If you are a Plus or Enterprise user, you can access the image input feature by tapping the photo button in the chat interface and uploading an image. To enable the voice feature, go to Settings > New Features and opt in to voice conversations.

Bing Chat, which is powered by GPT-4, supports image and voice input and is completely free. If you want to try these features but don’t yet have access to them, Bing Chat is a good alternative.


Source: ZDNet.com


