ChatGPT-4o: Advanced Voice Assistant and Improved Content Creation
OpenAI has launched GPT-4o, an enhanced AI model featuring a user-friendly voice assistant, inspired by the film "Her" (2013), capable of real-time interaction and improved image, video, and text processing.
OpenAI has introduced a new version of its flagship artificial intelligence (AI) system, featuring an advanced voice assistant designed to be more user-friendly. The new AI model, named GPT-4o, enhances its ability to process images and video, along with text, and allows real-time voice interaction. According to Mira Murati, OpenAI’s Chief Technology Officer, users can interrupt the voice assistant during conversations, and it can respond almost instantaneously.
In a live demonstration, OpenAI executives showcased the model’s ability to analyze code, translate languages, and guide users through algebra problems in real time. The announcement of GPT-4o coincided with the start of Google’s annual developer conference, where Google is expected to unveil its own new AI products. The rivalry among OpenAI, Google, and Microsoft continues as they compete for dominance in the AI field; despite its close partnership with OpenAI, Microsoft was not involved in the development of GPT-4o.
Sam Altman, OpenAI’s CEO, compared the new product to AI tools seen in movies. He mentioned that the team drew inspiration from the 2013 film “Her,” about a man who falls in love with a voice assistant. Altman and other OpenAI employees referenced the movie on social media following the announcement.
Altman noted on his blog that the original ChatGPT hinted at the potential of language interfaces, but GPT-4o feels significantly more advanced. The new model can detect emotions in a person's voice or facial expressions and can switch between different emotional tones, from dramatic to robotic to singing. This feature will be available to ChatGPT-Plus subscribers, a $20-a-month service, in the coming weeks.
The introduction of GPT-4o extends AI’s growing influence on the entertainment industry, reshaping how content is created, consumed, and interacted with. GPT-4o’s ability to switch between emotional tones and detect emotions in a user’s voice or facial expressions could be used to tailor content that adapts to the audience’s reactions, making entertainment more personalized and engaging.
Moreover, GPT-4o's real-time translation and multilingual capabilities could reshape globalization and localization efforts as film and TV studios broaden their reach to global audiences. The model's efficiency and lower cost also make advanced AI tools more accessible to smaller studios and independent creators.
GPT-4o will also be offered to companies and is designed to be twice as fast as, and half the cost of, the current GPT-4 Turbo model. The “o” in GPT-4o stands for “omni,” reflecting its comprehensive capabilities. Users of the free version of ChatGPT will gain access to GPT-4o’s image and vision features starting Monday.
Previously, OpenAI's “voice mode” feature, which combined three separate models to respond to users, struggled with multiple speakers and background noise. In contrast, GPT-4o is a single model trained on text, vision, and audio material, allowing it to respond more quickly and accurately.
OpenAI executives did not disclose the data used to train GPT-4o or whether it required less computational power. They are also developing a new AI model, GPT-5, which is expected to significantly advance the technology.
Murati emphasized that the team’s inspiration was not solely from the movie “Her” but also from human conversation. She described the new model as capable of natural, rich, and interactive communication, akin to a human’s ability to read tone and respond appropriately.