OpenAI Launches Voice Intelligence Features for API
OpenAI introduces new voice intelligence features in its API to help developers build apps that can converse with users and transcribe and translate their speech.

OpenAI has unveiled a suite of new voice intelligence features for its API, designed to help developers build applications that can hold conversations with users and transcribe and translate their interactions. The company pitches the new offerings as a way to change how businesses and individuals interact with technology.
The new GPT-Realtime-2 voice model is a significant upgrade over its predecessor, GPT-Realtime-1.5. It brings GPT-5-class reasoning, enabling it to handle more complex user requests. OpenAI has also introduced GPT-Realtime-Translate, a real-time translation feature that can understand more than 70 input languages and relay messages in 13 output languages, with the aim of making conversations across language barriers seamless.
OpenAI has also launched GPT-Realtime-Whisper, which provides live speech-to-text transcription. It lets users capture interactions in real time, supporting applications ranging from note-taking to customer service. According to OpenAI, these new models collectively enable voice interfaces to move beyond simple call-and-response exchanges and instead listen, reason, translate, transcribe, and take action as conversations unfold.
The company believes these updates will be particularly useful for businesses looking to expand their customer service capabilities, but it also expects the features to find applications in education, media, events, and creator platforms. OpenAI acknowledges that the tools could be misused for spam, fraud, or online harassment, and says it has implemented guardrails to prevent such abuse.
To mitigate these risks, OpenAI has built triggers into the system that can halt a conversation if it is detected as violating the company's harmful content guidelines. The new voice models are now available in OpenAI's Realtime API: GPT-Realtime-Translate and GPT-Realtime-Whisper are billed by the minute, while GPT-Realtime-2 is billed by token usage.
Source: TechCrunch