OpenAI GPT-4o Audio Models

OpenAI has introduced new transcription and voice-generation AI models. These models are a marked improvement over earlier versions and are part of OpenAI's goal of creating automated systems that can handle tasks for users independently.
Key Features
Text-to-Speech Model: gpt-4o-mini-tts
The new text-to-speech model, gpt-4o-mini-tts, produces speech that sounds more natural and realistic. Developers can easily control how the model speaks, setting the tone and emotion with plain-text instructions; for example, it can sound like a mad scientist or use a calm, mindfulness teacher's tone.
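As an illustration, here is a minimal sketch of such a call using the OpenAI Python SDK. The model name and the idea of passing steering instructions come from the description above; the voice name, instruction text, and output filename are illustrative assumptions rather than confirmed details.

```python
# Minimal sketch: generate speech with gpt-4o-mini-tts and steer its delivery.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# voice, instructions, and the output filename are placeholders.
from openai import OpenAI

client = OpenAI()

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # assumed voice name
    input="Your appointment is confirmed for 3 PM tomorrow.",
    instructions="Speak in a calm, reassuring tone, like a mindfulness teacher.",
) as response:
    # Write the generated audio to disk as it streams back.
    response.stream_to_file("confirmation.mp3")
```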
Speech-to-Text Models: gpt-4o-transcribe and gpt-4o-mini-transcribe
The new speech-to-text models, gpt-4o-transcribe and gpt-4o-mini-transcribe, outperform the older Whisper model. They are trained on diverse, high-quality audio datasets and handle accented and varied speech well, even in noisy environments. They are also less likely to make up words or passages, which makes transcriptions more accurate (a minimal API call is sketched after the list below).
- Reduced Word Error Rate: The new models have much lower word error rates; gpt-4o-transcribe achieves an impressive 2.46% error rate in English.
- Noise Cancellation and Voice Activity Detection: These models include noise cancellation and can detect when a speaker has finished talking, which improves transcription accuracy.
- Customizable Voices: The gpt-4o-mini-tts model lets developers customize voices, adjusting accent, pitch, tone, and emotional inflection through text prompts.
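As referenced above, a basic transcription call with the OpenAI Python SDK might look like the sketch below; the model name comes from the article, while the audio filename is a placeholder.

```python
# Minimal sketch: transcribe an audio file with gpt-4o-transcribe.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# the input filename is a placeholder.
from openai import OpenAI

client = OpenAI()

with open("support_call.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )

print(transcript.text)  # plain-text transcription of the recording
```

Swapping in gpt-4o-mini-transcribe in the same call would trade a little accuracy for lower cost.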
Benefits
These new models offer several benefits. They make voice interactions more natural and accurate, which is especially valuable for customer service and AI assistants, and the customizable voices and improved transcription accuracy make them versatile across applications.
Use Cases
Several companies have already started using OpenAI's new audio models. EliseAI used them to create more natural, emotionally rich interactions with tenants, leading to higher tenant satisfaction and better call-resolution rates. Decagon saw a 30% improvement in transcription accuracy, which helped its AI agents perform better even in noisy environments.
Pricing
The new models are available immediately through OpenAI's API. Here are the pricing details:
- gpt-4o-transcribe: $6.00 per 1M audio input tokens (about $0.006 per minute)
- gpt-4o-mini-transcribe: $3.00 per 1M audio input tokens (about $0.003 per minute)
- gpt-4o-mini-tts: $0.60 per 1M text input tokens, $12.00 per 1M audio output tokens (about $0.015 per minute)
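For a rough sense of scale, the per-minute figures above translate directly into usage estimates. The sketch below uses only the listed per-minute rates; the call volumes are hypothetical.

```python
# Back-of-the-envelope cost estimate from the per-minute rates listed above.
# Prices come from the pricing list; call volumes are hypothetical.
PRICE_PER_MINUTE = {
    "gpt-4o-transcribe": 0.006,
    "gpt-4o-mini-transcribe": 0.003,
    "gpt-4o-mini-tts": 0.015,
}

def monthly_cost(model: str, minutes_per_day: float, days: int = 30) -> float:
    """Estimate the monthly cost in dollars for a model and daily audio volume."""
    return PRICE_PER_MINUTE[model] * minutes_per_day * days

# Example: 500 minutes of transcription per day for a 30-day month.
print(f"${monthly_cost('gpt-4o-transcribe', 500):.2f}")  # -> $90.00
```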
Reviews and Testimonials
Companies such as EliseAI and Decagon have reported significant improvements in voice AI performance. EliseAI saw higher tenant satisfaction and better call-resolution rates, while Decagon's 30% improvement in transcription accuracy made its AI agents more reliable in real-world scenarios.