audio

Kokoro TTS: Advanced AI Text-to-Speech Model With 82M Parameters

Kokoro TTS is an open-source AI text-to-speech model with 82M parameters that produces high-quality, natural-sounding speech. It supports multiple languages and is well suited to audiobooks, podcasts, and training videos. Its features include real-time audio generation, customizable voicepacks, and automatic content segmentation. Users praise its efficiency, lifelike voices, and ease of use. These qualities make it suitable for applications such as improving accessibility and creating educational content.

https://kokorottsai.com/

Canopy Labs

Canopy Labs introduces Orpheus TTS, a speech generation model that produces high-quality, human-like speech and comes in several sizes, from Nano to Medium. It was trained on a large corpus of English speech, excels at zero-shot voice cloning, and can convey emotion. It supports low-latency, real-time streaming for conversational applications. Canopy Labs plans to release an open-source end-to-end speech model in the future.

https://canopylabs.ai/model-releases

Crossing the Uncanny Valley of Conversational Voice

Sesame aims to improve conversational voice technology by achieving "voice presence": the ability of a digital assistant to hold a meaningful dialogue with emotional intelligence, natural timing, and contextual awareness. Current voice assistants are held back by emotional flatness, which makes interactions less engaging. To address this, Sesame is developing a Conversational Speech Model (CSM). The model uses transformers to generate more natural speech by taking the conversational context into account and adapting in real time. Progress so far includes models of several sizes and an evaluation suite for measuring contextual capabilities. Challenges remain in multilingual support and in modeling conversational dynamics. Future goals include scaling up the models, expanding the datasets, and broadening language support, with the aim of AI that better captures the nuances of human conversation.

https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice
