Sesame aims to improve conversational voice technology by achieving “voice presence”: the ability of a digital assistant to hold meaningful dialogue with emotional intelligence, natural timing, and contextual awareness. Today's voice assistants tend to sound emotionally flat, which makes interactions less engaging. To address this, Sesame is developing a Conversational Speech Model (CSM), a transformer-based model that draws on conversational context to produce more natural speech and adapts in real time. Progress so far includes models of several sizes and an evaluation suite for assessing contextual capabilities, but challenges remain in multilingual support and in handling conversational dynamics. Future goals include scaling up the models, expanding the datasets, and supporting more languages, with the aim of building AI that more closely captures the nuances of human conversation.
https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice