Yandex SpeechKit technology overview

Yandex SpeechKit voice technologies cover a wide range of tasks involving human speech. SpeechKit can recognize speech either in real time or from pre-recorded audio files, and it can automatically detect the language being spoken. It can also synthesize speech from pattern-based phrases and long texts using SpeechKit standard voices.
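A synthesis request typically accepts a limited amount of text, so one client-side way to voice a long text is to split it into chunks and synthesize them one after another. The Python sketch below assumes a per-request limit of 5,000 characters, which is our understanding of the SpeechKit API v1 REST limit; other API versions may use a different value, so treat it as a parameter and verify it. `split_for_synthesis` is a hypothetical helper, not part of any SpeechKit SDK, and it only prepares text without sending anything.

```python
import re
from typing import List

# Assumed per-request character limit (SpeechKit API v1 REST, as we understand it).
# Verify against the current API reference before relying on it.
MAX_CHARS = 5000


def split_for_synthesis(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Split after ".", "!" or "?" followed by whitespace, keeping the punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A sentence longer than the limit is cut into fixed-size slices.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Append the sentence to the current chunk if it still fits.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

The fixed-width fallback can split a word in the middle; a production version might prefer to cut at the last whitespace before the limit.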

SpeechKit is accessed through its API. Depending on the task, you can use the gRPC or REST interface. For more information about how APIs are implemented in Yandex Cloud, see Yandex Cloud API concepts.
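To illustrate the REST interface, the Python sketch below builds, but does not send, a synchronous recognition request. The endpoint `https://stt.api.cloud.yandex.net/speech/v1/stt:recognize`, the `lang`, `format`, `folderId`, and `sampleRateHertz` query parameters, and the `Authorization: Api-Key ...` header follow the SpeechKit API v1 REST reference as we understand it; check them against the current reference before use. `build_recognize_request` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint of the SpeechKit API v1 synchronous recognition method;
# verify it against the current SpeechKit REST reference.
STT_V1_URL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"


def build_recognize_request(
    audio: bytes,
    api_key: str,
    folder_id: Optional[str] = None,
    lang: str = "ru-RU",
    audio_format: str = "oggopus",
    sample_rate_hertz: Optional[int] = None,
) -> Request:
    """Build (but do not send) a POST request whose body is the raw audio bytes."""
    params = {"lang": lang, "format": audio_format}
    # Assumption: folderId is required only for some authorization methods
    # (e.g. a user account IAM token), so it is optional here.
    if folder_id:
        params["folderId"] = folder_id
    # Pass sample_rate_hertz for raw PCM ("lpcm") audio; assumed unnecessary for OggOpus.
    if sample_rate_hertz is not None:
        params["sampleRateHertz"] = str(sample_rate_hertz)
    url = f"{STT_V1_URL}?{urlencode(params)}"
    # API key authentication; an IAM token would use "Bearer <token>" instead.
    return Request(
        url,
        data=audio,
        method="POST",
        headers={"Authorization": f"Api-Key {api_key}"},
    )
```

To send the request, pass it to `urllib.request.urlopen`; as we understand the v1 reference, the response body is JSON with the recognized text in a `result` field. The gRPC interface uses generated client stubs instead and is not shown here.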

The most common SpeechKit use cases are described below to help you choose the appropriate technologies and configure them for your needs.

Voice robot
Description: Full or partial automation of telephone communications with customers.
Recommended technologies:
For user input: Streaming recognition.
For a system response: Speech synthesis using standard voices and a Brand Voice designed specifically for you.

Speech analytics: quality control of agent performance
Description: Transcribing and further analyzing audio recordings of dialogs between customers and call center agents or robots.
Recommended technologies: To recognize pre-recorded audio files: Asynchronous recognition of audio files.

Voice control in apps and smart devices: voice assistant
Description: The user requests an action or a search by voice, and the service responds by performing the action and adding a voice comment or an image.
Recommended technologies:
For user input: Streaming recognition.
For a system response: Speech synthesis using standard voices and a Brand Voice.

Voice control in apps and smart devices: service adaptation for people with visual impairments
Description: Voice control, voice hints, and voice comments for visually impaired users.
Recommended technologies:
For user input: Streaming recognition.
For a system response: Speech synthesis using standard voices and a Brand Voice.

Recognizing audio recordings made during a meeting
Description: Transcribing the audio recordings after the meeting is over.
Recommended technologies: To recognize pre-recorded audio files: Asynchronous recognition of audio files.

Voicing books and videos
Description: Voicing a book or video without a human speaker.
Recommended technologies: Speech synthesis using standard voices and a Brand Voice.

Recording the minutes of a meeting
Description: Transcribing a meeting in real time to produce its minutes.
Recommended technologies: To recognize the participants' speech: Streaming recognition.

Video subtitles
Description: Creating subtitles for recorded videos.
Recommended technologies: To recognize the audio track: Asynchronous recognition of audio files.

Broadcast subtitles
Description: Transcribing broadcasts in real time.
Recommended technologies: To recognize the broadcast speech: Streaming recognition.

Transcribing voice messages
Description: Converting short voice messages in messengers to text.
Recommended technologies: To recognize audio files: Synchronous recognition.
Features and settings: Recognition result settings.
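The recommendations above follow a simple pattern: live audio calls for streaming recognition, short pre-recorded clips for synchronous recognition, longer pre-recorded files for asynchronous recognition, and any spoken reply for speech synthesis. The hypothetical Python helper below encodes that reading of the use cases; the `UseCase` fields and the `recommend` function are illustrative only and not part of any SpeechKit SDK.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UseCase:
    """Hypothetical description of a use case, reduced to the traits the use cases above vary on."""
    needs_recognition: bool   # speech must be turned into text
    needs_synthesis: bool     # the service must answer with a voice
    real_time: bool           # audio arrives live (calls, broadcasts, meetings in progress)
    short_clip: bool = False  # pre-recorded and short, e.g. a messenger voice message


def recommend(case: UseCase) -> List[str]:
    """Map a use case to SpeechKit technologies; real_time takes precedence over short_clip."""
    techs: List[str] = []
    if case.needs_recognition:
        if case.real_time:
            # Live audio: calls, broadcasts, meetings transcribed as they happen.
            techs.append("Streaming recognition")
        elif case.short_clip:
            # Short pre-recorded audio such as voice messages.
            techs.append("Synchronous recognition")
        else:
            # Longer pre-recorded files: call recordings, meetings, video tracks.
            techs.append("Asynchronous recognition of audio files")
    if case.needs_synthesis:
        techs.append("Speech synthesis")
    return techs
```

For example, a voice robot (live calls plus a spoken reply) maps to streaming recognition and speech synthesis, while a messenger voice message maps to synchronous recognition.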