
AI Speech
A platform module that combines speech recognition and synthesis technologies, tools for creating voice agents, and post-processing of recognition results, all based on SpeechKit.
Everything you need to work with your voice

Realtime API: Voice agents
Create voice agents in less than a second using a ready-made pipeline: speech recognition + model + File Search call + speech synthesis. Supports MCP integrations, file and web search (AI Search), and short-term agent memory.
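As a rough illustration of how such a pipeline chains its stages, here is a minimal sketch in Python. It does not use the Realtime API: every name in it (`VoiceAgent`, `handle_turn`, `make_stub_agent`, the stage callables, the memory size) is a hypothetical placeholder, and the stages are stubbed with plain functions so the flow can be followed end to end.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch: none of these names belong to the Realtime API.
# Each stage is an injected callable so a real service could replace the stub.


@dataclass
class VoiceAgent:
    recognize: Callable[[bytes], str]                     # speech recognition
    search_files: Callable[[str], List[str]]              # File Search call
    generate: Callable[[str, List[str], List[str]], str]  # language model
    synthesize: Callable[[str], bytes]                    # speech synthesis
    memory: List[str] = field(default_factory=list)       # short-term memory
    memory_size: int = 4                                   # assumed window size

    def handle_turn(self, audio: bytes) -> bytes:
        """Run one user turn through all four stages in order."""
        text = self.recognize(audio)
        context = self.search_files(text)
        reply = self.generate(text, context, list(self.memory))
        # Remember only the most recent utterances.
        self.memory.append(text)
        self.memory = self.memory[-self.memory_size:]
        return self.synthesize(reply)


def make_stub_agent() -> VoiceAgent:
    """Build an agent whose stages are trivial stand-ins for real services."""
    return VoiceAgent(
        recognize=lambda audio: audio.decode("utf-8"),
        search_files=lambda query: [f"document about {query}"],
        generate=lambda text, ctx, mem: (
            f"You said: {text} ({len(ctx)} docs, {len(mem)} remembered)"
        ),
        synthesize=lambda reply: reply.encode("utf-8"),
    )


if __name__ == "__main__":
    agent = make_stub_agent()
    print(agent.handle_turn(b"hello"))
    print(agent.handle_turn(b"what are your opening hours"))
```

Passing the stages in as callables keeps the turn logic independent of any particular recognition, search, model, or synthesis backend.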

SpeechKit speech recognition
Recognize speech in a split second, in all its variety and style, in real time or from pre-recorded audio files, with automatic detection of the speaker’s language.

SpeechKit speech synthesis
Voice interfaces, messages, and scripts — from mass communications to interactive assistants. Can be used in IVR, customer information, voice assistants, and media content.

LLM processing of recognition results
SpeechKit recognizes audio and processes the result using a language model: it makes summaries, extracts facts, translates, and prepares structured data (for example, for CRM). Supported formats: text, arbitrary JSON, strict JSON schema.
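To make the "strict JSON schema" format more concrete, the sketch below shows how a consumer might check a structured result before writing it to a CRM. It is only an illustration: the schema fields (`customer_name`, `topic`, `follow_up_required`) and the helper names are assumptions, not part of SpeechKit, and the validator covers only a small subset of JSON Schema for flat objects.

```python
import json

# Hypothetical schema for a CRM record; the field names are illustrative only.
CRM_SCHEMA = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "topic": {"type": "string"},
        "follow_up_required": {"type": "boolean"},
    },
    "required": ["customer_name", "topic", "follow_up_required"],
    "additionalProperties": False,
}

# Mapping from JSON Schema type names to Python types (flat values only).
_PY_TYPES = {"string": str, "boolean": bool, "integer": int, "number": (int, float)}


def validate_flat(record, schema):
    """Check a flat object against a small subset of JSON Schema.

    Returns a list of error messages; an empty list means the record is valid.
    """
    if not isinstance(record, dict):
        return ["record is not an object"]
    errors = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in record:
            errors.append(f"missing required field: {key}")
    for key, value in record.items():
        if key not in props:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field: {key}")
            continue
        type_name = props[key]["type"]
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        if isinstance(value, bool) and type_name != "boolean":
            errors.append(f"wrong type for field: {key}")
        elif not isinstance(value, _PY_TYPES[type_name]):
            errors.append(f"wrong type for field: {key}")
    return errors


def parse_model_output(raw, schema=CRM_SCHEMA):
    """Parse a JSON string and raise ValueError if it violates the schema."""
    record = json.loads(raw)
    errors = validate_flat(record, schema)
    if errors:
        raise ValueError("; ".join(errors))
    return record


if __name__ == "__main__":
    raw = '{"customer_name": "A. Ivanova", "topic": "billing", "follow_up_required": true}'
    print(parse_model_output(raw))
```

With a strict schema the downstream code can rely on every field being present and correctly typed, whereas arbitrary JSON or plain text would need additional defensive parsing.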

Brand Voice for company branding
Choose the Lite version to quickly create a voice on your own (from 20-40 minutes of speech), without code changes or complex processes. The Premium version offers a custom voice for marketing and PR tasks, with a variety of characteristics and multiple roles.

SpeechKit Hybrid
A solution for customers who need full on-premises control over speech processing and synthesis. It is based on the same speech recognition and synthesis models as the cloud version, as well as the Speech Realtime model (part of AI Studio).
Use cases
Use technologies to solve business tasks, from support and sales to automating internal processes and content creation.
Test out the technology now
Yandex AI Studio offers a Playground for your experiments. Try synthesizing, recognizing, and processing speech, or create a unique voice in the user-friendly interface.

Pricing policy
Costs depend on the scenario: speech recognition and synthesis, the version of the API used, and the mode of operation. More details about calculating costs can be found in the documentation.

On-premises without compromise
Use it in the cloud or deploy the entire voice technology stack on your own infrastructure.
Suitable for scenarios where a company values full control over data, an isolated environment, and integration with its closed internal systems.

Trust by default
Speech technologies concern users, data, and brand reputation.
We take this into account at the platform level so you can focus on your product instead of on risks.
Other useful services

A service for receiving responses from the Yandex search database in XML or HTML format. It helps organize search on a site, a group of sites, or the internet, and track the position of sites for search queries.

Computer vision service for recognizing text in images and PDF files. It supports 45+ languages and detects them automatically.

A service for integrating Yandex Translator algorithms into applications or web projects for end users. It supports 100+ languages and can translate individual words and entire texts.
Start working with AI Speech
Try creating your first voice agent or a unique voice for your brand. Everything you need to get started is already in the console.
