Getting started with SpeechKit

For information on pricing, see the SpeechKit pricing policy.

Getting started

Management console

Navigate to the management console and log in to Yandex Cloud or sign up if you do not have an account yet. For information on how to get started with Yandex Cloud, see Getting started with Yandex Cloud.
Accept the user agreement.
In Yandex Cloud Billing, make sure you have a billing account linked and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one.

Speech recognition

AI Studio UI

In the AI Studio UI, select the folder for which your account has the ai.playground.user and ai.datasets.editor roles or higher.
In the left-hand panel, expand AI Speech and select Speech recognition.
Under Speech recognition, on the Recognition parameters tab, configure the following settings:
- Language: Select the language or leave Automatic.
- Text normalization: Presents dates and times in numerical format, converts numbers from text to digits, and provides access to additional settings.
- Profanity filter: Masks profanity.
- Literature text: Adds capital letters and punctuation marks.
- Speaker recognition: Attributes each recognized phrase to a particular speaker.
- Grouping speaker phrases: Divides phrases into two groups by speaker.
Click Select file or drag the audio file to the loading area.
Classifiers: Finds phrases of a given category in the text, e.g., greetings, negativity, or obscenity. This works only for Russian.
Result processing: Post-processes the recognition results with an LLM:
- Model: Select the model for processing. The processing cost depends on the model you select.
- Instructions:
  - Enter a prompt in the input field or select a ready-made one.
  - Result format: Specify your preferred recognition result format.
  - Add instructions: Add another instruction. You can add up to five instructions in total.
Click Start recognition to start speech recognition for the audio file.
Click View code to get the request code for Python REST or Python gRPC.

For a detailed guide, see Speech recognition using Playground.

SpeechKit Playground features basic speech recognition options. For more flexible recognition settings, use the API.

Learn how to recognize short and long pre-recorded audio files in SpeechKit. The service also supports real-time voice recognition.
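As a rough illustration of what an API call for a short pre-recorded file might look like, here is a minimal Python sketch using only the standard library. The endpoint URL, the `folderId` and `lang` query parameters, the `Bearer` IAM token header, and the `result` response field are assumptions based on the API v1 for short audio; check them against the API reference before use. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for short-audio recognition in the API v1.
STT_URL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"


def build_recognition_request(audio: bytes, folder_id: str, iam_token: str,
                              lang: str = "ru-RU") -> urllib.request.Request:
    """Build (but do not send) a POST request with the audio as the raw body."""
    # Parameter names are assumptions; see the lead-in above.
    query = urllib.parse.urlencode({"folderId": folder_id, "lang": lang})
    return urllib.request.Request(
        url=f"{STT_URL}?{query}",
        data=audio,
        headers={"Authorization": f"Bearer {iam_token}"},
        method="POST",
    )


def recognize(audio: bytes, folder_id: str, iam_token: str,
              lang: str = "ru-RU") -> str:
    """Send the request and return the text from the assumed `result` field."""
    request = build_recognition_request(audio, folder_id, iam_token, lang)
    with urllib.request.urlopen(request) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload.get("result", "")
```

Building the request separately from sending it makes it possible to inspect the URL and headers without network access or valid credentials.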

Speech synthesis

AI Studio UI

API

In the AI Studio UI, select the folder for which your account has the ai.playground.user and ai.datasets.editor roles or higher.
In the left-hand panel, expand AI Speech and select Speech synthesis.
On the Speech synthesis tab, paste up to 5,000 characters of text into the central part of the window.
In the settings section on the left side of the window:
- Pauses: Select the length of pauses between words or specify it yourself.
- Emphasize word: Emphasize the essential words.
- Stress: Mark the stressed vowels to clarify the correct pronunciation of words.
- Phonemes: Monitor the correct pronunciation of words using phonemes.
Under Synthesis settings on the right side of the window:
- Language: Select the speaker's language.
- Voice: Specify the speaker's voice.
- Role: Select the speaker's role.
- Speech speed: Set the speaker's speech rate.
- Voice pitch: Adjust the speaker's voice pitch.
- Audio format: Select the audio format.
To start synthesis, click Synthesize and playback.
To download the result, click the download icon.
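Text longer than the 5,000-character limit mentioned above has to be synthesized in parts. One possible way to prepare such text is sketched below; the `split_text` helper is hypothetical and not part of SpeechKit. It breaks at the last space inside each window and falls back to a hard cut when a window contains no space.

```python
MAX_CHARS = 5000  # limit stated above for the Speech synthesis input field


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Prefer the last space within the first `limit` characters
        # (a space at index `limit` still yields a chunk of length `limit`).
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space found: hard cut
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk can then be pasted or submitted separately, and the resulting audio fragments joined afterwards.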

For a detailed guide, see Speech synthesis using Playground.

SpeechKit Playground features basic speech synthesis options. For more flexible synthesis settings, use the API.

Learn how to convert text to audio using the SpeechKit API v1 and API v3. The API v3 lets you configure speech synthesis more flexibly. For more information about the differences between the API versions, see Synthesis features.
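For orientation, a synthesis request through the API v1 could be sketched in Python roughly as follows. The endpoint URL, the form field names (`text`, `lang`, `voice`, `speed`, `format`, `folderId`), the `john` voice, and the `Bearer` IAM token header are assumptions to verify against the API reference; the function names are hypothetical. The fields loosely mirror the Language, Voice, Speech speed, and Audio format settings listed for the UI.

```python
import urllib.parse
import urllib.request

# Assumed endpoint for speech synthesis in the API v1.
TTS_URL = "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize"


def build_synthesis_request(text: str, folder_id: str, iam_token: str,
                            lang: str = "en-US", voice: str = "john",
                            speed: float = 1.0,
                            audio_format: str = "oggopus") -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST request for synthesis."""
    # Field names and default values are assumptions; see the lead-in above.
    form = urllib.parse.urlencode({
        "text": text,
        "lang": lang,
        "voice": voice,
        "speed": str(speed),
        "format": audio_format,
        "folderId": folder_id,
    }).encode("utf-8")
    return urllib.request.Request(
        url=TTS_URL,
        data=form,
        headers={"Authorization": f"Bearer {iam_token}"},
        method="POST",
    )


def synthesize_to_file(text: str, folder_id: str, iam_token: str,
                       path: str = "speech.ogg") -> None:
    """Send the request and write the returned audio bytes to `path`."""
    request = build_synthesis_request(text, folder_id, iam_token)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

A call such as `synthesize_to_file("Hello", folder_id, iam_token)` would need a valid IAM token and folder ID; the sketch does not handle errors or retries.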
