Speech recognition (STT)
Incorrect stress and pronunciation
Create a support ticket and attach examples so that developers can fine-tune the speech synthesis model for future releases.
Poor speech recognition quality at 8kHz
If the issue is systematic (affects tens of percent of all speech recognition requests), submit a support ticket and attach examples for analysis. The more examples you send, the more likely the developers are to find the cause.
Feedback form on speech recognition quality
If you have any persistent issues, contact support and provide the audio files along with a description of the problem.
Two channels were recognized as one. How do I recognize each channel separately?
You can recognize multi-channel audio files only using asynchronous recognition.
Check the format of your recording:
- For LPCM, set the config.specification.audioChannelCount parameter to 2.
- You can skip this parameter for MP3 and OggOpus files, since they already contain information about the number of channels. The file will be split automatically into the specified number of recordings.
In the response, the recognized text for each channel is labeled with the channelTag parameter.
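The two-channel LPCM case above can be sketched as a request body. This is a minimal illustration, assuming the STT v2 asynchronous recognition API; the bucket URI is a placeholder.

```python
import json

# Hypothetical sketch of an asynchronous recognition request body for a
# two-channel LPCM file. Field names follow the SpeechKit STT v2 API;
# the audio URI below is a placeholder, not a real bucket.
def build_async_request(audio_uri: str, sample_rate: int = 8000) -> dict:
    return {
        "config": {
            "specification": {
                "languageCode": "ru-RU",
                "audioEncoding": "LINEAR16_PCM",
                "sampleRateHertz": sample_rate,
                # Required for LPCM: declare how many channels the
                # recording has so each one is recognized separately.
                "audioChannelCount": 2,
            }
        },
        "audio": {"uri": audio_uri},
    }

body = build_async_request("https://storage.yandexcloud.net/my-bucket/dialog.pcm")
print(json.dumps(body, indent=2))
```

For MP3 and OggOpus, the audioChannelCount line can simply be dropped.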
Is it possible to recognize two or more voices and separate them by speaker?
You can recognize multi-channel audio files only using asynchronous recognition.
During speech recognition, text is not split by voice, but you can place the voices in different channels and separate the recognized text in the response with the channelTag parameter.
You can specify the number of channels in your request using the config.specification.audioChannelCount parameter.
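Splitting the recognized text back out by channel can be sketched as follows. The chunk structure mirrors the STT v2 asynchronous response; the sample data is made up for illustration.

```python
from collections import defaultdict

# Sketch: group recognized chunks from an async operation response by
# channelTag. The response shape follows the STT v2 API; the sample
# chunks below are invented for illustration.
def split_by_channel(chunks: list[dict]) -> dict[str, str]:
    texts = defaultdict(list)
    for chunk in chunks:
        best = chunk["alternatives"][0]["text"]  # top hypothesis
        texts[chunk["channelTag"]].append(best)
    return {tag: " ".join(parts) for tag, parts in texts.items()}

chunks = [
    {"channelTag": "1", "alternatives": [{"text": "hello"}]},
    {"channelTag": "2", "alternatives": [{"text": "hi there"}]},
    {"channelTag": "1", "alternatives": [{"text": "how are you"}]},
]
print(split_by_channel(chunks))
# {'1': 'hello how are you', '2': 'hi there'}
```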
The file does not exceed the limit, but an error occurs during recognition
If your file is multi-channel, take into account the total recording time of all channels. For the full list of limitations, see Quotas and limits in SpeechKit.
Internal Server Error
Make sure the format you specified in the request matches the actual file format. If the error persists, contact support and attach examples of the audio files that could not be recognized.
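A quick local sanity check (not part of SpeechKit) is to compare the file's container signature with the format you are about to declare. The helper below is a hedged sketch based on well-known magic numbers: "OggS" for Ogg containers, and "ID3" or an MPEG frame sync for MP3; raw LPCM has no signature.

```python
# Local sanity check: guess the container from the file's first bytes
# before declaring a format in the recognition request.
def detect_container(first_bytes: bytes) -> str:
    if first_bytes.startswith(b"OggS"):
        return "OGG_OPUS"  # Ogg capture pattern
    if first_bytes.startswith(b"ID3"):
        return "MP3"       # ID3v2 tag header
    if len(first_bytes) >= 2 and first_bytes[0] == 0xFF and (first_bytes[1] & 0xE0) == 0xE0:
        return "MP3"       # MPEG audio frame sync (11 set bits)
    return "LINEAR16_PCM"  # raw PCM has no signature; assume LPCM

with_ogg = detect_container(b"OggS\x00\x02")
print(with_ogg)  # OGG_OPUS
```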
Where can I find an example of microphone speech recognition?
See the example of streaming recognition of speech recorded from a microphone.
Can I use POST for streaming recognition?
Streaming recognition uses the gRPC remote procedure call mechanism, which is not supported in the REST API, so you cannot use the POST method.
A streaming recognition session is broken/terminated
When using the API v2 for streaming recognition, the service awaits audio data. If it does not receive any data within 5 seconds, the session is terminated. You cannot change this parameter in the API v2.
Streaming recognition runs in real time. You can send "silence" for recognition so that the service does not terminate the connection.
We recommend using the API v3 for streaming recognition. The API v3 features a special message type to send "silence", so you will not have to simulate it in your audio.
How does the service figure out the end of an utterance and the duration of a recognition session?
The end of an utterance is detected automatically by the "silence" following the utterance. For more information, see Detecting the end of a phrase.
The maximum session duration for streaming recognition is 5 minutes.
What should I do if SpeechKit stops listening before a conversation ends or, conversely, waits too long after it ends?
Interruptions or delays during streaming recognition may occur due to detecting the end of utterance (EOU). For recommendations on setting up EOU, see Detecting the end of a phrase.
OutOfRange desc = Exceeded maximum allowed stream duration error
This error means that the maximum allowed duration of a recognition session has been exceeded. In this case, you need to reopen the session.
For streaming recognition, the maximum session duration is 5 minutes. This is a technical limitation due to the Yandex Cloud architecture. You cannot change it.
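One way to work within the limit is to pre-split a long recording into segments that each fit into a single session and open a new session per segment. This is a hedged sketch, assuming 16-bit mono LPCM; the 5-minute constant comes from the limit above.

```python
# Sketch: split a long mono 16-bit LPCM recording into segments that
# each fit within the 5-minute streaming session limit, so a new
# session can be opened for every segment.
MAX_SESSION_SEC = 5 * 60

def split_for_sessions(pcm: bytes, sample_rate: int = 8000) -> list[bytes]:
    bytes_per_sec = sample_rate * 2  # 2 bytes per 16-bit sample, mono
    step = MAX_SESSION_SEC * bytes_per_sec
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]

# 11 minutes of audio at 8 kHz -> 3 sessions (5 + 5 + 1 minutes)
audio = b"\x00" * (11 * 60 * 8000 * 2)
print(len(split_for_sessions(audio)))  # 3
```

Note that naive splitting can cut a word in half at a segment boundary; splitting at pauses gives better results.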
Use SA for s3 file recognition error
The use SA for s3 file recognition error occurs when attempting to recognize audio files from Object Storage without using a service account.
Create an API key. The service will automatically create a service account with SpeechKit roles and an API key for it. Then assign the new service account the storage.uploader role for the bucket to which you upload the audio files for recognition.
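Once the role is assigned, the request is authorized with the API key of that service account. A minimal sketch, assuming the STT v2 asynchronous recognition endpoint; the key and bucket path are placeholders.

```python
# Sketch of an async recognition request for a file in Object Storage.
# The endpoint and field names follow the SpeechKit STT v2 API; the
# API key and file URI passed in below are placeholders. The request
# runs on behalf of the service account that owns the API key, which
# must have the storage.uploader role on the bucket.
API_URL = "https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize"

def build_request(api_key: str, file_uri: str):
    headers = {"Authorization": f"Api-Key {api_key}"}
    body = {
        "config": {"specification": {"languageCode": "ru-RU"}},
        "audio": {"uri": file_uri},
    }
    return headers, body

headers, body = build_request(
    "<your-api-key>",  # placeholder
    "https://storage.yandexcloud.net/my-bucket/audio.ogg",
)
# e.g. requests.post(API_URL, headers=headers, json=body)
```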
What goes into the usage cost?
For examples of calculating the usage cost, pricing rules, and effective prices, see the SpeechKit pricing policy.