API v2 for streaming recognition

The streaming recognition service is located at: stt.api.cloud.yandex.net:443

Message with recognition settings

Parameter Description
config object
Field with the recognition settings and folder ID.
config.specification object
Recognition settings.
config.specification.languageCode string
Recognition language.
See the model description for acceptable values. The default value is ru-RU (Russian).
config.specification.model string
Language model to use for recognition.
The more closely the model fits your use case, the better the recognition result. You can only specify one model per request.
Acceptable values depend on the selected language. The default value is general.
config.specification.profanityFilter boolean
Profanity filter.
Acceptable values:
  • true: Exclude profanities from the recognition results.
  • false (default): Do not exclude profanities from the recognition results.
config.specification.partialResults boolean
Intermediate result filter.
Acceptable values:
  • true: Return intermediate results (part of recognized utterance). For intermediate results, final equals false.
  • false (default): Return only the final results (entire recognized utterance).
config.specification.singleUtterance boolean
Flag disabling recognition after the first utterance.
Acceptable values:
  • true: Recognize only the first utterance, then stop recognition and wait for the user to disconnect.
  • false (default): Continue recognition until the end of the session.
config.specification.audioEncoding string
Audio format.
Acceptable values:
config.specification.sampleRateHertz integer (int64)
Audio sampling rate.
This parameter is required if audioEncoding is set to LINEAR16_PCM. Valid values:
  • 48000 (default): 48 kHz.
  • 16000: 16 kHz.
  • 8000: 8 kHz.
config.specification.rawResults boolean
Flag that sets how numbers are written: true for words, false (default) for digits.
folderId string

ID of the folder you have access to. It is required for authentication with a user account (see Authentication with the SpeechKit API). Do not use this field if you make a request on behalf of a service account.

The maximum string length is 50 characters.
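
The settings above can be assembled and checked on the client before the session starts. The following is a minimal sketch, not the official client: it builds the message as a plain Python dict rather than a generated gRPC class, and the helper name build_config and its keyword arguments are hypothetical. Only the field names, defaults, and limits come from the table above. audioEncoding has no default in the sketch, so the caller must pass it explicitly.

```python
from typing import Any, Dict, Optional

# Values taken from the parameter table above.
ALLOWED_SAMPLE_RATES = {8000, 16000, 48000}
MAX_FOLDER_ID_LENGTH = 50


def build_config(
    audio_encoding: str,
    language_code: str = "ru-RU",
    model: str = "general",
    profanity_filter: bool = False,
    partial_results: bool = False,
    single_utterance: bool = False,
    sample_rate_hertz: Optional[int] = None,
    raw_results: bool = False,
    folder_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Return the settings message as a plain dict (illustration only)."""
    # The table marks sampleRateHertz as required for LINEAR16_PCM.
    if audio_encoding == "LINEAR16_PCM" and sample_rate_hertz is None:
        raise ValueError("sampleRateHertz is required for LINEAR16_PCM")
    if sample_rate_hertz is not None and sample_rate_hertz not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sampleRateHertz: {sample_rate_hertz}")
    if folder_id is not None and len(folder_id) > MAX_FOLDER_ID_LENGTH:
        raise ValueError("folderId must not exceed 50 characters")

    specification: Dict[str, Any] = {
        "languageCode": language_code,
        "model": model,
        "profanityFilter": profanity_filter,
        "partialResults": partial_results,
        "singleUtterance": single_utterance,
        "audioEncoding": audio_encoding,
        "rawResults": raw_results,
    }
    if sample_rate_hertz is not None:
        specification["sampleRateHertz"] = sample_rate_hertz

    config: Dict[str, Any] = {"specification": specification}
    # Omit folderId when the request is made on behalf of a service account.
    if folder_id is not None:
        config["folderId"] = folder_id
    return {"config": config}
```

For example, build_config("LINEAR16_PCM", sample_rate_hertz=8000, partial_results=True) produces a settings message for 8 kHz LPCM audio with intermediate results enabled.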

Experimental additional recognition settings

For streaming recognition models, new recognition settings are supported. They are passed to a gRPC procedure via metadata.

Parameter Description
x-normalize-partials boolean
Flag allowing you to get intermediate recognition results (parts of recognized utterance) in a normalized format: numbers as digits, profanity filter enabled, etc.
Valid values:
  • true: Return a normalized result.
  • false (default): Return a non-normalized result.
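
In the Python gRPC library, call metadata is passed as a sequence of (key, value) string pairs. The sketch below only builds that sequence. The helper name build_metadata is hypothetical, and encoding the boolean as the lowercase strings "true" and "false" is an assumption, since the table above does not state the wire format.

```python
from typing import List, Tuple


def build_metadata(normalize_partials: bool) -> List[Tuple[str, str]]:
    """Build gRPC call metadata carrying the experimental setting."""
    # Assumption: the flag is sent as the lowercase string "true" or "false".
    value = "true" if normalize_partials else "false"
    return [("x-normalize-partials", value)]
```

The resulting list would typically be supplied through the metadata argument of the streaming call, alongside any authentication entries your setup requires.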

Audio message

Parameter Description
audio_content Audio fragment represented as an array of bytes. The audio must match the format specified in the message with recognition settings.
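
A streaming request is a sequence of messages. The sketch below assumes that the message with recognition settings is sent first and that each subsequent message carries one audio fragment. Messages are shown as plain dicts, and the function name request_stream and the chunk size of 4000 bytes are hypothetical choices, not values prescribed by the service.

```python
import io
from typing import Any, BinaryIO, Dict, Iterator

CHUNK_SIZE = 4000  # hypothetical fragment size in bytes


def request_stream(
    config_message: Dict[str, Any],
    audio: BinaryIO,
    chunk_size: int = CHUNK_SIZE,
) -> Iterator[Dict[str, Any]]:
    """Yield the settings message, then the audio split into fragments."""
    # Assumption: settings go first so the server knows the audio format.
    yield config_message
    while True:
        data = audio.read(chunk_size)
        if not data:
            break  # end of the audio source
        yield {"audio_content": data}


# Usage with an in-memory buffer standing in for a real audio source.
messages = list(request_stream({"config": {}}, io.BytesIO(b"\x00" * 10), chunk_size=4))
```

A generator fits here because the audio does not have to be fully available before the first fragment is sent.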

Message with recognition results

If speech fragment recognition is successful, you will receive a message containing a list of recognition results (chunks[]). Each result contains the following fields:

  • alternatives[]: List of recognized text alternatives. Each alternative contains the following fields:

    • text: Recognized text.
    • confidence: This field is currently not supported. Do not use it.
  • final: Flag indicating that this recognition result is final and will not change anymore. If the value is false, it means the recognition result is intermediate and may change as subsequent speech fragments get recognized.

  • endOfUtterance: Flag indicating that this result contains the end of the utterance. If the value is true, the new utterance will start with the next result you get.

    Note

    If you set singleUtterance=true, only one utterance per session will be recognized. After the message where endOfUtterance is true, the server will not recognize the following utterances and will wait for you to terminate the session.
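
The fields described above can be combined to rebuild complete utterances from the response stream. The following is a minimal sketch that treats each response as a plain dict with the same field names. The function name collect_utterances is hypothetical, and taking the first alternative and joining final results with a space are design choices made for illustration only.

```python
from typing import Any, Dict, Iterable, List


def collect_utterances(responses: Iterable[Dict[str, Any]]) -> List[str]:
    """Group final recognition results into one string per utterance."""
    utterances: List[str] = []
    current: List[str] = []
    for response in responses:
        for chunk in response.get("chunks", []):
            alternatives = chunk.get("alternatives", [])
            # Intermediate results (final == False) may still change, so skip them.
            if chunk.get("final", False) and alternatives:
                current.append(alternatives[0]["text"])
            # endOfUtterance closes the current utterance.
            if chunk.get("endOfUtterance", False) and current:
                utterances.append(" ".join(current))
                current = []
    # Keep any final text received before the stream ended.
    if current:
        utterances.append(" ".join(current))
    return utterances
```

With partialResults enabled, an application would usually display intermediate results as they arrive and replace them once a result with final set to true is received.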

Error codes returned by the server

To see how gRPC statuses correspond to HTTP codes, see google.rpc.Code.

List of possible gRPC errors returned by the service:

Code Status Description
3 INVALID_ARGUMENT Incorrect request parameters specified. Detailed information is provided in the details field.
9 RESOURCE_EXHAUSTED Client exceeded a quota.
16 UNAUTHENTICATED The operation requires authentication. Check the IAM token and the folder ID that you provided.
13 INTERNAL Internal server error. This error means that the operation cannot be performed due to a server-side technical problem, e.g., due to insufficient computing resources.
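
On the client side, the table above can be kept as a lookup for logging or for messages shown to the user. This is a minimal sketch: the codes and status names come from the table, while the hint texts are shortened paraphrases and the function name describe_error and the UNLISTED placeholder are hypothetical.

```python
from typing import Dict, Tuple

# Code -> (status, short hint), following the table above.
GRPC_ERRORS: Dict[int, Tuple[str, str]] = {
    3: ("INVALID_ARGUMENT", "Check the request parameters; see the details field."),
    9: ("RESOURCE_EXHAUSTED", "A quota was exceeded."),
    13: ("INTERNAL", "Server-side technical problem."),
    16: ("UNAUTHENTICATED", "Check the IAM token and the folder ID."),
}


def describe_error(code: int) -> str:
    """Return a one-line description for a gRPC status code."""
    # Codes missing from the table get a placeholder instead of raising.
    status, hint = GRPC_ERRORS.get(code, ("UNLISTED", "Code not listed in the table."))
    return f"{code} {status}: {hint}"
```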

Use cases