SpeechKit Recognition API v3, REST: AsyncRecognizer.RecognizeFile
Performs asynchronous speech recognition.
HTTP request
POST https://stt.api.cloud.yandex.net/stt/v3/recognizeFileAsync
Body parameters
{
  // Includes only one of the fields `content`, `uri`
  "content": "string",
  "uri": "string",
  // end of the list of possible fields
  "recognitionModel": {
    "model": "string",
    "audioFormat": {
      // Includes only one of the fields `rawAudio`, `containerAudio`
      "rawAudio": {
        "audioEncoding": "string",
        "sampleRateHertz": "string",
        "audioChannelCount": "string"
      },
      "containerAudio": {
        "containerAudioType": "string"
      }
      // end of the list of possible fields
    },
    "textNormalization": {
      "textNormalization": "string",
      "profanityFilter": "boolean",
      "literatureText": "boolean",
      "phoneFormattingMode": "string"
    },
    "languageRestriction": {
      "restrictionType": "string",
      "languageCode": [
        "string"
      ]
    },
    "audioProcessingType": "string"
  },
  "recognitionClassifier": {
    "classifiers": [
      {
        "classifier": "string",
        "triggers": [
          "string"
        ]
      }
    ]
  },
  "speechAnalysis": {
    "enableSpeakerAnalysis": "boolean",
    "enableConversationAnalysis": "boolean",
    "descriptiveStatisticsQuantiles": [
      "string"
    ]
  },
  "speakerLabeling": {
    "speakerLabeling": "string"
  },
  "summarization": {
    "modelUri": "string",
    "properties": [
      {
        "instruction": "string",
        // Includes only one of the fields `jsonObject`, `jsonSchema`
        "jsonObject": "boolean",
        "jsonSchema": {
          "schema": "object"
        }
        // end of the list of possible fields
      }
    ]
  }
}
| Field | Description |
| --- | --- |
| content | string (bytes) Bytes with data. Includes only one of the fields `content`, `uri`. |
| uri | string S3 data URL. Includes only one of the fields `content`, `uri`. |
| recognitionModel | Configuration for speech recognition model. |
| recognitionClassifier | Configuration for classifiers over speech recognition. |
| speechAnalysis | Configuration for speech analysis over speech recognition. |
| speakerLabeling | Configuration for speaker labeling. |
| summarization | Summarization options. |
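As a rough illustration of the request described above, the following sketch assembles a body that holds exactly one of `content` or `uri` and wraps it in a POST to the endpoint. It assumes that `content` is sent base64-encoded (the usual JSON mapping for a bytes field) and that an API key is passed in an `Authorization: Api-Key ...` header; neither detail is stated in this reference. The helper names, the storage URL, the model name and the `WAV` enum value are hypothetical.

```python
import base64
import json
import urllib.request

ENDPOINT = "https://stt.api.cloud.yandex.net/stt/v3/recognizeFileAsync"


def build_body(recognition_model, content=None, uri=None):
    """Return a request body holding exactly one of `content` or `uri`."""
    if (content is None) == (uri is None):
        raise ValueError("set exactly one of content or uri")
    body = {"recognitionModel": recognition_model}
    if content is not None:
        # Assumption: bytes fields travel as base64 text in JSON.
        body["content"] = base64.b64encode(content).decode("ascii")
    else:
        body["uri"] = uri
    return body


def build_request(body, api_key):
    """Wrap the body in a POST request; nothing is sent here."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: API-key authorization; not described in this section.
            "Authorization": f"Api-Key {api_key}",
        },
        method="POST",
    )


model = {
    "model": "general",  # hypothetical model name
    "audioFormat": {"containerAudio": {"containerAudioType": "WAV"}},  # assumed enum value
}
request = build_request(
    build_body(model, uri="https://storage.example.com/bucket/audio.wav"),  # hypothetical URL
    api_key="<api-key>",
)
# urllib.request.urlopen(request) would submit the job; the reply is an Operation resource.
```

Keeping the body construction separate from the network call makes the one-of check easy to exercise without credentials.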
RecognitionModelOptions
| Field | Description |
| --- | --- |
| model | string Sets the recognition model for the cloud version of SpeechKit. |
| audioFormat | Specified input audio. |
| textNormalization | Text normalization options. |
| languageRestriction | Possible languages in audio. |
| audioProcessingType | enum (AudioProcessingType) For |
AudioFormatOptions
Audio format options.
| Field | Description |
| --- | --- |
| rawAudio | RAW audio without container. Includes only one of the fields `rawAudio`, `containerAudio`. |
| containerAudio | Audio is wrapped in container. Includes only one of the fields `rawAudio`, `containerAudio`. |
RawAudio
RAW Audio format spec (no container to infer type). Used in AudioFormat options.
| Field | Description |
| --- | --- |
| audioEncoding | enum (AudioEncoding) Type of audio encoding. |
| sampleRateHertz | string (int64) PCM sample rate. |
| audioChannelCount | string (int64) PCM channel count. Currently only single channel audio is supported in real-time recognition. |
ContainerAudio
Audio with fixed type in container. Used in AudioFormat options.
| Field | Description |
| --- | --- |
| containerAudioType | enum (ContainerAudioType) Type of audio container. |
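The two audio format variants can be sketched as small builders that each return an `audioFormat` object with a single field set. The sketch follows the tables above in sending the int64 fields as strings. The enum values `LINEAR16_PCM` and `OGG_OPUS` are assumptions, since the enum lists are not reproduced in this section, and the function names are hypothetical.

```python
def raw_audio(encoding, sample_rate_hertz, channel_count=1):
    """Build an audioFormat object for headerless audio."""
    return {
        "rawAudio": {
            "audioEncoding": encoding,
            # int64 fields are documented as strings in the JSON body.
            "sampleRateHertz": str(sample_rate_hertz),
            "audioChannelCount": str(channel_count),
        }
    }


def container_audio(container_type):
    """Build an audioFormat object for audio wrapped in a container."""
    return {"containerAudio": {"containerAudioType": container_type}}


pcm = raw_audio("LINEAR16_PCM", 16000)  # assumed enum value
ogg = container_audio("OGG_OPUS")       # assumed enum value
```

Because each builder returns only one of the two keys, the result can be assigned directly to `recognitionModel.audioFormat` without a separate one-of check.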
TextNormalizationOptions
Options for post-processing text results. The normalization levels depend on the settings and the language.
For detailed information, see documentation.
| Field | Description |
| --- | --- |
| textNormalization | enum (TextNormalization) |
| profanityFilter | boolean Profanity filter (default: false). |
| literatureText | boolean Rewrite text in literature style (default: false). |
| phoneFormattingMode | enum (PhoneFormattingMode) Defines the phone formatting mode. |
LanguageRestrictionOptions
Type of restriction for the list of languages expected in the incoming audio.
| Field | Description |
| --- | --- |
| restrictionType | enum (LanguageRestrictionType) Language restriction type. |
| languageCode[] | string The list of language codes to restrict recognition in the case of an auto model. |
RecognitionClassifierOptions
| Field | Description |
| --- | --- |
| classifiers[] | List of classifiers to use. For detailed information and usage example, see documentation. |
RecognitionClassifier
| Field | Description |
| --- | --- |
| classifier | string Classifier name. |
| triggers[] | enum (TriggerType) Types of responses for which classification results are produced. Classification responses follow the responses of the specified types. |
SpeechAnalysisOptions
| Field | Description |
| --- | --- |
| enableSpeakerAnalysis | boolean Analyse speech for every speaker. |
| enableConversationAnalysis | boolean Analyse conversation of two speakers. |
| descriptiveStatisticsQuantiles[] | string Quantile levels in range (0, 1) for descriptive statistics. |
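Since the quantile levels must lie strictly between 0 and 1 and are typed as strings, a small validation step before building the request can catch mistakes early. This is a minimal sketch; the function name is hypothetical, and the exact string formatting the service accepts is an assumption (plain decimal notation is used here).

```python
def quantile_levels(levels):
    """Validate quantile levels and format them as strings for the JSON body."""
    result = []
    for level in levels:
        # The documented range is the open interval (0, 1).
        if not 0.0 < level < 1.0:
            raise ValueError(f"quantile level {level} is outside (0, 1)")
        result.append(str(level))
    return result


speech_analysis = {
    "enableSpeakerAnalysis": True,
    "enableConversationAnalysis": False,
    "descriptiveStatisticsQuantiles": quantile_levels([0.5, 0.9, 0.99]),
}
```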
SpeakerLabelingOptions
| Field | Description |
| --- | --- |
| speakerLabeling | enum (SpeakerLabeling) Specifies the execution of speaker labeling. |
SummarizationOptions
Represents transcription summarization options.
| Field | Description |
| --- | --- |
| modelUri | string The ID of the model to be used for completion generation. |
| properties[] | A list of summarizations to perform with transcription. |
SummarizationProperty
Represents summarization entry for transcription.
| Field | Description |
| --- | --- |
| instruction | string Summarization instruction for model. |
| jsonObject | boolean When set to true, the model will return a valid JSON object. Includes only one of the fields `jsonObject`, `jsonSchema`. Specifies the format of the model's response. |
| jsonSchema | Enforces a specific JSON structure for the model's response based on a provided schema. Includes only one of the fields `jsonObject`, `jsonSchema`. Specifies the format of the model's response. |
JsonSchema
Represents the expected structure of the model's response using a JSON Schema.
| Field | Description |
| --- | --- |
| schema | object The JSON Schema that the model's output must conform to. |
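The summarization structures above can be put together as in the following sketch, which builds one free-text entry and one entry constrained by a JSON Schema. It rejects entries that set both `jsonObject` and `jsonSchema`, and it assumes that setting neither is allowed, which this section does not confirm. The function name, the instructions and the `modelUri` placeholder are hypothetical.

```python
def summarization_property(instruction, json_object=None, json_schema=None):
    """Build one entry of `properties`; at most one response format may be set."""
    if json_object is not None and json_schema is not None:
        raise ValueError("set only one of jsonObject or jsonSchema")
    entry = {"instruction": instruction}
    if json_object is not None:
        entry["jsonObject"] = json_object
    if json_schema is not None:
        # The schema is nested under a `schema` key, as in the JsonSchema table.
        entry["jsonSchema"] = {"schema": json_schema}
    return entry


summarization = {
    "modelUri": "<model-uri>",  # placeholder; the URI format is not given here
    "properties": [
        summarization_property("Summarize the call in two sentences."),
        summarization_property(
            "Extract the customer's name and request.",
            json_schema={
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "request": {"type": "string"},
                },
                "required": ["name", "request"],
            },
        ),
    ],
}
```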
Response
HTTP Code: 200 - OK
{
  "id": "string",
  "description": "string",
  "createdAt": "string",
  "createdBy": "string",
  "modifiedAt": "string",
  "done": "boolean",
  "metadata": "object",
  // Includes only one of the fields `error`
  "error": {
    "code": "integer",
    "message": "string",
    "details": [
      "object"
    ]
  }
  // end of the list of possible fields
}
An Operation resource. For more information, see Operation.
| Field | Description |
| --- | --- |
| id | string ID of the operation. |
| description | string Description of the operation. 0-256 characters long. |
| createdAt | string (date-time) Creation timestamp. String in RFC3339 text format. The range of possible values is from To work with values in this field, use the APIs described in the |
| createdBy | string ID of the user or service account who initiated the operation. |
| modifiedAt | string (date-time) The time when the Operation resource was last modified. String in RFC3339 text format. The range of possible values is from To work with values in this field, use the APIs described in the |
| done | boolean If the value is |
| metadata | object Service-specific metadata associated with the operation. |
| error | The error result of the operation in case of failure or cancellation. Includes only one of the fields `error`. The operation result. |
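A client usually needs to turn the returned Operation into a simple state. The sketch below does this under the common long-running-operation convention that `done` is false while the job is still running and that `error` is present only when it failed; the description of `done` is truncated in this section, so treat that reading as an assumption. The function name and the sample values are made up for illustration.

```python
def operation_state(operation):
    """Classify an Operation dict as 'running', 'failed' or 'succeeded'."""
    # Assumption: a missing or false `done` means the job has not finished.
    if not operation.get("done", False):
        return "running"
    if "error" in operation:
        return "failed"
    return "succeeded"


sample = {
    "id": "<operation-id>",
    "done": True,
    "error": {"code": 3, "message": "sample error text", "details": []},
}
state = operation_state(sample)
```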
Status
The error result of the operation in case of failure or cancellation.
| Field | Description |
| --- | --- |
| code | integer (int32) Error code. An enum value of google.rpc.Code. |
| message | string An error message. |
| details[] | object A list of messages that carry the error details. |
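Because `code` is a numeric google.rpc.Code value, it helps to map it to its symbolic name when logging a failed operation. The sketch below lists the standard google.rpc.Code values; the `describe_error` helper and its output format are hypothetical.

```python
from enum import IntEnum


class RpcCode(IntEnum):
    """Numeric values of google.rpc.Code."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


def describe_error(status):
    """Format a Status dict as '<CODE_NAME>: <message>'."""
    code = status.get("code", 0)
    try:
        name = RpcCode(code).name
    except ValueError:
        # Keep unknown codes visible instead of failing.
        name = f"UNRECOGNIZED({code})"
    return f"{name}: {status.get('message', '')}"


text = describe_error({"code": 7, "message": "no access to bucket"})
```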