SpeechKit Recognition API v3, REST: AsyncRecognizer.RecognizeFile

Performs asynchronous speech recognition.

HTTP request

POST https://stt.api.cloud.yandex.net/stt/v3/recognizeFileAsync

Body parameters

{
  // Includes only one of the fields `content`, `uri`
  "content": "string",
  "uri": "string",
  // end of the list of possible fields
  "recognitionModel": {
    "model": "string",
    "audioFormat": {
      // Includes only one of the fields `rawAudio`, `containerAudio`
      "rawAudio": {
        "audioEncoding": "string",
        "sampleRateHertz": "string",
        "audioChannelCount": "string"
      },
      "containerAudio": {
        "containerAudioType": "string"
      }
      // end of the list of possible fields
    },
    "textNormalization": {
      "textNormalization": "string",
      "profanityFilter": "boolean",
      "literatureText": "boolean",
      "phoneFormattingMode": "string"
    },
    "languageRestriction": {
      "restrictionType": "string",
      "languageCode": [
        "string"
      ]
    },
    "audioProcessingType": "string"
  },
  "recognitionClassifier": {
    "classifiers": [
      {
        "classifier": "string",
        "triggers": [
          "string"
        ]
      }
    ]
  },
  "speechAnalysis": {
    "enableSpeakerAnalysis": "boolean",
    "enableConversationAnalysis": "boolean",
    "descriptiveStatisticsQuantiles": [
      "string"
    ]
  },
  "speakerLabeling": {
    "speakerLabeling": "string"
  },
  "summarization": {
    "modelUri": "string",
    "properties": [
      {
        "instruction": "string",
        // Includes only one of the fields `jsonObject`, `jsonSchema`
        "jsonObject": "boolean",
        "jsonSchema": {
          "schema": "object"
        }
        // end of the list of possible fields
      }
    ]
  }
}

Field	Description
content	string (bytes) Bytes with audio data. Includes only one of the fields `content`, `uri`.
uri	string S3 data URL. Includes only one of the fields `content`, `uri`.
recognitionModel	RecognitionModelOptions Configuration for the speech recognition model.
recognitionClassifier	RecognitionClassifierOptions Configuration for classifiers over speech recognition.
speechAnalysis	SpeechAnalysisOptions Configuration for speech analysis over speech recognition.
speakerLabeling	SpeakerLabelingOptions Configuration for speaker labeling.
summarization	SummarizationOptions Summarization options.
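
As an illustration of how the top-level fields fit together, the following Python sketch builds a request body and an unsent request object. The helper names `build_body` and `build_request` are hypothetical. The base64 encoding of `content`, the `Authorization: Api-Key ...` header format, and the placeholder bucket URL and key are assumptions that are not stated on this page.

```python
import base64
import json
import urllib.request

ENDPOINT = "https://stt.api.cloud.yandex.net/stt/v3/recognizeFileAsync"


def build_body(recognition_model, content=None, uri=None):
    """Build the JSON body; exactly one of `content` (bytes) or `uri` must be set."""
    if (content is None) == (uri is None):
        raise ValueError("set exactly one of content, uri")
    body = {"recognitionModel": recognition_model}
    if content is not None:
        # Assumption: bytes fields travel as base64 text in the JSON body.
        body["content"] = base64.b64encode(content).decode("ascii")
    else:
        body["uri"] = uri
    return body


def build_request(body, api_key):
    """Prepare (but do not send) the POST request."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; confirm it against your authentication docs.
            "Authorization": f"Api-Key {api_key}",
        },
        method="POST",
    )


model = {
    "model": "general",
    "audioFormat": {"containerAudio": {"containerAudioType": "WAV"}},
}
# Placeholder object URL and key, for illustration only.
body = build_body(model, uri="https://storage.yandexcloud.net/<bucket>/<object>.wav")
request = build_request(body, api_key="<API key>")
```

Passing `request` to `urllib.request.urlopen` would send it; the reply is the Operation resource described in the Response section.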

RecognitionModelOptions

Field	Description
model	string Sets the recognition model for the cloud version of SpeechKit. For `Recognizer.RecognizeStreaming`, possible values are `general`, `general:rc`, `general:deprecated`. For `AsyncRecognizer.RecognizeFile`, possible values are `general`, `general:rc`, `general:deprecated`, `deferred-general`, `deferred-general:rc`, and `deferred-general:deprecated`. The model is ignored for SpeechKit Hybrid.
audioFormat	AudioFormatOptions Specifies the input audio format.
textNormalization	TextNormalizationOptions Text normalization options.
languageRestriction	LanguageRestrictionOptions Possible languages in audio.
audioProcessingType	enum (AudioProcessingType) For `Recognizer.RecognizeStreaming`, defines the audio data processing mode. Default is `REAL_TIME`. For `AsyncRecognizer.RecognizeFile`, this field is ignored. Possible values: `AUDIO_PROCESSING_TYPE_UNSPECIFIED`; `REAL_TIME`: process audio in a mode optimized for real-time recognition, i.e. send partial and final responses as soon as possible; `FULL_DATA`: process audio after all data has been received.
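
A minimal sketch of assembling a `recognitionModel` object for this method, checking the model name against the values listed above for `AsyncRecognizer.RecognizeFile`. The helper name `recognition_model` is hypothetical, and the check only makes sense for the cloud version, since the model is ignored for SpeechKit Hybrid.

```python
# Model names listed above for AsyncRecognizer.RecognizeFile.
ASYNC_MODELS = {
    "general",
    "general:rc",
    "general:deprecated",
    "deferred-general",
    "deferred-general:rc",
    "deferred-general:deprecated",
}


def recognition_model(model, audio_format, **options):
    """Return a `recognitionModel` object; extra options are passed through as-is."""
    if model not in ASYNC_MODELS:
        raise ValueError(f"unexpected model for RecognizeFile: {model!r}")
    # `audioProcessingType` is omitted on purpose: this method ignores it.
    return {"model": model, "audioFormat": audio_format, **options}


model_options = recognition_model(
    "deferred-general",
    {"containerAudio": {"containerAudioType": "OGG_OPUS"}},
    textNormalization={"textNormalization": "TEXT_NORMALIZATION_ENABLED"},
)
```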

AudioFormatOptions

Audio format options.

Field	Description
rawAudio	RawAudio RAW audio without container. Includes only one of the fields `rawAudio`, `containerAudio`.
containerAudio	ContainerAudio Audio is wrapped in container. Includes only one of the fields `rawAudio`, `containerAudio`.

RawAudio

RAW Audio format spec (no container to infer type). Used in AudioFormat options.

Field	Description
audioEncoding	enum (AudioEncoding) Type of audio encoding. Possible values: `AUDIO_ENCODING_UNSPECIFIED`; `LINEAR16_PCM`: audio bit depth 16-bit signed little-endian (Linear PCM).
sampleRateHertz	string (int64) PCM sample rate.
audioChannelCount	string (int64) PCM channel count. Currently only single channel audio is supported in real-time recognition.
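
The `string (int64)` types above indicate that the numeric values appear as JSON strings in the body schema. A minimal sketch of a hypothetical `raw_audio_format` helper that produces the `audioFormat` object for headerless PCM; the 8000 Hz mono values in the usage line are only examples.

```python
def raw_audio_format(sample_rate_hertz, channel_count=1):
    """Return an `audioFormat` object for raw 16-bit little-endian PCM."""
    if sample_rate_hertz <= 0 or channel_count <= 0:
        raise ValueError("sample rate and channel count must be positive")
    return {
        "rawAudio": {
            "audioEncoding": "LINEAR16_PCM",
            # int64 values are written as strings, matching the body schema.
            "sampleRateHertz": str(sample_rate_hertz),
            "audioChannelCount": str(channel_count),
        }
    }


audio_format = raw_audio_format(8000)
```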

ContainerAudio

Audio with fixed type in container. Used in AudioFormat options.

Field	Description
containerAudioType	enum (ContainerAudioType) Type of audio container. Possible values: `CONTAINER_AUDIO_TYPE_UNSPECIFIED`; `WAV`: audio bit depth 16-bit signed little-endian (Linear PCM); `OGG_OPUS`: data is encoded using the OPUS audio codec and compressed using the OGG container format; `MP3`: data is encoded using MPEG-1/2 Layer III and compressed using the MP3 container format.
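
One way to pick the container type is from the file name. The sketch below is an assumption-laden convenience rather than part of the API: the extension mapping and the helper name `container_audio_format` are hypothetical, and an `.ogg` file is assumed to hold OPUS data, which the extension alone does not guarantee.

```python
from pathlib import Path

# Hypothetical extension-to-enum mapping; the enum names come from the list above.
CONTAINER_BY_SUFFIX = {
    ".wav": "WAV",
    ".ogg": "OGG_OPUS",
    ".opus": "OGG_OPUS",
    ".mp3": "MP3",
}


def container_audio_format(filename):
    """Return an `audioFormat` object chosen from the file extension."""
    suffix = Path(filename).suffix.lower()
    try:
        container = CONTAINER_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported container for {filename!r}") from None
    return {"containerAudio": {"containerAudioType": container}}


audio_format = container_audio_format("call.MP3")
```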

TextNormalizationOptions

Options for post-processing text results. The normalization levels depend on the settings and the language.
For detailed information, see documentation.

Field	Description
textNormalization	enum (TextNormalization) Possible values: `TEXT_NORMALIZATION_UNSPECIFIED`; `TEXT_NORMALIZATION_ENABLED`: enable converting numbers, dates, and times from text to numeric format; `TEXT_NORMALIZATION_DISABLED`: disable all normalization (default).
profanityFilter	boolean Profanity filter (default: false).
literatureText	boolean Rewrite text in literary style (default: false).
phoneFormattingMode	enum (PhoneFormattingMode) Defines the phone formatting mode. Possible values: `PHONE_FORMATTING_MODE_UNSPECIFIED`; `PHONE_FORMATTING_MODE_DISABLED`: disable phone formatting.

LanguageRestrictionOptions

Type of restriction for the list of languages expected in the incoming audio.

Field	Description
restrictionType	enum (LanguageRestrictionType) Language restriction type. All of these restrictions are used by the model as guidelines, not as strict rules. The language is recognized for each sentence. If a sentence has phrases in different languages, all of them will be transcribed in the most probable language. Possible values: `LANGUAGE_RESTRICTION_TYPE_UNSPECIFIED`; `WHITELIST`: the list of the most likely languages in the incoming audio; `BLACKLIST`: the list of languages that are unlikely to occur in the incoming audio.
languageCode[]	string The list of language codes to restrict recognition in the case of an auto model.
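
A short sketch of building a `languageRestriction` object. The helper name `language_restriction` is hypothetical, and the `ru-RU` / `en-US` code format in the usage line is an assumption, since this page does not list the accepted codes.

```python
RESTRICTION_TYPES = ("WHITELIST", "BLACKLIST")


def language_restriction(codes, restriction_type="WHITELIST"):
    """Return a `languageRestriction` object for the given language codes."""
    if restriction_type not in RESTRICTION_TYPES:
        raise ValueError(f"unknown restriction type: {restriction_type!r}")
    codes = list(codes)
    if not codes:
        raise ValueError("at least one language code is required")
    return {"restrictionType": restriction_type, "languageCode": codes}


# Assumed code format; the restriction is a hint to the model, not a strict rule.
restriction = language_restriction(["ru-RU", "en-US"])
```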

RecognitionClassifierOptions

Field	Description
classifiers[]	RecognitionClassifier List of classifiers to use. For detailed information and a usage example, see documentation.

RecognitionClassifier

Field	Description
classifier	string Classifier name.
triggers[]	enum (TriggerType) Specifies the response types for which classification results are returned. Classification responses follow the responses of the specified types. Possible values: `TRIGGER_TYPE_UNSPECIFIED`; `ON_UTTERANCE`: apply the classifier to utterance responses; `ON_FINAL`: apply the classifier to final responses; `ON_PARTIAL`: apply the classifier to partial responses.

SpeechAnalysisOptions

Field	Description
enableSpeakerAnalysis	boolean Analyse speech for every speaker.
enableConversationAnalysis	boolean Analyse the conversation between two speakers.
descriptiveStatisticsQuantiles[]	string Quantile levels in the range (0, 1) for descriptive statistics.
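
Because the quantile levels are strings that must lie strictly between 0 and 1, a small validation step can catch mistakes before the request is sent. This is a sketch with a hypothetical helper name, `speech_analysis`; the plain decimal formatting of each level is an assumption.

```python
def speech_analysis(quantiles, speakers=True, conversation=False):
    """Return a `speechAnalysis` object, rejecting quantiles outside (0, 1)."""
    levels = []
    for q in quantiles:
        if not 0.0 < q < 1.0:
            raise ValueError(f"quantile {q!r} is outside (0, 1)")
        # Assumed string format: the plain decimal representation of the level.
        levels.append(str(q))
    return {
        "enableSpeakerAnalysis": speakers,
        "enableConversationAnalysis": conversation,
        "descriptiveStatisticsQuantiles": levels,
    }


analysis = speech_analysis([0.5, 0.9])
```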

SpeakerLabelingOptions

Field	Description
speakerLabeling	enum (SpeakerLabeling) Specifies whether to perform speaker labeling. Possible values: `SPEAKER_LABELING_UNSPECIFIED`; `SPEAKER_LABELING_ENABLED`: enable speaker labeling; `SPEAKER_LABELING_DISABLED`: disable speaker labeling (default).

SummarizationOptions

Represents transcription summarization options.

Field	Description
modelUri	string The ID of the model to be used for completion generation.
properties[]	SummarizationProperty A list of summarizations to perform on the transcription.

SummarizationProperty

Represents a summarization entry for the transcription.

Field	Description
instruction	string Summarization instruction for model.
jsonObject	boolean When set to true, the model will return a valid JSON object. Be sure to ask the model explicitly for JSON. Otherwise, it may produce excessive whitespace and run indefinitely until it reaches the token limit. Includes only one of the fields `jsonObject`, `jsonSchema`, which specify the format of the model's response.
jsonSchema	JsonSchema Enforces a specific JSON structure for the model's response based on a provided schema. Includes only one of the fields `jsonObject`, `jsonSchema`, which specify the format of the model's response.

JsonSchema

Represents the expected structure of the model's response using a JSON Schema.

Field	Description
schema	object The JSON Schema that the model's output must conform to.
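
The sketch below shows one way to build a `summarization` object with two entries, one free-form and one constrained by a JSON Schema, while keeping `jsonObject` and `jsonSchema` mutually exclusive. The helper name `summarization_property`, the instructions, the example schema, and the `<model URI>` placeholder are all hypothetical.

```python
def summarization_property(instruction, json_object=False, json_schema=None):
    """Return one `properties[]` entry; at most one response format may be set."""
    if json_object and json_schema is not None:
        raise ValueError("set at most one of json_object, json_schema")
    prop = {"instruction": instruction}
    if json_schema is not None:
        prop["jsonSchema"] = {"schema": json_schema}
    elif json_object:
        prop["jsonObject"] = True
    return prop


summarization = {
    "modelUri": "<model URI>",  # placeholder, not a real model identifier
    "properties": [
        summarization_property("Summarize the call in one sentence."),
        summarization_property(
            # The instruction asks for JSON explicitly, as advised above.
            "Return the call topic as JSON.",
            json_schema={
                "type": "object",
                "properties": {"topic": {"type": "string"}},
                "required": ["topic"],
            },
        ),
    ],
}
```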

Response

HTTP Code: 200 - OK

{
  "id": "string",
  "description": "string",
  "createdAt": "string",
  "createdBy": "string",
  "modifiedAt": "string",
  "done": "boolean",
  "metadata": "object",
  // Includes only one of the fields `error`
  "error": {
    "code": "integer",
    "message": "string",
    "details": [
      "object"
    ]
  }
  // end of the list of possible fields
}

An Operation resource. For more information, see Operation.

Field	Description
id	string ID of the operation.
description	string Description of the operation. 0-256 characters long.
createdAt	string (date-time) Creation timestamp. String in RFC3339 text format. The range of possible values is from `0001-01-01T00:00:00Z` to `9999-12-31T23:59:59.999999999Z`, i.e. from 0 to 9 digits for fractions of a second. To work with values in this field, use the APIs described in the Protocol Buffers reference. In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).
createdBy	string ID of the user or service account who initiated the operation.
modifiedAt	string (date-time) The time when the Operation resource was last modified. String in RFC3339 text format. The range of possible values is from `0001-01-01T00:00:00Z` to `9999-12-31T23:59:59.999999999Z`, i.e. from 0 to 9 digits for fractions of a second. To work with values in this field, use the APIs described in the Protocol Buffers reference. In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).
done	boolean If the value is `false`, it means the operation is still in progress. If `true`, the operation is completed, and either `error` or `response` is available.
metadata	object Service-specific metadata associated with the operation. It typically contains the ID of the target resource that the operation is performed on. Any method that returns a long-running operation should document the metadata type, if any.
error	Status The error result of the operation in case of failure or cancellation. Includes only one of the fields `error`. The operation result. If `done == false` and there was no failure detected, neither `error` nor `response` is set. If `done == false` and there was a failure detected, `error` is set. If `done == true`, exactly one of `error` or `response` is set.
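
Since Python's built-in `datetime` keeps at most microseconds, timestamps with up to 9 fractional digits need some care. The sketch below truncates the fraction to 6 digits and classifies an Operation by its `done` and `error` fields. The helper names are hypothetical, and the parser assumes the timestamp ends in `Z` (UTC) rather than a numeric offset.

```python
import re
from datetime import datetime, timezone

# Date and time, an optional fraction of 1 to 9 digits, and a literal "Z".
_RFC3339_UTC = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$")


def parse_timestamp(value):
    """Parse a UTC RFC3339 timestamp, truncating nanoseconds to microseconds."""
    match = _RFC3339_UTC.match(value)
    if match is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    base, fraction = match.groups()
    # Pad the fraction to 9 digits, then keep the first 6 (microseconds).
    micros = int((fraction or "").ljust(9, "0")[:6])
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def operation_state(operation):
    """Return "failed", "done", or "running" for an Operation dict."""
    if "error" in operation:
        # `error` may be set even while `done` is still false.
        return "failed"
    return "done" if operation.get("done") else "running"


created = parse_timestamp("2024-05-01T12:30:45.123456789Z")
state = operation_state({"id": "<operation ID>", "done": False})
```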

Status

The error result of the operation in case of failure or cancellation.

Field	Description
code	integer (int32) Error code. An enum value of google.rpc.Code.
message	string An error message.
details[]	object A list of messages that carry the error details.
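
For logging, the numeric `code` can be turned into its `google.rpc.Code` name. The enum values below follow the `google.rpc.Code` definition; the `RpcCode` class and the `describe_status` helper are hypothetical names used only for this sketch.

```python
from enum import IntEnum


class RpcCode(IntEnum):
    """Numeric values of google.rpc.Code."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


def describe_status(status):
    """Format a Status dict as "<CODE_NAME>: <message>"."""
    code = status.get("code", 0)
    try:
        name = RpcCode(code).name
    except ValueError:
        # Codes outside the enum are reported by number.
        name = f"UNKNOWN_CODE_{code}"
    return f"{name}: {status.get('message', '')}"


summary = describe_status({"code": 3, "message": "bad audio", "details": []})
```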
