Analyzing recognition results

The SpeechKit API v3 can analyze audio during recognition and return additional information along with the recognition results: start and end timestamps of specific words or phrases (when they occur), utterance and pause durations, speech rate, the number of words per utterance, and other speech analysis labels and metrics.

Audio classifiers

Note

Audio classifiers are only supported for Russian speech.

You can apply classifiers to both intermediate and final recognition results. To enable a classifier, set the recognition_classifier parameter in the session options. Classifier results arrive in a separate message immediately after the event specified in the classifier settings. For classifiers, this triggering event can be partial, eou_update, or final.
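The ordering described above can be illustrated with a small, self-contained sketch that pairs each classifier message with the recognition event preceding it. It does not call SpeechKit: responses are modeled as plain dictionaries, and the `event` key, the `classifier_update` event name, and the `probability` field are hypothetical stand-ins chosen for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Recognition events that can trigger a classifier (from the text above).
TRIGGER_EVENTS = {"partial", "eou_update", "final"}


def pair_classifier_results(
    responses: Iterable[dict],
) -> Iterator[Tuple[Optional[str], dict]]:
    """Yield (triggering_event, classifier_message) pairs in stream order.

    Assumption: each response is a plain dict with an "event" key, and
    classifier results use the hypothetical event name "classifier_update".
    """
    last_trigger: Optional[str] = None
    for response in responses:
        event = response["event"]
        if event in TRIGGER_EVENTS:
            # Remember the most recent event that could have triggered a classifier.
            last_trigger = event
        elif event == "classifier_update":
            # Classifier results arrive right after their triggering event.
            yield last_trigger, response


# Made-up stream: the insult classifier fires after an end-of-utterance update.
stream = [
    {"event": "partial"},
    {"event": "eou_update"},
    {"event": "classifier_update", "classifier": "insult", "probability": 0.12},
    {"event": "final"},
]
for trigger, result in pair_classifier_results(stream):
    print(trigger, result["classifier"], result["probability"])  # eou_update insult 0.12
```

Tracking only the most recent triggering event is enough here because, as stated above, a classifier message follows its trigger directly.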

SpeechKit supports the following classifiers:

Classifier	Description	Result
`formal_greeting`	Formal greeting, e.g., "good afternoon" or "good morning"	Probability that the phrase contains a formal greeting
`informal_greeting`	Informal greeting, e.g., "hi" or "hey there"	Probability that the phrase contains an informal greeting
`formal_farewell`	Formal farewell, e.g., "goodbye" or "have a nice day"	Probability that the phrase contains a formal farewell
`informal_farewell`	Informal farewell, e.g., "bye-bye" or "adios"	Probability that the phrase contains an informal farewell
`insult`	Insults, e.g., "idiot" or "jerk"	Probability that the phrase contains an insult
`profanity`	Profanity	Probability that the phrase contains profanity
`gender`	Speaker gender	Probabilities of the `male` and `female` labels
`negative`	Negativity	Probability that the recognized phrase is negative
`answerphone`	Automated answer	Probability that the phrase comes from a voice bot or answerphone
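Because the classifiers return probabilities rather than yes/no answers, your application has to decide what counts as a positive. The sketch below works on plain dictionaries of made-up values; the 0.5 threshold, the dictionary layout, and both helper functions are assumptions for illustration, not SpeechKit defaults.

```python
from typing import Dict, List

# Hypothetical cut-off; pick one that fits your application.
DEFAULT_THRESHOLD = 0.5


def positive_classifiers(results: Dict[str, float], threshold: float = DEFAULT_THRESHOLD) -> List[str]:
    """Return the names of classifiers whose probability meets the threshold."""
    return sorted(name for name, probability in results.items() if probability >= threshold)


def most_likely_gender(probabilities: Dict[str, float]) -> str:
    """Pick the label ("male" or "female") with the higher probability."""
    return max(probabilities, key=probabilities.get)


# Example values are made up for illustration.
phrase_results = {"insult": 0.83, "profanity": 0.10, "informal_greeting": 0.55}
print(positive_classifiers(phrase_results))  # ['informal_greeting', 'insult']
print(most_likely_gender({"male": 0.3, "female": 0.7}))  # female
```

A single global threshold is the simplest choice; in practice you may want a stricter cut-off for classifiers such as `insult`, where false positives are costly.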

Python

# stt_pb2 is the Python module generated from the SpeechKit API v3 protos
from yandex.cloud.ai.stt.v3 import stt_pb2

session_options = stt_pb2.StreamingRequest(
    session_options=stt_pb2.StreamingOptions(
        # recognition_model takes a RecognitionModelOptions message, not a string
        recognition_model=stt_pb2.RecognitionModelOptions(
            model="general",
        ),
        # Classifier settings
        recognition_classifier=stt_pb2.RecognitionClassifierOptions(
            classifiers=[
                # Detecting insults in utterances
                stt_pb2.RecognitionClassifier(
                    classifier="insult",
                    triggers=[stt_pb2.RecognitionClassifier.ON_UTTERANCE],
                ),
                # Detecting profanity in utterances
                stt_pb2.RecognitionClassifier(
                    classifier="profanity",
                    triggers=[stt_pb2.RecognitionClassifier.ON_UTTERANCE],
                ),
            ]
        ),
    )
)
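Classifier names are plain strings, so a typo is easy to miss until the session is already running. A hypothetical helper like the one below (not part of SpeechKit) can check requested names against the classifiers listed in the table before you build the `stt_pb2.RecognitionClassifier` objects.

```python
from typing import Iterable, List

# Classifier names listed in the table above.
SUPPORTED_CLASSIFIERS = frozenset({
    "formal_greeting", "informal_greeting",
    "formal_farewell", "informal_farewell",
    "insult", "profanity", "gender", "negative", "answerphone",
})


def check_classifier_names(names: Iterable[str]) -> List[str]:
    """Return the names unchanged, or raise ValueError listing unknown ones."""
    names = list(names)
    unknown = [name for name in names if name not in SUPPORTED_CLASSIFIERS]
    if unknown:
        raise ValueError(f"Unsupported classifier(s): {', '.join(unknown)}")
    return names


print(check_classifier_names(["insult", "profanity"]))  # ['insult', 'profanity']
```

Each validated name can then be passed as the `classifier` argument of a `stt_pb2.RecognitionClassifier`, as in the example above.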

Audio statistics

SpeechKit allows you to analyze conversations and utterances of specific speakers as well as calculate statistics for each speaker and the conversation as a whole. Analysis results include discrete audio characteristics and descriptive statistics for distributions of these values.

For each speaker in the conversation, you can get:

Speech rate and length
Duration of pauses
Count and size of utterances

For the whole conversation, you can get:

Duration of parallel speech and pauses
Interruption count and timestamps
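To make the per-speaker and conversation metrics above concrete, here is a minimal sketch that computes similar values from utterance timestamps. It is not SpeechKit's algorithm: the `Utterance` type and both functions are hypothetical, and the definitions are assumptions that may differ from SpeechKit's. A pause is a gap between consecutive utterances of the same speaker, an interruption is an utterance that starts while another speaker is still talking, and parallel speech is summed pairwise, which suits two-speaker calls.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Utterance:
    speaker: str
    start_ms: int
    end_ms: int
    words: int


def speaker_stats(utterances: List[Utterance]) -> Dict[str, dict]:
    """Per-speaker utterance count, speech length, pauses, and speech rate."""
    by_speaker: Dict[str, List[Utterance]] = defaultdict(list)
    for u in sorted(utterances, key=lambda u: u.start_ms):
        by_speaker[u.speaker].append(u)
    stats = {}
    for speaker, items in by_speaker.items():
        speech_ms = sum(u.end_ms - u.start_ms for u in items)
        words = sum(u.words for u in items)
        # Pauses: gaps between consecutive utterances of the same speaker.
        pauses = [b.start_ms - a.end_ms for a, b in zip(items, items[1:]) if b.start_ms > a.end_ms]
        stats[speaker] = {
            "utterance_count": len(items),
            "speech_ms": speech_ms,
            "pauses_ms": pauses,
            # Words per minute of actual speech (pauses excluded).
            "words_per_minute": words * 60000 / speech_ms if speech_ms else 0.0,
        }
    return stats


def conversation_stats(utterances: List[Utterance]) -> dict:
    """Parallel speech, interruptions, and silences for the whole conversation."""
    items = sorted(utterances, key=lambda u: u.start_ms)
    overlap_ms = 0
    interruptions = []
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if b.start_ms >= a.end_ms:
                break  # sorted by start, so no later utterance overlaps `a`
            if b.speaker != a.speaker:
                overlap_ms += min(a.end_ms, b.end_ms) - b.start_ms
                interruptions.append((b.speaker, b.start_ms))
    # Silences: gaps where nobody is speaking.
    silences = []
    current_end = items[0].end_ms if items else 0
    for u in items[1:]:
        if u.start_ms > current_end:
            silences.append(u.start_ms - current_end)
        current_end = max(current_end, u.end_ms)
    return {"parallel_speech_ms": overlap_ms, "interruptions": interruptions, "silences_ms": silences}


# Made-up two-speaker call: B starts talking before A finishes.
call = [
    Utterance("A", 0, 4000, 10),
    Utterance("B", 3000, 6000, 6),
    Utterance("A", 7000, 9000, 5),
]
print(speaker_stats(call)["A"]["words_per_minute"])  # 150.0
print(conversation_stats(call))
```

For this call, speaker B interrupts at 3000 ms, producing 1000 ms of parallel speech, and there is one 1000 ms silence before A resumes.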

To enable calculation of statistics, define the speech_analysis parameter in the session settings.

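from yandex.cloud.ai.stt.v3 import stt_pb2

recognize_options = stt_pb2.StreamingOptions(
    recognition_model=stt_pb2.RecognitionModelOptions(
        # ... recognition model settings
    ),
    # speech_analysis is set on StreamingOptions, next to recognition_model
    speech_analysis=stt_pb2.SpeechAnalysisOptions(
        enable_speaker_analysis=True,       # statistics for each speaker
        enable_conversation_analysis=True,  # statistics for the whole conversation
        descriptive_statistics_quantiles=[0.5, 0.9],
    ),
)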
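As its name suggests, `descriptive_statistics_quantiles` lists the quantile levels to report for each distribution of values, so `[0.5, 0.9]` requests the median and the 90th percentile. The sketch below shows what such a summary means for a list of pause durations. The quantile rule (linear interpolation between closest ranks, NumPy's default) and the `describe` summary keys are assumptions, and SpeechKit may compute its statistics differently.

```python
from typing import Dict, List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linear interpolation between closest ranks (NumPy's default rule)."""
    if not values:
        raise ValueError("quantile of empty data")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def describe(values: Sequence[float], quantiles: List[float]) -> Dict[str, float]:
    """Hypothetical summary: min, max, mean, plus each requested quantile."""
    summary = {"min": min(values), "max": max(values), "mean": sum(values) / len(values)}
    for q in quantiles:
        summary[f"q{q}"] = quantile(values, q)
    return summary


# Made-up pause durations for one speaker, in milliseconds.
pauses_ms = [200, 400, 600, 800, 3000]
print(describe(pauses_ms, [0.5, 0.9]))
```

Here the median (600 ms) describes a typical pause, while the 0.9 quantile shows how long the longest pauses get. The mean alone (1000 ms, inflated by a single 3-second gap) would hide both.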

You will receive the analysis results in the speaker_analysis and conversation_analysis messages.
