Talk Analytics API, REST: Talk.Get

RPC for getting multiple talks in a single request (bulk get).

HTTP request

POST https://rest-api.speechsense.yandexcloud.net/speechsense/v1/talks/get
        

Body parameters

{
          "organizationId": "string",
          "spaceId": "string",
          "connectionId": "string",
          "projectId": "string",
          "talkIds": [
            "string"
          ],
          "resultsMask": "string"
        }
        

Field

Description

organizationId

string

ID of the organization

spaceId

string

ID of the space

connectionId

string

ID of the connection to search for data in

projectId

string

ID of the project to search for data in

talkIds[]

string

IDs of the talks to return. Requesting too many talks may result in a "message exceeds maximum size" error.
We recommend requesting no more than 100 talks per request.

resultsMask

string (field-mask)

Comma-separated names of all fields to be updated.
Only the specified fields will be changed. The others will be left untouched.
If the field is specified in updateMask and no value for that field was sent in the request,
the field's value will be reset to the default. The default value for most fields is null or 0.

If updateMask is not sent in the request, all fields' values will be updated.
Fields specified in the request will be updated to provided values.
The rest of the fields will be reset to the default.
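One way to follow the talkIds recommendation is to split a long list of IDs into batches and build one request body per batch. The sketch below is only an illustration: the helper names `chunked` and `build_get_bodies` and the sample IDs are hypothetical, and sending the request (including authentication, which this page does not describe) is left out.

```python
from typing import Dict, Iterator, List, Optional

# Recommended upper bound taken from the talkIds description.
MAX_TALKS_PER_REQUEST = 100


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_get_bodies(
    organization_id: str,
    space_id: str,
    connection_id: str,
    project_id: str,
    talk_ids: List[str],
    results_mask: Optional[str] = None,
) -> List[Dict]:
    """Return one JSON-serializable request body per batch of talk IDs."""
    bodies = []
    for batch in chunked(talk_ids, MAX_TALKS_PER_REQUEST):
        body = {
            "organizationId": organization_id,
            "spaceId": space_id,
            "connectionId": connection_id,
            "projectId": project_id,
            "talkIds": batch,
        }
        # resultsMask is only added when the caller supplies one.
        if results_mask is not None:
            body["resultsMask"] = results_mask
        bodies.append(body)
    return bodies


# Hypothetical identifiers, used only to show the batching.
bodies = build_get_bodies(
    "org-1", "space-1", "conn-1", "proj-1",
    [f"talk-{i}" for i in range(250)],
)
print([len(body["talkIds"]) for body in bodies])
```

Each body can then be serialized with `json.dumps` and posted to the endpoint shown under HTTP request.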

Response

HTTP Code: 200 - OK

{
          "talk": [
            {
              "id": "string",
              "organizationId": "string",
              "spaceId": "string",
              "connectionId": "string",
              "projectIds": [
                "string"
              ],
              "createdBy": "string",
              "createdAt": "string",
              "modifiedBy": "string",
              "modifiedAt": "string",
              "talkFields": [
                {
                  "name": "string",
                  "value": "string",
                  "type": "string"
                }
              ],
              "transcription": {
                "phrases": [
                  {
                    "channelNumber": "string",
                    "startTimeMs": "string",
                    "endTimeMs": "string",
                    "phrase": {
                      "text": "string",
                      "language": "string",
                      "normalizedText": "string",
                      "words": [
                        {
                          "word": "string",
                          "startTimeMs": "string",
                          "endTimeMs": "string"
                        }
                      ]
                    },
                    "statistics": {
                      "statistics": {
                        "speakerTag": "string",
                        "speechBoundaries": {
                          "startTimeMs": "string",
                          "endTimeMs": "string",
                          "durationSeconds": "string"
                        },
                        "totalSpeechMs": "string",
                        "speechRatio": "string",
                        "totalSilenceMs": "string",
                        "silenceRatio": "string",
                        "wordsCount": "string",
                        "lettersCount": "string",
                        "wordsPerSecond": {
                          "min": "string",
                          "max": "string",
                          "mean": "string",
                          "std": "string",
                          "quantiles": [
                            {
                              "level": "string",
                              "value": "string"
                            }
                          ]
                        },
                        "lettersPerSecond": {
                          "min": "string",
                          "max": "string",
                          "mean": "string",
                          "std": "string",
                          "quantiles": [
                            {
                              "level": "string",
                              "value": "string"
                            }
                          ]
                        }
                      }
                    },
                    "classifiers": [
                      {
                        "startTimeMs": "string",
                        "endTimeMs": "string",
                        "classifier": "string",
                        "highlights": [
                          {
                            "text": "string",
                            "offset": "string",
                            "count": "string"
                          }
                        ],
                        "labels": [
                          {
                            "label": "string",
                            "confidence": "string"
                          }
                        ]
                      }
                    ]
                  }
                ],
                "algorithmsMetadata": [
                  {
                    "createdTaskDate": "string",
                    "completedTaskDate": "string",
                    "error": {
                      "code": "string",
                      "message": "string"
                    },
                    "traceId": "string",
                    "name": "string"
                  }
                ]
              },
              "speechStatistics": {
                "totalSimultaneousSpeechDurationSeconds": "string",
                "totalSimultaneousSpeechDurationMs": "string",
                "totalSimultaneousSpeechRatio": "string",
                "simultaneousSpeechDurationEstimation": {
                  "min": "string",
                  "max": "string",
                  "mean": "string",
                  "std": "string",
                  "quantiles": [
                    {
                      "level": "string",
                      "value": "string"
                    }
                  ]
                }
              },
              "silenceStatistics": {
                "totalSimultaneousSilenceDurationMs": "string",
                "totalSimultaneousSilenceRatio": "string",
                "simultaneousSilenceDurationEstimation": {
                  "min": "string",
                  "max": "string",
                  "mean": "string",
                  "std": "string",
                  "quantiles": [
                    {
                      "level": "string",
                      "value": "string"
                    }
                  ]
                },
                "totalSimultaneousSilenceDurationSeconds": "string"
              },
              "interruptsStatistics": {
                "speakerInterrupts": [
                  {
                    "speakerTag": "string",
                    "interruptsCount": "string",
                    "interruptsDurationMs": "string",
                    "interrupts": [
                      {
                        "startTimeMs": "string",
                        "endTimeMs": "string",
                        "durationSeconds": "string"
                      }
                    ],
                    "interruptsDurationSeconds": "string"
                  }
                ]
              },
              "conversationStatistics": {
                "conversationBoundaries": {
                  "startTimeMs": "string",
                  "endTimeMs": "string",
                  "durationSeconds": "string"
                },
                "speakerStatistics": [
                  {
                    "speakerTag": "string",
                    "completeStatistics": {
                      "speakerTag": "string",
                      "speechBoundaries": {
                        "startTimeMs": "string",
                        "endTimeMs": "string",
                        "durationSeconds": "string"
                      },
                      "totalSpeechMs": "string",
                      "speechRatio": "string",
                      "totalSilenceMs": "string",
                      "silenceRatio": "string",
                      "wordsCount": "string",
                      "lettersCount": "string",
                      "wordsPerSecond": {
                        "min": "string",
                        "max": "string",
                        "mean": "string",
                        "std": "string",
                        "quantiles": [
                          {
                            "level": "string",
                            "value": "string"
                          }
                        ]
                      },
                      "lettersPerSecond": {
                        "min": "string",
                        "max": "string",
                        "mean": "string",
                        "std": "string",
                        "quantiles": [
                          {
                            "level": "string",
                            "value": "string"
                          }
                        ]
                      }
                    },
                    "wordsPerUtterance": {
                      "min": "string",
                      "max": "string",
                      "mean": "string",
                      "std": "string",
                      "quantiles": [
                        {
                          "level": "string",
                          "value": "string"
                        }
                      ]
                    },
                    "lettersPerUtterance": {
                      "min": "string",
                      "max": "string",
                      "mean": "string",
                      "std": "string",
                      "quantiles": [
                        {
                          "level": "string",
                          "value": "string"
                        }
                      ]
                    },
                    "utteranceCount": "string",
                    "utteranceDurationEstimation": {
                      "min": "string",
                      "max": "string",
                      "mean": "string",
                      "std": "string",
                      "quantiles": [
                        {
                          "level": "string",
                          "value": "string"
                        }
                      ]
                    }
                  }
                ]
              },
              "points": {
                "quiz": [
                  {
                    "request": "string",
                    "response": "string",
                    "id": "string"
                  }
                ]
              },
              "textClassifiers": {
                "classificationResult": [
                  {
                    "classifier": "string",
                    "classifierStatistics": [
                      {
                        "channelNumber": "string",
                        "totalCount": "string",
                        "histograms": [
                          {
                            "countValues": [
                              "string"
                            ]
                          }
                        ]
                      }
                    ]
                  }
                ]
              },
              "summarization": {
                "statements": [
                  {
                    "field": {
                      "id": "string",
                      "name": "string",
                      "type": "string"
                    },
                    "response": [
                      "string"
                    ]
                  }
                ]
              },
              "assistants": {
                "assistantResults": [
                  {
                    "assistantId": "string",
                    "results": [
                      {
                        "fieldId": "string",
                        // Includes only one of the fields `stringResult`, `intResult`, `floatResult`
                        "stringResult": "string",
                        "intResult": "string",
                        "floatResult": "string"
                        // end of the list of possible fields
                      }
                    ]
                  }
                ]
              },
              "talkState": {
                "processingState": "string",
                "algorithmProcessingInfos": [
                  {
                    "algorithm": "string",
                    "processingState": "string"
                  }
                ]
              }
            }
          ]
        }
        

Field

Description

talk[]

Talk

Talk

Field

Description

id

string

Talk ID

organizationId

string

spaceId

string

connectionId

string

projectIds[]

string

createdBy

string

Audit info

createdAt

string (date-time)

String in RFC3339 text format. The range of possible values is from
0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z, i.e. from 0 to 9 digits for fractions of a second.

To work with values in this field, use the APIs described in the
Protocol Buffers reference.
In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).

modifiedBy

string

modifiedAt

string (date-time)

String in RFC3339 text format. The range of possible values is from
0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z, i.e. from 0 to 9 digits for fractions of a second.

To work with values in this field, use the APIs described in the
Protocol Buffers reference.
In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).

talkFields[]

Field

Key-value representation of talk fields and their values

transcription

Transcription

Various ML analysis results

speechStatistics

SpeechStatistics

silenceStatistics

SilenceStatistics

interruptsStatistics

InterruptsStatistics

conversationStatistics

ConversationStatistics

points

Points

textClassifiers

TextClassifiers

summarization

Summarization

assistants

Assistants

talkState

TalkState
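The createdAt and modifiedAt descriptions above note that some built-in datetime utilities cannot hold nine fractional digits. Python's `datetime` is one of them (it stores microseconds), so a minimal standard-library sketch could truncate the fraction as shown below. The function name `parse_rfc3339` is hypothetical, and dropping the last three digits is a deliberate simplification, not something this API prescribes.

```python
import re
from datetime import datetime, timedelta, timezone

# RFC3339 timestamp with 0 to 9 fractional digits and "Z" or a numeric offset.
_RFC3339 = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d{1,9}))?(Z|[+-]\d{2}:\d{2})$"
)


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC3339 string, truncating nanoseconds to microseconds."""
    match = _RFC3339.match(value)
    if match is None:
        raise ValueError(f"not an RFC3339 timestamp: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    # Pad the fraction to nine digits, then keep the first six (microseconds).
    fraction = (match.group(7) or "").ljust(9, "0")
    microsecond = int(fraction[:6])
    offset = match.group(8)
    if offset == "Z":
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6])))
    return datetime(year, month, day, hour, minute, second, microsecond, tzinfo=tz)


created_at = parse_rfc3339("2024-05-01T12:30:45.123456789Z")
print(created_at.isoformat())
```

If full nanosecond precision matters, keep the original string alongside the parsed value.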

Field

connection field value

Field

Description

name

string

Name of the field

value

string

Field value

type

enum (FieldType)

Field type

  • FIELD_TYPE_UNSPECIFIED
  • FIELD_TYPE_STRING
  • FIELD_TYPE_NUMBER
  • FIELD_TYPE_DECIMAL
  • FIELD_TYPE_BOOLEAN
  • FIELD_TYPE_DATE
  • FIELD_TYPE_JSON

Transcription

Field

Description

phrases[]

Phrase

algorithmsMetadata[]

AlgorithmMetadata

There might be several algorithms that work on the talk transcription, for example, speechkit and translator,
so there might be other fields here for tracing.

Phrase

Field

Description

channelNumber

string (int64)

startTimeMs

string (int64)

endTimeMs

string (int64)

phrase

PhraseText

statistics

PhraseStatistics

classifiers[]

RecognitionClassifierResult

PhraseText

Field

Description

text

string

language

string

normalizedText

string

words[]

Word

Word

Field

Description

word

string

startTimeMs

string (int64)

endTimeMs

string (int64)
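The Phrase, PhraseText, and Word tables above list int64 values such as startTimeMs as JSON strings, so they need converting before any arithmetic. The sketch below turns a Transcription object into a time-ordered list of rows. The function name `phrase_timeline` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def phrase_timeline(transcription: Dict) -> List[Tuple[int, int, int, str]]:
    """Return (channel, start_ms, end_ms, text) rows sorted by start time."""
    rows = []
    for item in transcription.get("phrases", []):
        rows.append((
            int(item["channelNumber"]),   # int64 fields arrive as strings
            int(item["startTimeMs"]),
            int(item["endTimeMs"]),
            item.get("phrase", {}).get("text", ""),
        ))
    return sorted(rows, key=lambda row: row[1])


# Made-up sample shaped like the Transcription object in the response.
sample_transcription = {
    "phrases": [
        {"channelNumber": "1", "startTimeMs": "2500", "endTimeMs": "4000",
         "phrase": {"text": "hi, I have a question"}},
        {"channelNumber": "0", "startTimeMs": "0", "endTimeMs": "2000",
         "phrase": {"text": "hello, how can I help"}},
    ]
}

for channel, start_ms, end_ms, text in phrase_timeline(sample_transcription):
    print(f"[{channel}] {start_ms}-{end_ms} ms ({end_ms - start_ms} ms): {text}")
```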

PhraseStatistics

Field

Description

statistics

UtteranceStatistics

UtteranceStatistics

Field

Description

speakerTag

string

speechBoundaries

AudioSegmentBoundaries

Audio segment boundaries

totalSpeechMs

string (int64)

Total speech duration

speechRatio

string

Speech ratio within audio segment

totalSilenceMs

string (int64)

Total silence duration

silenceRatio

string

Silence ratio within audio segment

wordsCount

string (int64)

Number of words in recognized speech

lettersCount

string (int64)

Number of letters in recognized speech

wordsPerSecond

DescriptiveStatistics

Descriptive statistics for words per second distribution

lettersPerSecond

DescriptiveStatistics

Descriptive statistics for letters per second distribution

AudioSegmentBoundaries

Field

Description

startTimeMs

string (int64)

Audio segment start time

endTimeMs

string (int64)

Audio segment end time

durationSeconds

string (int64)

Duration in seconds

DescriptiveStatistics

Field

Description

min

string

Minimum observed value

max

string

Maximum observed value

mean

string

Estimated mean of distribution

std

string

Estimated standard deviation of distribution

quantiles[]

Quantile

List of evaluated quantiles

Quantile

Field

Description

level

string

Quantile level in range (0, 1)

value

string

Quantile value
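DescriptiveStatistics and Quantile values are shown as strings in the response schema, so a client will likely want them as numbers. The following sketch converts one such object and looks up a quantile by level. The helper names are hypothetical, and whether a given level (for example 0.5) is present in a response is not stated on this page, so the lookup returns None when it is missing.

```python
from typing import Dict, Optional


def parse_descriptive_statistics(stats: Dict) -> Dict:
    """Convert a DescriptiveStatistics object to floats.

    float() accepts both numeric strings and JSON numbers, so the function
    works regardless of how the values are serialized.
    """
    return {
        "min": float(stats["min"]),
        "max": float(stats["max"]),
        "mean": float(stats["mean"]),
        "std": float(stats["std"]),
        "quantiles": {
            float(q["level"]): float(q["value"])
            for q in stats.get("quantiles", [])
        },
    }


def quantile(parsed: Dict, level: float) -> Optional[float]:
    """Return the value for an exact quantile level, or None if absent."""
    return parsed["quantiles"].get(level)


# Made-up values for illustration.
words_per_second = parse_descriptive_statistics({
    "min": "0.8", "max": "4.2", "mean": "2.1", "std": "0.7",
    "quantiles": [{"level": "0.5", "value": "2.0"},
                  {"level": "0.9", "value": "3.5"}],
})
print(words_per_second["mean"], quantile(words_per_second, 0.5))
```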

RecognitionClassifierResult

Field

Description

startTimeMs

string (int64)

Start time of the audio segment used for classification

endTimeMs

string (int64)

End time of the audio segment used for classification

classifier

string

Name of the triggered classifier

highlights[]

PhraseHighlight

List of highlights, i.e. parts of phrase that determine the result of the classification

labels[]

RecognitionClassifierLabel

Classifier predictions

PhraseHighlight

Field

Description

text

string

Text transcription of the highlighted audio segment

offset

string (int64)

Offset, in symbols, from the beginning of the whole phrase to the start of the highlight

count

string (int64)

Number of symbols in the highlighted text
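The offset and count fields can be used to locate a highlight inside its phrase. The sketch below assumes that a "symbol" corresponds to one character of a Python string and that the offset is counted against the phrase's `text` field. Neither assumption is confirmed on this page, so it is worth comparing the slice with the highlight's own `text`. The function name `highlight_span` is hypothetical.

```python
from typing import Dict


def highlight_span(phrase_text: str, highlight: Dict) -> str:
    """Slice the highlighted part out of the phrase text.

    Assumes offset/count are measured in Python string characters.
    """
    offset = int(highlight["offset"])  # int64 values arrive as strings
    count = int(highlight["count"])
    return phrase_text[offset:offset + count]


# Made-up phrase and highlight for illustration.
phrase_text = "thank you for calling support"
highlight = {"text": "for calling", "offset": "10", "count": "11"}

span = highlight_span(phrase_text, highlight)
print(span, span == highlight["text"])
```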

RecognitionClassifierLabel

Field

Description

label

string

The label of the class predicted by the classifier

confidence

string

The prediction confidence
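When a RecognitionClassifierResult carries several labels, a client may only need the most confident one. A minimal sketch, with the hypothetical helper `top_label` and made-up label names:

```python
from typing import Dict, Optional, Tuple


def top_label(classifier_result: Dict) -> Optional[Tuple[str, float]]:
    """Return (label, confidence) for the most confident prediction."""
    labels = classifier_result.get("labels", [])
    if not labels:
        return None
    # confidence is shown as a string in the schema, so convert before comparing
    best = max(labels, key=lambda item: float(item["confidence"]))
    return best["label"], float(best["confidence"])


# Made-up classifier result for illustration.
result = {
    "classifier": "example-classifier",
    "labels": [
        {"label": "negative", "confidence": "0.15"},
        {"label": "positive", "confidence": "0.85"},
    ],
}
print(top_label(result))
```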

AlgorithmMetadata

Field

Description

createdTaskDate

string (date-time)

String in RFC3339 text format. The range of possible values is from
0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z, i.e. from 0 to 9 digits for fractions of a second.

To work with values in this field, use the APIs described in the
Protocol Buffers reference.
In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).

completedTaskDate

string (date-time)

String in RFC3339 text format. The range of possible values is from
0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z, i.e. from 0 to 9 digits for fractions of a second.

To work with values in this field, use the APIs described in the
Protocol Buffers reference.
In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).

error

Error

traceId

string

name

string

Error

Field

Description

code

string

message

string

SpeechStatistics

Field

Description

totalSimultaneousSpeechDurationSeconds

string (int64)

Total simultaneous speech duration in seconds

totalSimultaneousSpeechDurationMs

string (int64)

Total simultaneous speech duration in ms

totalSimultaneousSpeechRatio

string

Simultaneous speech ratio within audio segment

simultaneousSpeechDurationEstimation

DescriptiveStatistics

Descriptive statistics for simultaneous speech duration distribution

SilenceStatistics

Field

Description

totalSimultaneousSilenceDurationMs

string (int64)

totalSimultaneousSilenceRatio

string

Simultaneous silence ratio within audio segment

simultaneousSilenceDurationEstimation

DescriptiveStatistics

Descriptive statistics for simultaneous silence duration distribution

totalSimultaneousSilenceDurationSeconds

string (int64)

InterruptsStatistics

Field

Description

speakerInterrupts[]

InterruptsEvaluation

Interrupts description for every speaker

InterruptsEvaluation

Field

Description

speakerTag

string

Speaker tag

interruptsCount

string (int64)

Number of interrupts made by the speaker

interruptsDurationMs

string (int64)

Total duration of all interrupts

interrupts[]

AudioSegmentBoundaries

Boundaries for every interrupt

interruptsDurationSeconds

string (int64)

Total duration of all interrupts in seconds
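Each entry in interrupts[] is an AudioSegmentBoundaries object, so the length of an individual interrupt can be derived from its start and end times. The sketch below finds the longest interrupt for one speaker. The helper name `longest_interrupt_ms` and the sample values are hypothetical.

```python
from typing import Dict


def longest_interrupt_ms(evaluation: Dict) -> int:
    """Return the longest interrupt of one InterruptsEvaluation, in ms."""
    durations = [
        int(bounds["endTimeMs"]) - int(bounds["startTimeMs"])
        for bounds in evaluation.get("interrupts", [])
    ]
    # default=0 covers speakers that never interrupted anyone
    return max(durations, default=0)


# Made-up evaluation for illustration.
evaluation = {
    "speakerTag": "speaker-0",
    "interruptsCount": "2",
    "interrupts": [
        {"startTimeMs": "1000", "endTimeMs": "1800"},
        {"startTimeMs": "5000", "endTimeMs": "6500"},
    ],
}
print(longest_interrupt_ms(evaluation))
```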

ConversationStatistics

Field

Description

conversationBoundaries

AudioSegmentBoundaries

Audio segment boundaries

speakerStatistics[]

SpeakerStatistics

Average statistics for each speaker

SpeakerStatistics

Field

Description

speakerTag

string

Speaker tag

completeStatistics

UtteranceStatistics

Analysis of all phrases treated as a single utterance

wordsPerUtterance

DescriptiveStatistics

Descriptive statistics for words per utterance distribution

lettersPerUtterance

DescriptiveStatistics

Descriptive statistics for letters per utterance distribution

utteranceCount

string (int64)

Number of utterances

utteranceDurationEstimation

DescriptiveStatistics

Descriptive statistics for utterance duration distribution
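A per-speaker overview can be assembled from ConversationStatistics by combining utteranceCount with the totals in completeStatistics. The following is a sketch only. The function name `speaker_summary`, the output keys, and the sample data are hypothetical.

```python
from typing import Dict


def speaker_summary(conversation_statistics: Dict) -> Dict[str, Dict[str, int]]:
    """Map each speakerTag to a few integer totals."""
    summary = {}
    for speaker in conversation_statistics.get("speakerStatistics", []):
        complete = speaker.get("completeStatistics", {})
        summary[speaker["speakerTag"]] = {
            # int64 values arrive as strings; missing values count as 0 here
            "total_speech_ms": int(complete.get("totalSpeechMs", 0)),
            "words": int(complete.get("wordsCount", 0)),
            "utterances": int(speaker.get("utteranceCount", 0)),
        }
    return summary


# Made-up statistics for illustration.
conversation = {
    "speakerStatistics": [
        {"speakerTag": "operator", "utteranceCount": "12",
         "completeStatistics": {"totalSpeechMs": "64000", "wordsCount": "180"}},
        {"speakerTag": "client", "utteranceCount": "9",
         "completeStatistics": {"totalSpeechMs": "41000", "wordsCount": "95"}},
    ]
}
print(speaker_summary(conversation))
```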

Points

Field

Description

quiz[]

Quiz

Quiz

Field

Description

request

string

response

string

id

string

TextClassifiers

Field

Description

classificationResult[]

ClassificationResult

ClassificationResult

Field

Description

classifier

string

Classifier name

classifierStatistics[]

ClassifierStatistics

Classifier statistics

ClassifierStatistics

Field

Description

channelNumber

string (int64)

Channel number, null for whole talk

totalCount

string (int64)

Total classifier count

histograms[]

Histogram

Represents various histograms built on top of classifiers

Histogram

Field

Description

countValues[]

string (int64)

Histogram count values. For example:
if len(count_values) = 2, the histogram is split 50/50 (two halves);
if len(count_values) = 3, [0] represents the first third, [1] the second third, [2] the last third, and so on.
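Reading the description above as "N values split the talk into N equal parts", each count can be paired with the fraction of the talk it covers. This is a sketch under that reading. The function name `histogram_buckets` and the sample counts are hypothetical.

```python
from typing import Dict, List, Tuple


def histogram_buckets(histogram: Dict) -> List[Tuple[float, float, int]]:
    """Return (start_fraction, end_fraction, count) for each bucket.

    Assumes every value in countValues covers an equal share of the talk.
    """
    counts = [int(value) for value in histogram.get("countValues", [])]
    total = len(counts)
    return [
        (index / total, (index + 1) / total, count)
        for index, count in enumerate(counts)
    ]


# Four made-up counts: one per quarter of the talk.
buckets = histogram_buckets({"countValues": ["3", "0", "1", "2"]})
for start, end, count in buckets:
    print(f"{start:.0%}-{end:.0%}: {count}")
```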

Summarization

Field

Description

statements[]

SummarizationStatement

SummarizationStatement

Field

Description

field

SummarizationField

response[]

string

SummarizationField

Field

Description

id

string

name

string

type

enum (SummarizationFieldType)

  • SUMMARIZATION_FIELD_TYPE_UNSPECIFIED
  • TEXT
  • TEXT_ARRAY

Assistants

Field

Description

assistantResults[]

AssistantResult

List of assistants results

AssistantResult

Field

Description

assistantId

string

Assistant id

results[]

AssistantFieldResult

Per-field assistant results

AssistantFieldResult

Field

Description

fieldId

string

Assistant result field id

stringResult

string

Result as a string

Includes only one of the fields stringResult, intResult, floatResult.

Parsed model answer for the field.
If the model answer could not be parsed, no result fields will be set.

intResult

string (int64)

Result as an integer

Includes only one of the fields stringResult, intResult, floatResult.

Parsed model answer for the field.
If the model answer could not be parsed, no result fields will be set.

floatResult

string

Result as a floating-point number

Includes only one of the fields stringResult, intResult, floatResult.

Parsed model answer for the field.
If the model answer could not be parsed, no result fields will be set.
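Because an AssistantFieldResult includes at most one of stringResult, intResult, and floatResult, a client can check which key is present and convert it accordingly. The helper name `assistant_field_value` and the sample field IDs below are hypothetical.

```python
from typing import Dict, Optional, Union


def assistant_field_value(result: Dict) -> Optional[Union[str, int, float]]:
    """Return the single result value that is set, or None if none is.

    None corresponds to the case where the model answer could not be parsed.
    """
    if "stringResult" in result:
        return result["stringResult"]
    if "intResult" in result:
        return int(result["intResult"])      # int64 arrives as a string
    if "floatResult" in result:
        return float(result["floatResult"])
    return None


# Made-up results for illustration.
results = [
    {"fieldId": "field-1", "stringResult": "refund request"},
    {"fieldId": "field-2", "intResult": "4"},
    {"fieldId": "field-3", "floatResult": "0.75"},
    {"fieldId": "field-4"},
]
values = {item["fieldId"]: assistant_field_value(item) for item in results}
print(values)
```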

TalkState

Field

Description

processingState

enum (ProcessingState)

  • PROCESSING_STATE_UNSPECIFIED
  • PROCESSING_STATE_NOT_STARTED
  • PROCESSING_STATE_PROCESSING
  • PROCESSING_STATE_SUCCESS
  • PROCESSING_STATE_FAILED

algorithmProcessingInfos[]

AlgorithmProcessingInfo

AlgorithmProcessingInfo

Field

Description

algorithm

enum (Algorithm)

  • ALGORITHM_UNSPECIFIED
  • ALGORITHM_SPEECHKIT
  • ALGORITHM_YGPT
  • ALGORITHM_CLASSIFIER
  • ALGORITHM_SUMMARIZATION
  • ALGORITHM_EMBEDDING
  • ALGORITHM_STATISTICS
  • ALGORITHM_ASSISTANT

processingState

enum (ProcessingState)

  • PROCESSING_STATE_UNSPECIFIED
  • PROCESSING_STATE_NOT_STARTED
  • PROCESSING_STATE_PROCESSING
  • PROCESSING_STATE_SUCCESS
  • PROCESSING_STATE_FAILED
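A client polling for results may want to know whether a talk is still being processed and which algorithms failed. The sketch below treats PROCESSING_STATE_SUCCESS and PROCESSING_STATE_FAILED as final states. That is an assumption based on the enum names, not something stated on this page. The helper names are hypothetical.

```python
from typing import Dict, List

# Assumed to be final states, judging by the enum value names only.
FINAL_STATES = {"PROCESSING_STATE_SUCCESS", "PROCESSING_STATE_FAILED"}


def is_finished(talk_state: Dict) -> bool:
    """Return True if the overall processing state looks final."""
    return talk_state.get("processingState") in FINAL_STATES


def failed_algorithms(talk_state: Dict) -> List[str]:
    """List the algorithms whose processing state is FAILED."""
    return [
        info["algorithm"]
        for info in talk_state.get("algorithmProcessingInfos", [])
        if info.get("processingState") == "PROCESSING_STATE_FAILED"
    ]


# Made-up talk state for illustration.
talk_state = {
    "processingState": "PROCESSING_STATE_PROCESSING",
    "algorithmProcessingInfos": [
        {"algorithm": "ALGORITHM_SPEECHKIT",
         "processingState": "PROCESSING_STATE_SUCCESS"},
        {"algorithm": "ALGORITHM_SUMMARIZATION",
         "processingState": "PROCESSING_STATE_FAILED"},
    ],
}
print(is_finished(talk_state), failed_algorithms(talk_state))
```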