Example use of streaming recognition with API v2

The example shows how you can recognize speech in LPCM format in real time using the SpeechKit API v2.

The example uses the following parameters:

To use the API, you need the grpcio-tools package for Python and grpc for Node.js.

The Yandex account or federated account are authenticated using an IAM token. If you are using your service account, you do not need to include the folder ID in the request header. Learn more about authentication in the SpeechKit API.

To try the examples in this section:

  1. Clone the Yandex Cloud API repository:

    git clone https://github.com/yandex-cloud/cloudapi
            
  2. Download a sample audio file for recognition.

  3. Create a client application:

    1. Use the pip package manager to install the grpcio-tools package:

      pip install grpcio-tools
              
    2. Go to the folder with the Yandex Cloud API repository, create a folder named output and generate the client interface code in it:

      cd cloudapi
              mkdir output
              python3 -m grpc_tools.protoc -I . -I third_party/googleapis \
                --python_out=output \
                --grpc_python_out=output \
                  google/api/http.proto \
                  google/api/annotations.proto \
                  yandex/cloud/api/operation.proto \
                  google/rpc/status.proto \
                  yandex/cloud/operation/operation.proto \
                  yandex/cloud/ai/stt/v2/stt_service.proto
              

      The action will create the stt_service_pb2.py and stt_service_pb2_grpc.py client interface files and dependency files in the output directory.

    3. Create a file (e.g., test.py) in the root of the output directory, and add the following code to it:

      #coding=utf8
              import argparse
              
              import grpc
              
              import yandex.cloud.ai.stt.v2.stt_service_pb2 as stt_service_pb2
              import yandex.cloud.ai.stt.v2.stt_service_pb2_grpc as stt_service_pb2_grpc
              
              CHUNK_SIZE = 4000
              
              def gen(folder_id, audio_file_name):
                  # Specify the recognition settings.
                  specification = stt_service_pb2.RecognitionSpec(
                      language_code='ru-RU',
                      profanity_filter=True,
                      model='general',
                      partial_results=True,
                      audio_encoding='LINEAR16_PCM',
                      sample_rate_hertz=8000
                  )
                  streaming_config = stt_service_pb2.RecognitionConfig(specification=specification, folder_id=folder_id)
              
                  # Send a message with recognition settings.
                  yield stt_service_pb2.StreamingRecognitionRequest(config=streaming_config)
              
                  # Read the audio file and send its contents in chunks.
                  with open(audio_file_name, 'rb') as f:
                      data = f.read(CHUNK_SIZE)
                      while data != b'':
                          yield stt_service_pb2.StreamingRecognitionRequest(audio_content=data)
                          data = f.read(CHUNK_SIZE)
              
              def run(folder_id, iam_token, audio_file_name):
                  # Establish a connection with the server.
                  cred = grpc.ssl_channel_credentials()
                  channel = grpc.secure_channel('stt.api.cloud.yandex.net:443', cred)
                  stub = stt_service_pb2_grpc.SttServiceStub(channel)
              
                  # Send data for recognition.
                  it = stub.StreamingRecognize(gen(folder_id, audio_file_name), metadata=(
                      ('authorization', 'Bearer %s' % iam_token),
                  ))
              
                  # Process the server responses and output the result to the console.
                  try:
                      for r in it:
                          try:
                              print('Start chunk: ')
                              for alternative in r.chunks[0].alternatives:
                                  print('alternative: ', alternative.text)
                              print('Is final: ', r.chunks[0].final)
                              print('')
                          except LookupError:
                              print('Not available chunks')
                  except grpc._channel._Rendezvous as err:
                      print('Error code %s, message: %s' % (err._state.code, err._state.details))
              
              if __name__ == '__main__':
                  parser = argparse.ArgumentParser()
                  parser.add_argument('--token', required=True, help='IAM token')
                  parser.add_argument('--folder_id', required=True, help='folder ID')
                  parser.add_argument('--path', required=True, help='audio file path')
                  args = parser.parse_args()
              
                  run(args.folder_id, args.token, args.path)
              

      Where:

    4. Set the folder ID:

      export FOLDER_ID=<folder_ID>
              
    5. Set the IAM token:

      export IAM_TOKEN=<IAM_token>
              
    6. Run the file you created:

      python3 test.py --token ${IAM_TOKEN} --folder_id ${FOLDER_ID} --path speech.pcm
              

      Where --path is the path to the audio file for recognition.

      Result:

      Start chunk:
              alternative: Hello
              Is final: False
              
              Start chunk:
              alternative: Hello world
              Is final: True
              
    1. Go to the folder with the Yandex Cloud API repository, create a folder named src, and generate a dependency file named package.json in it:

      cd cloudapi
              mkdir src
              cd src
              npm init
              
    2. Install the necessary packages using npm:

      npm install grpc @grpc/proto-loader google-proto-files --save
              
    3. Download a gRPC public certificate from the official repository and save it in the root of the src folder.

    4. Create a file (e.g., index.js) in the root of the src directory, and add the following code to it:

      const fs = require('fs');
              const grpc = require('grpc');
              const protoLoader = require('@grpc/proto-loader');
              const CHUNK_SIZE = 4000;
              
              // Get the folder ID and IAM token from the environment variables.
              const folderId = process.env.FOLDER_ID;
              const iamToken = process.env.IAM_TOKEN;
              
              // Read the file specified in the arguments.
              const audio = fs.readFileSync(process.argv[2]);
              
              // Set the recognition settings.
              const request = {
                  config: {
                      specification: {
                          languageCode: 'ru-RU',
                          profanityFilter: true,
                          model: 'general',
                          partialResults: true,
                          audioEncoding: 'LINEAR16_PCM',
                          sampleRateHertz: '8000'
                      },
                      folderId: folderId
                  }
              };
              
              // // Set audio send frequency in milliseconds.
              // For LPCM, you can calculate the frequency using this formula: CHUNK_SIZE * 1000 / ( 2 * sampleRateHertz);
              const FREQUENCY = 250;
              
              const serviceMetadata = new grpc.Metadata();
              serviceMetadata.add('authorization', `Bearer ${iamToken}`);
              
              const packageDefinition = protoLoader.loadSync('../yandex/cloud/ai/stt/v2/stt_service.proto', {
                  includeDirs: ['node_modules/google-proto-files', '..']
              });
              const packageObject = grpc.loadPackageDefinition(packageDefinition);
              
              // Establish a connection with the server.
              const serviceConstructor = packageObject.yandex.cloud.ai.stt.v2.SttService;
              const grpcCredentials = grpc.credentials.createSsl(fs.readFileSync('./roots.pem'));
              const service = new serviceConstructor('stt.api.cloud.yandex.net:443', grpcCredentials);
              const call = service['StreamingRecognize'](serviceMetadata);
              
              // Send a message with recognition settings.
              call.write(request);
              
              // Read the audio file and send its contents in chunks.
              let i = 1;
              const interval = setInterval(() => {
                  if (i * CHUNK_SIZE <= audio.length) {
                      const chunk = new Uint16Array(audio.slice((i - 1) * CHUNK_SIZE, i * CHUNK_SIZE));
                      const chunkBuffer = Buffer.from(chunk);
                      call.write({audioContent: chunkBuffer});
                      i++;
                  } else {
                      call.end();
                      clearInterval(interval);
                  }
              }, FREQUENCY);
              
              // Process the server responses and output the result to the console.
              call.on('data', (response) => {
                  console.log('Start chunk: ');
                  response.chunks[0].alternatives.forEach((alternative) => {
                      console.log('alternative: ', alternative.text)
                  });
                  console.log('Is final: ', Boolean(response.chunks[0].final));
                  console.log('');
              });
              
              call.on('error', (response) => {
                  // Output errors to the console.
                  console.log(response);
              });
              

      Where:

    5. Set the folder ID:

      export FOLDER_ID=<folder_ID>
              
    6. Set the IAM token:

      export IAM_TOKEN=<IAM_token>
              
    7. Run the file you created:

      node index.js speech.pcm
              

      Where speech.pcm is the name of the audio file for recognition.

      Result:

      Start chunk:
              alternative: Hello world
              Is final: true
              

See also