How to synthesize speech in the SpeechKit API v3

In this section, you will learn how to synthesize speech from text using the SpeechKit API v3 (gRPC).

Authentication for API access

To work with the SpeechKit API, you need to authenticate. The authentication method depends on the account type. For a user account:

  1. Get an IAM token for your Yandex account, federated account, or local account.
  2. Get the ID of the folder for which your account has the ai.speechkit-stt.user, ai.speechkit-tts.user, or higher roles.
  3. When accessing SpeechKit via the API, provide the IAM token and folder ID in each request:

    • For API v1 and API v2:

      Specify the IAM token in the Authorization header as follows:

      Authorization: Bearer <IAM_token>
              

      Specify the folder ID in the request body in the folderId parameter.

    • For API v3:

      • Specify the IAM token in the Authorization header.
      • Specify the folder ID in the x-folder-id header.

      Authorization: Bearer <IAM_token>
      x-folder-id: <folder_ID>
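For API v3, both values travel with every gRPC call, so it can help to build them once and reuse them. The bash sketch below assumes you already have an IAM token and folder ID and stores them as grpcurl -H flags in an array. The function name v3_user_headers and the array V3_HEADERS are hypothetical, chosen for illustration; the lowercase header names match the grpcurl command used later in this section.

```shell
#!/usr/bin/env bash
# Sketch: collect API v3 credentials for a user account as grpcurl -H flags.
# v3_user_headers and V3_HEADERS are hypothetical names used for illustration.
v3_user_headers() {
  local iam_token="$1" folder_id="$2"
  if [ -z "$iam_token" ] || [ -z "$folder_id" ]; then
    echo "usage: v3_user_headers <IAM_token> <folder_ID>" >&2
    return 1
  fi
  # Each -H flag adds one header (gRPC metadata entry) to the request.
  V3_HEADERS=(
    -H "authorization: Bearer ${iam_token}"
    -H "x-folder-id: ${folder_id}"
  )
}

# Usage:
#   v3_user_headers "$IAM_TOKEN" "$FOLDER_ID"
#   jq . -c tts_req.json | grpcurl "${V3_HEADERS[@]}" -d @ \
#     tts.api.cloud.yandex.net:443 speechkit.tts.v3.Synthesizer/UtteranceSynthesis
```

Keeping the flags in an array, rather than in a single string, preserves the spaces inside each header value when the array is expanded with "${V3_HEADERS[@]}".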
              

SpeechKit supports two authentication methods based on service accounts:

  • With an IAM token:

    1. Get an IAM token.

    2. Provide the IAM token in the Authorization header in the following format:

      Authorization: Bearer <IAM_token>
              
  • With API keys.

    Use API keys if requesting an IAM token automatically is not an option.

    1. Get an API key.

    2. Provide the API key in the Authorization header in the following format:

      Authorization: Api-Key <API_key>
              

When authenticating as a service account, do not specify the folder ID in your requests: SpeechKit uses the folder where the service account was created.
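The choice between the two service account methods can be sketched as a small bash function. It assumes the credential is exported in an environment variable named IAM_TOKEN or API_KEY (both names are assumptions). It prefers the IAM token and falls back to the API key, in line with the advice above, and it adds no x-folder-id header. The names sa_headers and SA_HEADERS are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pick the Authorization header for a service account.
# Assumes the credential is exported as IAM_TOKEN or API_KEY (assumed variable names).
# sa_headers and SA_HEADERS are hypothetical names used for illustration.
sa_headers() {
  SA_HEADERS=()
  if [ -n "${IAM_TOKEN:-}" ]; then
    # Preferred: an IAM token obtained for the service account.
    SA_HEADERS=(-H "authorization: Bearer ${IAM_TOKEN}")
  elif [ -n "${API_KEY:-}" ]; then
    # Fallback: an API key, for when an IAM token cannot be requested automatically.
    SA_HEADERS=(-H "authorization: Api-Key ${API_KEY}")
  else
    echo "set IAM_TOKEN or API_KEY first" >&2
    return 1
  fi
  # No x-folder-id header: SpeechKit uses the folder the service account was created in.
}

# Usage:
#   sa_headers && jq . -c tts_req.json | grpcurl "${SA_HEADERS[@]}" -d @ \
#     tts.api.cloud.yandex.net:443 speechkit.tts.v3.Synthesizer/UtteranceSynthesis
```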

In the example below, a Yandex account is used for authentication.

Getting started

  1. Install the grpcurl utility.

  2. Install the jq utility for processing JSON in shell pipelines. For example, on Debian or Ubuntu:

    sudo apt update && sudo apt install jq
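Before running the synthesis command, you can confirm that the required tools are on your PATH. This is a minimal bash sketch; check_tools is a hypothetical helper, and base64 is included in the list because the synthesis command in this section decodes the audio with base64 -d.

```shell
#!/usr/bin/env bash
# Sketch: report any required command-line tool that is missing from PATH.
# check_tools is a hypothetical helper name.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# grpcurl and jq are installed in the steps above; base64 decodes the synthesized audio.
check_tools grpcurl jq base64 || echo "Install the missing tools before continuing." >&2
```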
            

Note

These utilities are only one option: you can also synthesize speech with the SpeechKit API v3 by other means, for example, from your own gRPC client.

Convert text to an audio file

To synthesize speech from text in TTS markup to a WAV file:

  1. Create a file named tts_req.json containing the API request body and the text to synthesize:

    tts_req.json

    {
      "text": "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!",
      "outputAudioSpec": {
        "containerAudio": {
          "containerAudioType": "WAV"
        }
      },
      "hints": [
        {
          "voice": "jane"
        },
        {
          "role": "good"
        }
      ],
      "loudnessNormalizationType": "LUFS"
    }

  2. Run the following commands:

    export FOLDER_ID=<folder_ID>
    export IAM_TOKEN=<IAM_token>
    # Compact the request JSON, send it to the Synthesizer service,
    # and decode the base64 audio chunks from the responses into speech.wav.
    jq . -c tts_req.json | \
      grpcurl -H "authorization: Bearer ${IAM_TOKEN}" \
              -H "x-folder-id: ${FOLDER_ID}" \
              -d @ tts.api.cloud.yandex.net:443 speechkit.tts.v3.Synthesizer/UtteranceSynthesis | \
      jq -r '.audioChunk.data' | base64 -d > speech.wav

    Where:

    • FOLDER_ID: Folder ID you got earlier.

      If you are using an IAM token of a service account, do not specify the folder ID in your request, as the service uses the folder the service account was created in.

    • IAM_TOKEN: IAM token you got earlier.

    • speech.wav: Output file.
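Instead of editing tts_req.json by hand for every phrase, you can generate the same request body with jq. In this sketch, make_tts_request is a hypothetical helper; the field names and the default values jane, good, WAV, and LUFS are copied from tts_req.json above. Passing the text with --arg lets jq escape quotes and apostrophes, so input such as "I'm ..." always yields valid JSON.

```shell
#!/usr/bin/env bash
# Sketch: build a compact UtteranceSynthesis request body with jq.
# make_tts_request is a hypothetical helper; fields mirror tts_req.json.
make_tts_request() {
  local text="$1" voice="${2:-jane}" role="${3:-good}"
  if [ -z "$text" ]; then
    echo "usage: make_tts_request <text> [voice] [role]" >&2
    return 1
  fi
  # -n: build JSON from scratch; -c: one-line output, like `jq . -c`.
  # --arg binds each shell value as a properly escaped JSON string.
  jq -n -c --arg text "$text" --arg voice "$voice" --arg role "$role" '{
    text: $text,
    outputAudioSpec: { containerAudio: { containerAudioType: "WAV" } },
    hints: [ { voice: $voice }, { role: $role } ],
    loudnessNormalizationType: "LUFS"
  }'
}

# Usage (replaces `jq . -c tts_req.json` at the start of the pipeline):
#   make_tts_request "Hello, world!" | grpcurl -H "authorization: Bearer ${IAM_TOKEN}" \
#     -H "x-folder-id: ${FOLDER_ID}" -d @ tts.api.cloud.yandex.net:443 \
#     speechkit.tts.v3.Synthesizer/UtteranceSynthesis | jq -r '.audioChunk.data' | base64 -d > speech.wav
```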

As a result, a synthesized speech file named speech.wav is created in the current working directory.
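The synthesis call may return the audio as several responses, each with its own base64-encoded audioChunk.data, and the pipeline above decodes all chunks as one concatenated stream. The bash sketch below decodes each chunk separately, so the result does not depend on how your base64 tool handles padding between concatenated chunks. It then checks that the file starts with a RIFF/WAVE header. decode_chunks and looks_like_wav are hypothetical helper names, and the header check only confirms the file type, not the audio content.

```shell
#!/usr/bin/env bash
# Sketch: per-chunk base64 decoding and a basic WAV header check.
# decode_chunks and looks_like_wav are hypothetical helper names.

# Read one base64 string per line (as printed by `jq -r`) and decode each on its own.
decode_chunks() {
  local chunk
  while IFS= read -r chunk; do
    [ -n "$chunk" ] || continue   # skip blank lines
    printf '%s' "$chunk" | base64 -d
  done
}

# A WAV file starts with "RIFF" at byte offset 0 and "WAVE" at byte offset 8.
looks_like_wav() {
  local file="$1"
  [ -s "$file" ] || return 1
  [ "$(head -c 4 "$file")" = "RIFF" ] && [ "$(tail -c +9 "$file" | head -c 4)" = "WAVE" ]
}

# Usage (`// empty` drops responses that carry no audio chunk):
#   ... | jq -r '.audioChunk.data // empty' | decode_chunks > speech.wav
#   looks_like_wav speech.wav && echo "speech.wav has a WAV header"
```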

See also