Speech synthesis in the API v3

This example shows how to use the SpeechKit API v3 to synthesize speech from text in TTS markup and save it to a WAV file.

The example uses the following synthesis parameters:

Synthesized audio file format: LPCM with a sample rate of 22050 Hz, WAV container (default).
Volume normalization: LUFS (default).
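To confirm that a file matches these parameters, you can read its WAV header. The sketch below uses Python's standard wave module to report the sample rate, sample width, and channel count. The helper names describe_wav and make_silent_wav are illustrative and not part of SpeechKit, and the demo data is one second of generated silence (mono, 16-bit, an assumption made only for the demo), not real synthesized audio.

```python
import io
import wave


def describe_wav(source):
    """Return (sample_rate_hz, sample_width_bytes, channels) for a WAV file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, 'rb') as wav:
        return wav.getframerate(), wav.getsampwidth(), wav.getnchannels()


def make_silent_wav(sample_rate=22050, seconds=1):
    """Build an in-memory mono 16-bit LPCM WAV file of silence (demo data only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, 'wb') as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b'\x00\x00' * sample_rate * seconds)
    buffer.seek(0)
    return buffer


if __name__ == '__main__':
    rate, width, channels = describe_wav(make_silent_wav())
    print(f'{rate} Hz, {width * 8}-bit, {channels} channel(s)')
```

To inspect a synthesized file instead, pass its path, for example describe_wav('speech.wav').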

To convert and save the result, you will need the grpcio-tools and pydub packages and the FFmpeg utility.

Authentication is performed under a service account using an API key or IAM token. To learn more about SpeechKit API authentication, see Authentication with the SpeechKit API.
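The two authentication methods differ only in the value of the authorization header sent with each request. The following sketch builds that header for either credential type. The header formats are taken from the Python client code in this article; the helper name build_auth_metadata is hypothetical, and the credential values shown are placeholders.

```python
def build_auth_metadata(iam_token=None, api_key=None):
    """Return gRPC call metadata carrying exactly one kind of credential."""
    if (iam_token is None) == (api_key is None):
        raise ValueError('Provide either iam_token or api_key, but not both')
    if iam_token is not None:
        return (('authorization', f'Bearer {iam_token}'),)
    return (('authorization', f'Api-Key {api_key}'),)


if __name__ == '__main__':
    # Placeholder values for illustration only; use real credentials in practice.
    print(build_auth_metadata(iam_token='<IAM_token>'))
    print(build_auth_metadata(api_key='<API_key>'))
```

The returned tuple has the shape expected by the metadata argument of a gRPC stub call.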

To implement the example:

Create a service account to work with the SpeechKit API.
Assign the service account the ai.speechkit-tts.user role or higher for the folder where it was created.
Get an API key or IAM token for your service account.

Create a client application:

Python 3

Java

Clone the Yandex Cloud API repository:

```
git clone https://github.com/yandex-cloud/cloudapi
```

Install the grpcio-tools and pydub packages using the pip package manager:
```
pip install grpcio-tools && \
  pip install pydub
```
You need the grpcio-tools package to generate the interface code for the API v3 synthesis client. You need the pydub package to process the resulting audio files.
Download FFmpeg, which the pydub package requires to work correctly. Add the path to the directory containing the FFmpeg executable to the PATH variable by running this command:
```
export PATH=$PATH:<path_to_directory_with_FFmpeg_executable>
```
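Before moving on, you may want to check that the packages and FFmpeg are actually visible to Python. The sketch below is one possible check, assuming the importable module names grpc, grpc_tools, and pydub and an executable named ffmpeg on PATH; the function name check_environment is illustrative.

```python
import importlib.util
import shutil


def check_environment(modules=('grpc', 'grpc_tools', 'pydub'), executable='ffmpeg'):
    """Report which required modules and executables can be found.

    Returns a dict mapping each name to True (found) or False (missing).
    """
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    report[executable] = shutil.which(executable) is not None
    return report


if __name__ == '__main__':
    for name, found in check_environment().items():
        print(f'{name}: {"ok" if found else "MISSING"}')
```

A missing ffmpeg entry usually means the PATH change was made in a different shell session than the one running Python.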

Go to the directory hosting the cloned Yandex Cloud API repository, create a directory named output, and generate the client interface code there:

```
cd <path_to_cloudapi_directory>
mkdir output
python3 -m grpc_tools.protoc -I . -I third_party/googleapis \
  --python_out=output \
  --grpc_python_out=output \
  google/api/http.proto \
  google/api/annotations.proto \
  yandex/cloud/api/operation.proto \
  google/rpc/status.proto \
  yandex/cloud/operation/operation.proto \
  yandex/cloud/validation.proto \
  yandex/cloud/ai/tts/v3/tts_service.proto \
  yandex/cloud/ai/tts/v3/tts.proto
```

This will create the tts_pb2.py, tts_pb2_grpc.py, tts_service_pb2.py, and tts_service_pb2_grpc.py client interface files, as well as dependency files, in the output directory.
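To verify that code generation succeeded, you can check for the four client interface files. The sketch below assumes they are placed under yandex/cloud/ai/tts/v3/ inside the output directory, which matches the import paths used by the Python script in this article; the names EXPECTED_FILES and missing_generated_files are illustrative.

```python
from pathlib import Path

# Paths relative to the output directory, following the proto package layout.
EXPECTED_FILES = (
    'yandex/cloud/ai/tts/v3/tts_pb2.py',
    'yandex/cloud/ai/tts/v3/tts_pb2_grpc.py',
    'yandex/cloud/ai/tts/v3/tts_service_pb2.py',
    'yandex/cloud/ai/tts/v3/tts_service_pb2_grpc.py',
)


def missing_generated_files(output_dir='output'):
    """Return the expected client interface files that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == '__main__':
    missing = missing_generated_files()
    if missing:
        print('Missing files:', ', '.join(missing))
    else:
        print('All client interface files are present.')
```

Run it from the cloudapi directory so that the default relative path output resolves correctly.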

Create a file (e.g., test.py) in the root of the output directory, and add the following code to it:

```
import io
import grpc
import pydub
import argparse

import yandex.cloud.ai.tts.v3.tts_pb2 as tts_pb2
import yandex.cloud.ai.tts.v3.tts_service_pb2_grpc as tts_service_pb2_grpc

# Specify the synthesis settings.
# Provide api_key instead of iam_token when authenticating with an API key
#def synthesize(api_key, text) -> pydub.AudioSegment:
def synthesize(iam_token, text) -> pydub.AudioSegment:
    request = tts_pb2.UtteranceSynthesisRequest(
        text=text,
        output_audio_spec=tts_pb2.AudioFormatOptions(
            container_audio=tts_pb2.ContainerAudio(
                container_audio_type=tts_pb2.ContainerAudio.WAV
            )
        ),
        # Synthesis parameters
        hints=[
            tts_pb2.Hints(voice='alexander'),  # (Optional) Specify the voice. The default value is `marina`
            tts_pb2.Hints(role='good'),  # (Optional) Specify the role only if applicable for this voice
            tts_pb2.Hints(speed=1.1),  # (Optional) Specify synthesis speed
        ],

        loudness_normalization_type=tts_pb2.UtteranceSynthesisRequest.LUFS
    )

    # Establish a connection with the server.
    cred = grpc.ssl_channel_credentials()
    channel = grpc.secure_channel('tts.api.cloud.yandex.net:443', cred)
    stub = tts_service_pb2_grpc.SynthesizerStub(channel)

    # Send data for synthesis.
    it = stub.UtteranceSynthesis(request, metadata=(

        # Parameters for authentication with an IAM token
        ('authorization', f'Bearer {iam_token}'),
        # Parameters for authentication with an API key as a service account
        #   ('authorization', f'Api-Key {api_key}'),
    ))

    # Create an audio file out of chunks.
    try:
        audio = io.BytesIO()
        for response in it:
            audio.write(response.audio_chunk.data)
        audio.seek(0)
        return pydub.AudioSegment.from_wav(audio)
    except grpc.RpcError as err:
        # Use the public gRPC error interface rather than private internals.
        print(f'Error code {err.code()}, message: {err.details()}')
        raise


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--token', required=True, help='IAM token or API key')
    parser.add_argument('--text', required=True, help='Text for synthesis')
    parser.add_argument('--output', required=True, help='Output file')
    args = parser.parse_args()

    audio = synthesize(args.token, args.text)
    with open(args.output, 'wb') as fp:
        audio.export(fp, format='wav')
```
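The chunk-assembly step of this script can be tried without a network connection. The sketch below runs the same loop over a stand-in response stream. The SimpleNamespace objects only imitate the audio_chunk.data attribute of real responses and are an assumption made for illustration; collect_audio and fake_stream are hypothetical helper names.

```python
import io
from types import SimpleNamespace


def collect_audio(responses):
    """Concatenate the audio_chunk.data payloads of a response stream into one buffer."""
    audio = io.BytesIO()
    for response in responses:
        audio.write(response.audio_chunk.data)
    audio.seek(0)
    return audio


def fake_stream(chunks):
    """Yield objects shaped like streaming responses (stand-ins, not real gRPC messages)."""
    for data in chunks:
        yield SimpleNamespace(audio_chunk=SimpleNamespace(data=data))


if __name__ == '__main__':
    buffer = collect_audio(fake_stream([b'RIFF', b'....', b'WAVE']))
    print(len(buffer.getvalue()), 'bytes collected')
```

Because the server streams the audio in pieces, the buffer holds a complete WAV file only after the loop finishes, which is why the script rewinds it with seek(0) before decoding.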

Execute the file from the previous step:
```
export IAM_TOKEN=<service_account_IAM_token>
export TEXT='I'\''m Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!'
python3 output/test.py \
  --token "${IAM_TOKEN}" \
  --output speech.wav \
  --text "${TEXT}"
```
Where:
- IAM_TOKEN: Service account IAM token. To authenticate with an API key instead, switch the Python script to its commented-out API key lines and pass the API key in the --token parameter.
- TEXT: Text for synthesis in TTS markup.
- --output: Name of the file for the audio.
As a result, a file named speech.wav with synthesized speech will be created in the cloudapi directory.
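The sample text contains an apostrophe, which needs care when the text is passed through a shell. As one way to avoid quoting mistakes, the sketch below uses shlex.quote from the Python standard library to produce an export line that a POSIX shell reads back as the original text. The helper name export_line is illustrative.

```python
import shlex

SAMPLE_TEXT = "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!"


def export_line(name, value):
    """Return a POSIX shell `export` command with the value safely quoted."""
    return f'export {name}={shlex.quote(value)}'


if __name__ == '__main__':
    print(export_line('TEXT', SAMPLE_TEXT))
```

When expanding the variable later, wrap it in double quotes (for example "${TEXT}") so the shell passes the whole text as a single argument.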

Java

Install the dependencies:

```
sudo apt update && sudo apt install --yes default-jdk maven
```

Clone the repository with a Java application configuration:

```
git clone https://github.com/yandex-cloud-examples/yc-speechkit-tts-java
```

Go to the repository directory:
```
cd yc-speechkit-tts-java
```
Build the project in this directory:
```
mvn clean install
```
Go to the target directory created by the build:
```
cd target
```

Specify the service account's API key and text to synthesize:

```
export API_KEY=<API_key> && \
  export TEXT='I'\''m Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!'
```

Run the Java program for speech synthesis:
```
java -cp speechkit_examples-1.0-SNAPSHOT.jar yandex.cloud.speechkit.examples.TtsV3Client "${TEXT}"
```
As a result, the result.wav audio file should appear in the target directory. It contains speech synthesized from the text in the TEXT environment variable.
