Text markup for speech synthesis

You can control pronunciation during speech synthesis by marking up the text you want to synthesize. Yandex SpeechKit fully supports markup only for Russian text. Some pronunciation control features are also available for other languages.

Warning

When using pattern-based synthesis, the markup outside the variable part is ignored.

For Russian and Kazakh, Yandex SpeechKit supports the synthesis of normalized text:

  • Abbreviations do not need to be represented phonetically.
  • You can write numbers as Arabic numerals. During speech synthesis, they are converted to words and pronounced accordingly.

Note

SpeechKit is designed for natural speech synthesis. Markup helps you adjust the pronunciation of individual words, phrases, and sentences. It is not intended for generating isolated sounds or silence.

The markup in the text serves as a cue for synthesis, not as a direct instruction.

SpeechKit supports two markup formats:

  • TTS: For API v1 and API v3.
  • SSML: For API v1 only.
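
The format-to-API mapping above can be sketched as a small lookup that a client might consult before sending a request. This is an illustrative helper, not part of SpeechKit: the names `SUPPORTED_API_VERSIONS`, `is_markup_supported`, and `build_v1_fields` are hypothetical, and the request field names `text` and `ssml` are an assumption about the API v1 request format that should be checked against the API reference.

```python
# Which API versions accept each markup format, as listed above.
SUPPORTED_API_VERSIONS = {
    "tts": {"v1", "v3"},
    "ssml": {"v1"},
}


def is_markup_supported(markup_format: str, api_version: str) -> bool:
    """Return True if the given markup format can be used with the API version."""
    fmt = markup_format.lower()
    if fmt not in SUPPORTED_API_VERSIONS:
        raise ValueError(f"Unknown markup format: {markup_format!r}")
    return api_version.lower() in SUPPORTED_API_VERSIONS[fmt]


def build_v1_fields(markup_format: str, content: str) -> dict:
    """Put the marked-up content into a request field for API v1.

    Assumption: API v1 takes TTS-markup text in a `text` field and
    SSML in an `ssml` field. Verify the names in the API reference.
    """
    if not is_markup_supported(markup_format, "v1"):
        raise ValueError(f"{markup_format!r} is not supported by API v1")
    field = "ssml" if markup_format.lower() == "ssml" else "text"
    return {field: content}
```

For example, `is_markup_supported("SSML", "v3")` returns `False`, so a client targeting API v3 would need to express its markup in the TTS format instead.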