SpeechKit pricing policy

Note

The currency of service prices depends on the company you have a contract with:

  • Prices in US dollars are applicable to customers of Iron Hive doo Beograd (Serbia) or Direct Cursus Technology L.L.C. (Dubai).
  • Prices in Russian roubles are applicable to customers of Yandex.Cloud LLC.

All prices in RUB and KZT are inclusive of VAT; in USD, net of VAT.

What goes into the cost of using speech synthesis

The cost of using SpeechKit for speech synthesis depends on the version of the API you use.

API v1

For the API v1, the cost is calculated based on the total number of characters sent to generate speech from text in a calendar month (reporting period).

API v3

The cost of using the API v3 depends on the number of sent synthesis requests. The cost is calculated for a calendar month (reporting period).

By default, a speech synthesis request is limited to 250 characters and 24 seconds. To synthesize longer phrases, use unsafe_mode or streaming mode. In this case, you are charged for every 250 characters, for example:

  • A request shorter than 250 characters is billed as one billing unit.
  • A request from 250 to 500 characters long is billed as two billing units.
  • A request from 500 to 750 characters long is billed as three billing units.
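The rounding rule above can be sketched in Python as follows. This is an illustrative sketch, not part of any SpeechKit SDK: `synthesis_units_v3` is a hypothetical name, and the sketch assumes the number of units is the character count divided by 250 and rounded up, with at least one unit per request (which also matches the empty-request rule). The list above does not say how a length that is an exact multiple of 250 is billed; the sketch assumes it is not rounded up to an extra unit.

```python
UNIT_CHARS = 250  # characters covered by one API v3 billing unit


def synthesis_units_v3(num_chars: int) -> int:
    """Hypothetical helper: billing units for one API v3 synthesis request.

    Assumes units = ceil(num_chars / 250), and that every request,
    including an empty one, is billed as at least one unit.
    """
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    # Integer ceiling division avoids floating-point rounding issues.
    return max(1, (num_chars + UNIT_CHARS - 1) // UNIT_CHARS)


for length in (150, 300, 600):
    print(length, "characters ->", synthesis_units_v3(length), "unit(s)")
```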

Empty request

The number of characters in a request includes spaces and special characters. The cost of an empty request depends on the API version:

  • An empty request to the API v1 is billed as a single character.
  • An empty request to the API v3 is billed as a single billing unit.
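For the API v1, counting billable characters can be sketched like this. `billable_chars_v1` is a hypothetical helper; it assumes that one character corresponds to one Unicode code point as returned by Python's `len()`, which this page does not specify.

```python
def billable_chars_v1(text: str) -> int:
    """Hypothetical helper: characters billed for one API v1 request.

    Spaces and special characters are counted like any other character,
    and an empty request is billed as a single character.
    Assumes one character == one Unicode code point.
    """
    return max(1, len(text))


print(billable_chars_v1("Hello, world!"))  # 13: the space and punctuation count too
print(billable_chars_v1(""))  # 1: an empty request
```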

Internal server errors

You are not charged for a request that fails due to an internal server error.

Speech synthesis pricing

Service | Price per billing unit, excluding VAT
Speech synthesis using API v1, per 1,000,000 characters | $10.99999824
Speech synthesis using API v3, per billing unit (request) | $0.0013327867

SpeechKit Brand Voice

Service | Price per unit, excluding VAT
SpeechKit Brand Voice Call Center
  Request | $0.0013327867
SpeechKit Brand Voice Lite
  One-time fee for creating one voice | $74.999988
  Hosting, first seven days¹ | Free of charge
  Hosting, first voice, per month | $833.32773552
  Hosting, second voice, per month | $749.99988
  Hosting, third voice, per month | $666.66382776
  Hosting, fourth voice, per month | $583.32777552
  Hosting, fifth voice, per month | $499.99992
  Hosting, sixth and each further voice, per month | $416.66386776
  Request | $0.0013327867
SpeechKit Brand Voice Premium
  Hosting, per month | Upon request
  Request | $0.0013327867

¹ Brand Voice Lite hosting is free of charge for the first seven days after the voice is created. This lets you test the voice, assess its effectiveness, and get the result approved. After this seven-day period ends, the prices above apply.
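As an illustration of the Brand Voice Lite hosting tiers, the monthly hosting fee for several voices can be sketched as follows. The sketch assumes that each voice is billed at the tier for its position (first, second, and so on), that every voice is hosted for a full month after its free seven-day period, and that the one-time creation fee and request charges are counted separately. `lite_hosting_per_month` is a hypothetical name.

```python
from decimal import Decimal

# Monthly Brand Voice Lite hosting prices, USD excluding VAT, from the table above.
# Index 0 is the first voice; the last entry also applies to every voice after the sixth.
LITE_HOSTING_TIERS = [
    Decimal("833.32773552"),
    Decimal("749.99988"),
    Decimal("666.66382776"),
    Decimal("583.32777552"),
    Decimal("499.99992"),
    Decimal("416.66386776"),
]


def lite_hosting_per_month(num_voices: int) -> Decimal:
    """Hypothetical helper: total monthly hosting fee for num_voices Lite voices.

    Assumes each voice is billed at the tier for its ordinal position and
    excludes the free seven-day period, creation fees, and request charges.
    """
    total = Decimal("0")
    for i in range(num_voices):
        # Voices beyond the sixth reuse the last tier price.
        total += LITE_HOSTING_TIERS[min(i, len(LITE_HOSTING_TIERS) - 1)]
    return total


print(lite_hosting_per_month(2))
```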

What goes into the cost of using speech recognition

The cost of using SpeechKit for speech recognition depends on the recognition type and duration of a recognized audio fragment. The cost is calculated for a calendar month (reporting period).

Streaming speech recognition

The cost of using SpeechKit streaming recognition is calculated based on the pricing rules for synchronous recognition.

Synchronous recognition

These rules apply to synchronous recognition and streaming mode recognition when using the API v2 and API v3.

The billing unit is a 15-second segment of single-channel audio. A shorter segment is rounded up to 15 seconds (for example, 1 second is billed as 15 seconds).

Warning

In streaming mode, billing starts as soon as you send a message with recognition settings. If you do not send any audio after this message, the session is billed as one billing unit.

Examples:

  • One audio fragment that is 37 seconds long is billed as 45 seconds.

    Explanation: The audio is divided into two 15-second segments and one 7-second segment. The length of the last segment is rounded up to 15 seconds. Thus, we have three segments, 15 seconds each.

  • Two audio fragments that are 5 and 8 seconds long are billed as 30 seconds.

    Explanation: The length of each audio is rounded up to 15 seconds. Thus, we have two segments, 15 seconds each.
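The synchronous and streaming rounding rule can be sketched in Python. `sync_units` is a hypothetical helper; it takes the fragment duration in seconds and, following the warning above, treats a stream that carries no audio as one billing unit.

```python
import math

SEGMENT_SECONDS = 15  # length of one synchronous/streaming billing unit


def sync_units(duration_seconds: float) -> int:
    """Hypothetical helper: billing units for one audio fragment.

    Every started 15-second segment is billed as a full unit; a fragment
    with no audio is still billed as one unit.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return max(1, math.ceil(duration_seconds / SEGMENT_SECONDS))


# The examples above: one 37-second fragment, then 5- and 8-second fragments.
print(sync_units(37) * SEGMENT_SECONDS)  # 45
print(sum(sync_units(d) for d in (5, 8)) * SEGMENT_SECONDS)  # 30
```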

Asynchronous recognition

These rules apply when using asynchronous recognition.

The billing unit is one second of two-channel audio. The duration is rounded up to whole seconds, and the number of channels is rounded up to an even number, so each pair of channels is billed separately.

The minimum billable amount is 15 seconds for every two channels. Shorter audio fragments are billed as 15 seconds.

Examples of rounding the length of audio

Length | Number of channels | Seconds billed
1 second | 1 | 15
1 second | 2 | 15
1 second | 3 | 30
15.5 seconds | 2 | 16
15.5 seconds | 4 | 32
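The asynchronous rules and the table above can be expressed as a short function. `async_billed_seconds` is a hypothetical name; the sketch assumes durations in seconds and that the 15-second minimum applies to each pair of channels separately, which is what the 3-channel row of the table implies.

```python
import math


def async_billed_seconds(duration_seconds: float, channels: int) -> int:
    """Hypothetical helper: seconds billed for one asynchronous fragment.

    The duration is rounded up to whole seconds with a 15-second minimum;
    the channel count is rounded up to an even number, and each pair of
    channels is billed for that duration.
    """
    if channels < 1:
        raise ValueError("channels must be at least 1")
    channel_pairs = (channels + 1) // 2  # 1 -> 1, 2 -> 1, 3 -> 2, 4 -> 2
    seconds = max(15, math.ceil(duration_seconds))
    return channel_pairs * seconds


# (length in seconds, number of channels) from the table above.
for length, channels in [(1, 1), (1, 2), (1, 3), (15.5, 2), (15.5, 4)]:
    print(length, channels, async_billed_seconds(length, channels))
```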

Empty request

The cost of an empty request to any type of speech recognition is equal to that of a single billing unit.

Internal server errors

You are not charged for a request that fails due to an internal server error.

Speech recognition pricing

Service | Price per 15 seconds of audio, excluding VAT
Streaming recognition | $0.0013327867
Synchronous file recognition | $0.0013327867
Asynchronous file recognition* | $0.0012418035
Asynchronous file recognition, deferred mode* | $0.0003122955

* Per-second billing starts from the 16th second.
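Since asynchronous prices are listed per 15 seconds but billed per second from the 16th second on, the per-second rate is the listed price divided by 15. A quick check with Python's `decimal` module (the dictionary names are illustrative):

```python
from decimal import Decimal

# Asynchronous prices per 15 seconds of audio, USD excluding VAT (table above).
PRICE_PER_15_SECONDS = {
    "asynchronous": Decimal("0.0012418035"),
    "asynchronous, deferred": Decimal("0.0003122955"),
}

# Per-second rate: the 15-second price divided by 15.
PRICE_PER_SECOND = {mode: price / 15 for mode, price in PRICE_PER_15_SECONDS.items()}

for mode, rate in PRICE_PER_SECOND.items():
    print(mode, rate)
```

These per-second rates are the per-unit prices used in the asynchronous cost examples below.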

Cost calculation examples

Speech synthesis using API v1

Consider speech synthesis via the API v1 with the following parameters:

  • Number of characters sent per month: 2,023.

The cost is calculated as follows:

2,023 × ($10.99999824 / 1,000,000) = $0.0222529964

Total: $0.0222529964.

Where:

  • $10.99999824: Cost per 1,000,000 characters.
  • $10.99999824 / 1,000,000: Cost per character.
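The same calculation with Python's `decimal` module, which avoids binary floating-point error. Rounding the result to 10 decimal places is an assumption made only to match the figure above; it says nothing about how invoices are rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_MILLION_CHARS = Decimal("10.99999824")  # API v1, USD excluding VAT

chars_per_month = 2023
# Cost = characters x (price per 1,000,000 characters / 1,000,000)
exact_cost = chars_per_month * PRICE_PER_MILLION_CHARS / 1_000_000
rounded_cost = exact_cost.quantize(Decimal("0.0000000001"), rounding=ROUND_HALF_UP)

print(exact_cost)    # 0.02225299643952
print(rounded_cost)  # 0.0222529964
```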

Speech synthesis using API v3

Consider speech synthesis via the API v3 with the following parameters:

  • Number of sent requests: 3.
  • Number of characters in requests: 150, 300, 600.

The cost is calculated as follows:

(1 + 2 + 3) × $0.0013327867 = $0.0079967202

Total: $0.0079967202.

Where:

  • 1: Number of billing units charged for the first request of 150 characters.
  • 2: Number of billing units charged for the second request of 300 characters in unsafe_mode.
  • 3: Number of billing units charged for the third request of 600 characters in unsafe_mode.
  • $0.0013327867: Cost per billing unit.
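A sketch of the same API v3 calculation, using a hypothetical rounding helper that bills one unit per started 250 characters, with at least one unit per request:

```python
from decimal import Decimal

PRICE_PER_UNIT = Decimal("0.0013327867")  # API v3, USD excluding VAT
UNIT_CHARS = 250


def synthesis_units_v3(num_chars: int) -> int:
    # Hypothetical helper: ceil(num_chars / 250), at least one unit per request.
    return max(1, (num_chars + UNIT_CHARS - 1) // UNIT_CHARS)


request_lengths = [150, 300, 600]  # characters in each request
units = [synthesis_units_v3(n) for n in request_lengths]
total = sum(units) * PRICE_PER_UNIT

print(units)  # [1, 2, 3]
print(total)  # 0.0079967202
```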

Streaming speech recognition

Consider streaming speech recognition with the following parameters:

  • Number of audio fragments: 2.
  • Duration of audio fragments: 5 seconds, 37 seconds.

The cost is calculated as follows:

(1 + 3) × $0.0013327867 = $0.0053311468

Total: $0.0053311468.

Where:

  • 1: Number of billing units charged for the first 5-second audio fragment rounded up to 15 seconds.
  • 3: Number of billing units charged for the second 37-second audio fragment rounded up to 45 seconds.
  • $0.0013327867: Cost per billing unit.

Synchronous speech recognition

Consider synchronous speech recognition with the following parameters:

  • Number of audio fragments: 2.
  • Duration of audio fragments: 5 seconds, 37 seconds.

The cost is calculated as follows:

(1 + 3) × $0.0013327867 = $0.0053311468

Total: $0.0053311468.

Where:

  • 1: Number of billing units charged for the first 5-second audio fragment rounded up to 15 seconds.
  • 3: Number of billing units charged for the second 37-second audio fragment rounded up to 45 seconds.
  • $0.0013327867: Cost per billing unit.
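Streaming and synchronous recognition use the same rounding rule and the same price per unit, so the two examples above give identical totals. A sketch that computes both from the fragment durations (`sync_units` is a hypothetical helper):

```python
import math
from decimal import Decimal

# Price per 15-second billing unit, USD excluding VAT, from the pricing table.
PRICE_PER_UNIT = {
    "streaming": Decimal("0.0013327867"),
    "synchronous": Decimal("0.0013327867"),
}


def sync_units(duration_seconds: float) -> int:
    # Hypothetical helper: every started 15-second segment is one unit, minimum one.
    return max(1, math.ceil(duration_seconds / 15))


durations = [5, 37]  # seconds
units = sum(sync_units(d) for d in durations)  # 1 + 3

for mode, price in PRICE_PER_UNIT.items():
    print(mode, units * price)  # 0.0053311468 for both modes
```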

Asynchronous speech recognition

Consider asynchronous speech recognition with the following parameters:

  • Number of audio fragments: 4.
  • Duration of audio fragments: 5 seconds, 5 seconds, 15.5 seconds, 15.5 seconds.
  • Number of channels in audio fragments: 1, 3, 2, 4.

The cost is calculated as follows:

(15 + 30 + 16 + 32) × $0.0000827869 = $0.0076991817

Total: $0.0076991817.

Where:

  • 15: Number of billing units charged for the first single-channel 5-second audio fragment rounded up to 2 channels and 15 seconds.
  • 30: Number of billing units charged for the second 3-channel 5-second audio fragment rounded up to 4 channels (two channel pairs) and 15 seconds.
  • 16: Number of billing units charged for the third 2-channel 15.5-second audio fragment rounded up to 16 seconds.
  • 32: Number of billing units charged for the fourth 4-channel 15.5-second audio fragment rounded up to 16 seconds for each of the two channel pairs.
  • $0.0000827869: Cost per billing unit (one second of two-channel audio), that is, $0.0012418035 / 15.

Asynchronous speech recognition in deferred mode

Consider asynchronous speech recognition in deferred mode with the following parameters:

  • Number of audio fragments: 3.
  • Duration of audio fragments: 2 seconds, 14 seconds, 19.5 seconds.
  • Number of channels in audio fragments: 2, 3, 4.

The cost is calculated as follows:

(15 + 30 + 40) × $0.0000208197 = $0.0017696745

Total: $0.0017696745.

Where:

  • 15: Number of billing units charged for the first 2-channel 2-second audio fragment rounded up to 15 seconds.
  • 30: Number of billing units charged for the second 3-channel 14-second audio fragment rounded up to 4 channels (two channel pairs) and 15 seconds.
  • 40: Number of billing units charged for the third 4-channel 19.5-second audio fragment rounded up to 20 seconds for each of the two channel pairs.
  • $0.0000208197: Cost per billing unit (one second of two-channel audio), that is, $0.0003122955 / 15.
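Both asynchronous examples can be reproduced with one sketch. `async_billed_seconds` and `example_cost` are hypothetical helpers; the per-second prices are the 15-second prices from the pricing table divided by 15, as used in the examples.

```python
import math
from decimal import Decimal

# Per-second prices (15-second price / 15), USD excluding VAT.
PRICE_PER_SECOND = {
    "asynchronous": Decimal("0.0000827869"),
    "deferred": Decimal("0.0000208197"),
}


def async_billed_seconds(duration_seconds: float, channels: int) -> int:
    # Hypothetical helper: whole seconds (minimum 15) for each pair of channels.
    channel_pairs = (channels + 1) // 2
    return channel_pairs * max(15, math.ceil(duration_seconds))


def example_cost(mode: str, fragments: list[tuple[float, int]]) -> tuple[list[int], Decimal]:
    """Return billed seconds per fragment and the total cost for one mode."""
    units = [async_billed_seconds(d, ch) for d, ch in fragments]
    return units, sum(units) * PRICE_PER_SECOND[mode]


# (duration in seconds, number of channels) from the two examples above.
print(example_cost("asynchronous", [(5, 1), (5, 3), (15.5, 2), (15.5, 4)]))
print(example_cost("deferred", [(2, 2), (14, 3), (19.5, 4)]))
```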