Supported audio formats
SpeechKit can recognize and synthesize speech in the following audio formats:
- LPCM
- OggOpus
- MP3
LPCM
Linear pulse-code modulation without a WAV header.
Audio in this format has the following characteristics:
- Sampling frequency:

  | API version | Acceptable values |
  | --- | --- |
  | Speech synthesis API v1 | 8, 16, or 48 kHz |
  | Speech synthesis API v3 | Any value between 8 and 48 kHz |
  | Speech recognition API v2 | 8, 16, or 48 kHz |
  | Speech recognition API v3 | 8, 16, or 48 kHz |

- Bit depth: 16 bits.
- Byte order: little-endian.
- Audio data is stored as signed integers.
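Because LPCM here means raw samples without a WAV header, a common preparation step is to strip the header from an existing WAV file. The following is a minimal sketch that uses only Python's standard `wave` module. The names `wav_to_lpcm`, `make_test_wav`, and `ALLOWED_RATES_HZ` are illustrative and not part of SpeechKit, and the rate check assumes the discrete 8, 16, or 48 kHz values listed for the recognition APIs.

```python
import io
import struct
import wave
from typing import Sequence, Tuple

# Assumption: the discrete rates listed above for the recognition APIs, in Hz.
ALLOWED_RATES_HZ = {8000, 16000, 48000}


def wav_to_lpcm(wav_bytes: bytes) -> Tuple[bytes, int]:
    """Strip the WAV header and return (raw PCM frames, sample rate in Hz)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        # A sample width of 2 bytes corresponds to a bit depth of 16 bits.
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        rate_hz = wav.getframerate()
        if rate_hz not in ALLOWED_RATES_HZ:
            raise ValueError(f"unsupported sample rate: {rate_hz} Hz")
        # 16-bit WAV PCM is stored as signed little-endian integers. On a
        # little-endian host, readframes() returns the bytes as stored.
        return wav.readframes(wav.getnframes()), rate_hz


def make_test_wav(samples: Sequence[int], rate_hz: int) -> bytes:
    """Build a small mono 16-bit WAV file in memory (demo helper only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


if __name__ == "__main__":
    pcm, rate = wav_to_lpcm(make_test_wav([0, 1, -1], 16000))
    print(len(pcm), rate)  # prints: 6 16000
```

The returned bytes contain only sample data, so the sample rate has to be passed to the API separately; how that is done depends on the API version and is not shown here.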
OggOpus
For OggOpus, data is encoded with the Opus audio codec and packaged in an Ogg container.
SpeechKit recognizes and synthesizes OggOpus with no restrictions on audio quality or file headers.
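Before sending a file, it can be useful to confirm that it really is Opus in an Ogg container rather than, say, Vorbis. The sketch below is a client-side heuristic, not something SpeechKit requires. It relies on the Ogg page layout (a 27-byte fixed header starting with `OggS`, followed by a segment table) and on the `OpusHead` signature that opens the first packet of an Opus stream. The function name `looks_like_ogg_opus` is illustrative.

```python
def looks_like_ogg_opus(data: bytes) -> bool:
    """Heuristically check that data starts with an Ogg page carrying Opus."""
    # Every Ogg page begins with the capture pattern "OggS"; the fixed part
    # of the page header is 27 bytes long.
    if len(data) < 27 or data[:4] != b"OggS":
        return False
    # Byte 26 holds the number of entries in the segment table that follows.
    segment_count = data[26]
    payload_start = 27 + segment_count
    # The first packet of an Opus stream starts with the magic "OpusHead".
    return data[payload_start:payload_start + 8] == b"OpusHead"
```

The check reads only the first page, so it says nothing about whether the rest of the stream is intact.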
MP3
For MP3, data is encoded using the MPEG-1/2/2.5 Layer III audio codec and packaged in an MP3 container.
SpeechKit recognizes MP3 with no restrictions on audio quality or file headers.
Warning
MP3 is not supported by API v1 for synchronous recognition or by API v2 for streaming recognition.
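To illustrate what "MPEG-1/2/2.5 Layer III" means at the byte level, the sketch below inspects a single MPEG audio frame header and reports which MPEG version it declares, or `None` if it is not a Layer III frame. It is a client-side illustration only: the function name `mp3_frame_version` is hypothetical, and the code assumes the data begins directly with a frame header, so a file that starts with an ID3 tag would need the tag skipped first.

```python
from typing import Optional

# Version ID (bits 4-3 of the second header byte); 0b01 is reserved.
_MPEG_VERSIONS = {0b00: "MPEG-2.5", 0b10: "MPEG-2", 0b11: "MPEG-1"}
_LAYER_III = 0b01  # Layer field (bits 2-1 of the second header byte)


def mp3_frame_version(header: bytes) -> Optional[str]:
    """Return the MPEG version of a Layer III frame header, else None."""
    if len(header) < 2:
        return None
    # A frame header starts with an 11-bit sync word of all ones.
    if header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        return None
    version_bits = (header[1] >> 3) & 0b11
    layer_bits = (header[1] >> 1) & 0b11
    if layer_bits != _LAYER_III:
        return None
    return _MPEG_VERSIONS.get(version_bits)
```

For example, a header beginning with the bytes `FF FB` is an MPEG-1 Layer III frame, while `FF FD` is Layer II and is rejected.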