Yandex Cloud AI Studio pricing policy
Note
The currency of service rates (prices) depends on the company you have a contract with:
- Prices in US dollars are applicable to customers of Iron Hive doo Beograd (Serbia) or Direct Cursus Technology L.L.C. (Dubai).
- Prices in Russian roubles are applicable to customers of Yandex.Cloud LLC.
All prices in RUB and KZT are inclusive of VAT; in USD, net of VAT.
Model Gallery
The cost of using the models depends on the operating mode and the number of tokens for different consumption types:
- Input query tokens.
- Output model response tokens.
- Cached tokens, if certain information is re-used without additional computation, such as instructions for a model.
- Tool tokens provided to the model as a result of invoking any tool.
Caching is enabled automatically where possible and applicable. Caching is not guaranteed and does not apply to output tokens.
Tool tokens include all uncached tokens stored in the message history at the time the tool's results were transmitted. Tool tokens are calculated only for AI Studio built-in tools and do not apply to the results of custom functions. Use of tools is charged separately.
Synchronous mode
| Model | Price per 1,000 input tokens | Price per 1,000 cached tokens | Price per 1,000 tool tokens | Price per 1,000 output tokens |
|---|---|---|---|---|
| Alice AI LLM | $0.00409836 | $0.00409836 | $0.0010655736 | $0.009836064 |
| YandexGPT Pro 5.1 | $0.006557376 | $0.006557376 | $0.001639344 | $0.006557376 |
| YandexGPT Pro 5 | $0.009836064 | $0.009836064 | $0.009836064 | $0.009836064 |
| YandexGPT Lite | $0.001639344 | $0.001639344 | $0.001639344 | $0.001639344 |
| Alice AI LLM Flash | $0.000819672 | $0.000204918 | $0.000204918 | $0.001639344 |
| DeepSeek V3.2 | $0.00409836 | $0.0010655736 | $0.0010655736 | $0.006557376 |
| Qwen3 235B | $0.00409836 | $0.00409836 | $0.00409836 | $0.00409836 |
| gpt-oss-120b | $0.002459016 | $0.002459016 | $0.002459016 | $0.002459016 |
| gpt-oss-20b | $0.000819672 | $0.000819672 | $0.000819672 | $0.000819672 |
| Qwen3.6 35B | $0.001639344 | $0.000409836 | $0.000409836 | $0.002459016 |
| Qwen3.5 35B | $0.001639344 | $0.000409836 | $0.000409836 | $0.002459016 |
| speech-realtime-250923 | $0.006557376 | $0.001639344 | $0.001639344 | $0.006557376 |
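As a rough illustration of how the four token types combine into the price of one synchronous request, here is a minimal sketch. The function name `sync_request_cost` and the `SYNC_PRICES` dictionary are hypothetical, and the sketch assumes the four token counts are disjoint (for example, that cached tokens are not also counted as input tokens). Only two models from the synchronous mode table are included.

```python
from decimal import Decimal

# USD prices per 1,000 tokens, copied from the synchronous mode table:
# (input, cached, tool, output). Only two models are listed here.
SYNC_PRICES = {
    "YandexGPT Lite": ("0.001639344", "0.001639344", "0.001639344", "0.001639344"),
    "DeepSeek V3.2": ("0.00409836", "0.0010655736", "0.0010655736", "0.006557376"),
}


def sync_request_cost(model, input_tokens, cached_tokens, tool_tokens, output_tokens):
    """Estimate the cost of one request in USD, net of VAT.

    Assumption: the four token counts do not overlap.
    """
    prices = [Decimal(p) for p in SYNC_PRICES[model]]
    counts = [input_tokens, cached_tokens, tool_tokens, output_tokens]
    # Each token type is billed at its own rate per 1,000 tokens.
    return sum((Decimal(n) / 1000 * p for n, p in zip(counts, prices)), Decimal(0))


print(sync_request_cost("DeepSeek V3.2", 2000, 1000, 0, 500))
```

Decimal arithmetic is used so that the many-digit prices are not distorted by binary floating point.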
Asynchronous mode
| Model | Price per 1,000 input tokens | Price per 1,000 output tokens |
|---|---|---|
| Alice AI LLM | $0.00204918 | $0.0083606544 |
| YandexGPT Pro 5.1 | $0.0033606552 | $0.0033606552 |
| YandexGPT Pro 5 | $0.0049999992 | $0.0049999992 |
| YandexGPT Lite | $0.000819672 | $0.000819672 |
Batch mode
In batch mode, each run is billed for a minimum of 200,000 tokens.
| Model | Price per 1,000 input tokens | Price per 1,000 output tokens |
|---|---|---|
| Qwen2.5 7B Instruct | $0.000819672 | $0.000819672 |
| Qwen2.5 72B Instruct | $0.0049999992 | $0.0049999992 |
| QwQ 32B Instruct | $0.0033606552 | $0.0033606552 |
| Llama-3.3-70B-Instruct | $0.0049999992 | $0.0049999992 |
| Llama-3.1-70B-Instruct | $0.0049999992 | $0.0049999992 |
| DeepSeek-R1-Distill-Llama-70B | $0.0049999992 | $0.0049999992 |
| Qwen2.5 32B Instruct | $0.0033606552 | $0.0033606552 |
| DeepSeek-R1-Distill-Qwen-32B | $0.0033606552 | $0.0033606552 |
| phi-4 | $0.001639344 | $0.001639344 |
| Qwen2 VL 7B | $0.000819672 | $0.000819672 |
| Qwen2.5 VL 7B | $0.000819672 | $0.000819672 |
| DeepSeek 2 VL | $0.0033606552 | $0.0033606552 |
| DeepSeek 2 VL Tiny | $0.000819672 | $0.000819672 |
| Gemma3 1B it | $0.000819672 | $0.000819672 |
| Gemma3 4B it | $0.000819672 | $0.000819672 |
| Gemma3 12B it | $0.001639344 | $0.001639344 |
| Gemma3 27B it | $0.0033606552 | $0.0033606552 |
| Qwen 2.5 VL 32B Instruct | $0.0033606552 | $0.0033606552 |
| Qwen3-0.6B | $0.000819672 | $0.000819672 |
| Qwen3-1.7B | $0.000819672 | $0.000819672 |
| Qwen3-4B | $0.000819672 | $0.000819672 |
| Qwen3-8B | $0.000819672 | $0.000819672 |
| Qwen3-14B | $0.001639344 | $0.001639344 |
| Qwen3-32B | $0.0033606552 | $0.0033606552 |
| Qwen3-30B-A3B | $0.0033606552 | $0.0033606552 |
| Qwen3-235B-A22B | $0.049999992 | $0.049999992 |
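The effect of the 200,000-token minimum on a small batch run can be sketched as follows. The helper `batch_run_cost` is hypothetical, and the sketch assumes the minimum applies to input and output tokens combined, which the text does not state explicitly. A single price argument is enough here because every batch model listed has the same input and output price.

```python
from decimal import Decimal

BATCH_MIN_TOKENS = 200_000  # minimum billed tokens per run, as stated for batch mode


def batch_run_cost(price_per_1000, input_tokens, output_tokens):
    """Estimate the cost of one batch run in USD, net of VAT.

    Assumption: the minimum is applied to input + output tokens together.
    """
    billed_tokens = max(input_tokens + output_tokens, BATCH_MIN_TOKENS)
    return Decimal(billed_tokens) / 1000 * Decimal(price_per_1000)


# A 60,000-token run on Qwen3-8B is billed as 200,000 tokens.
print(batch_run_cost("0.000819672", 50_000, 10_000))
```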
Dedicated instances
The cost of running a dedicated instance depends on the model and the selected configuration. Dedicated instances are billed per second, rounded up to a billing unit. You are not charged for hardware maintenance or model deployment time.
Prices are shown for 1 hour of use. Billing occurs per second.
The price per 1 unit for a dedicated instance is $0.0083327856 without VAT.
| Model | Price per 1 hour, S configuration, without VAT | Price per 1 hour, M configuration, without VAT | Price per 1 hour, L configuration, without VAT |
|---|---|---|---|
| Qwen 2.5 VL 32B Instruct | $6.70 | $13.40 | $20.10 |
| Qwen 2.5 7B Instruct | $6.70 | $13.40 | $20.10 |
| Gemma 3 4B it | $3.35 | $6.70 | $10.05 |
| Gemma 3 12B it | $3.35 | $6.70 | $10.05 |
| T-pro-it-2.0-FP8 | $6.20 | $12.40 | $18.60 |
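To relate the hourly prices to per-second billing, the following sketch prorates an hourly price over a usage period. The function `dedicated_instance_cost` is hypothetical. It assumes the per-second rate is simply the hourly price divided by 3,600, and it does not model the rounding up to a billing unit, since the size of that unit in seconds is not given here.

```python
from decimal import Decimal


def dedicated_instance_cost(hourly_price, seconds_used):
    """Estimate the cost of a dedicated instance in USD, net of VAT.

    Assumption: per-second rate = hourly price / 3,600; rounding up to a
    billing unit is not modeled.
    """
    return Decimal(hourly_price) * seconds_used / 3600


# 30 minutes of Qwen 2.5 7B Instruct in the S configuration ($6.70 per hour).
print(dedicated_instance_cost("6.70", 1800))
```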
Model fine-tuning
At the Preview stage, you can fine-tune models free of charge. Using a fine-tuned YandexGPT Lite model costs the same as using the base YandexGPT Lite model.
Text tokenization
Use of the tokenizer (TokenizerService calls and Tokenizer methods) is free of charge.
Text vectorization
The cost of text vectorization (getting text embeddings) depends on the size of the text submitted for vectorization. Yandex Cloud Billing measures the creation of embeddings in vectorization units. One unit equals one token.
| Model | Price per 1,000 tokens, without VAT |
|---|---|
| Embeddings | $0.0000827869 |
Example of cost calculation for text vectorization
The cost of vectorizing a text of 2,000 tokens will be:
- $0.0000827869: Cost of processing 1,000 tokens.
- $0.0000827869 / 1,000: Cost of processing one token.
2,000 × ($0.0000827869 / 1,000) = $0.0001655738
Total: $0.0001655738.
Text classification
The cost of text classification depends on the classification model you use and the number of tokens you provide.
- When classifying with YandexGPT Lite, a billing unit is a request of up to 1,000 tokens.
- When classifying with YandexGPT Pro and fine-tuned classifiers, a billing unit is a request of up to 250 tokens.
A request smaller than one billing unit is billed as one full unit. Larger texts are billed as multiple requests, with the number of units rounded up to the next integer.
For example, classifying a text of 770 tokens with YandexGPT Lite will be billed as a single request, i.e., as one billing unit.
The same 770-token text classified with YandexGPT Pro or a fine-tuned classifier will be billed as four requests.
| Service | Price, without VAT |
|---|---|
| 1 request (1,000 tokens) to classifier based on YandexGPT Lite | $0.0012499998 |
| 1 request (250 tokens) to classifier based on YandexGPT Pro | $0.0012499998 |
| 1 request (250 tokens) to tuned classifier | $0.0012499998 |
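The rounding rule for classification requests can be expressed as a ceiling division. The sketch below is illustrative only: `classification_cost` and the keys `lite`, `pro`, and `tuned` are hypothetical names, while the unit sizes and the price come from the text and table above.

```python
from decimal import Decimal

UNIT_PRICE = Decimal("0.0012499998")  # USD per billing unit, same for all three classifiers
UNIT_TOKENS = {"lite": 1000, "pro": 250, "tuned": 250}  # tokens per billing unit


def classification_cost(classifier, tokens):
    """Return (billing units, cost in USD net of VAT) for one text."""
    size = UNIT_TOKENS[classifier]
    # Integer ceiling division; any request is billed as at least one unit.
    units = max(1, -(-tokens // size))
    return units, units * UNIT_PRICE


print(classification_cost("lite", 770))  # one billing unit
print(classification_cost("pro", 770))   # four billing units
```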
Image generation
You are charged for each generation request in YandexART. Requests are not idempotent, so two requests with the same settings and generation prompt are billed as two separate requests.
| Service | Price, without VAT |
|---|---|
| 1 request for image generation | $0.0182786856 |
Agent Atelier
Voice agents
The cost of using voice agents consists of the following:
- Cost of speech recognition (incoming audio).
- Cost of speech synthesis (outgoing audio).
- Cost of text generation using the speech-realtime-250923 model.
- Cost of tool invocation.
| Service | Price per billing unit, without VAT |
|---|---|
| Incoming audio, per 1 second | $0.0002163934 |
| Outgoing audio, per 1 second | $0.0001663934 |
Text-based agents
The cost of using text-based agents consists of the following:
- Consumption of tokens as per the pricing plans of the Model Gallery models.
- Cost of tool invocation.
Invoking tools in agents
| Service | Price per 1,000 requests, without VAT |
|---|---|
| Web Search tool | $7.4999988 |
| File Search tool | $2.459016 |
| Code Interpreter tool | Free of charge |
| MCP tool | Free of charge |
| Image Generation tool | $18.2786856 |
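Since tool prices are quoted per 1,000 requests, a smaller number of invocations can be estimated as below. The names `TOOL_PRICES` and `tool_invocation_cost` are hypothetical, and the sketch assumes requests are billed pro rata rather than in whole blocks of 1,000, which the table does not specify.

```python
from decimal import Decimal

# USD per 1,000 requests, net of VAT, copied from the table above.
TOOL_PRICES = {
    "web_search": Decimal("7.4999988"),
    "file_search": Decimal("2.459016"),
    "code_interpreter": Decimal("0"),
    "mcp": Decimal("0"),
    "image_generation": Decimal("18.2786856"),
}


def tool_invocation_cost(tool, requests):
    """Estimate the tool invocation cost, assuming pro rata billing per request."""
    return TOOL_PRICES[tool] * requests / 1000


print(tool_invocation_cost("web_search", 250))
```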
AI Search
The search index size is rounded up to the nearest whole gigabyte.
| Service | Price per day per 1 GB, without VAT |
|---|---|
| Search index storage | $0.086885232 |
| AI Studio file storage | Free of charge |
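The rounding of the index size can be illustrated with a short sketch. The function `index_storage_cost` is a hypothetical helper, and it assumes the size is rounded up to a whole gigabyte first and then multiplied by the number of days and the daily price.

```python
import math
from decimal import Decimal

PRICE_PER_GB_DAY = Decimal("0.086885232")  # USD per day per 1 GB, net of VAT


def index_storage_cost(index_size_gb, days):
    """Estimate search index storage cost over a number of days."""
    billed_gb = math.ceil(index_size_gb)  # size is rounded up to a whole gigabyte
    return billed_gb * days * PRICE_PER_GB_DAY


# A 2.3 GB index is billed as 3 GB.
print(index_storage_cost(2.3, 30))
```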
MCP Hub
Storing MCP servers is free of charge. However, you may still be charged for tools created in MCP servers, such as Yandex Cloud Functions invocations.
When you use external APIs, such as Kontur.Focus or amoCRM, you are charged directly by the respective partner.
Internal server errors
You are not charged for a request that fails due to an internal server error.