Yandex Cloud AI Studio pricing policy

Note

The currency of service rates (prices) depends on the company you have a contract with:

  • Prices in US dollars are applicable to customers of Iron Hive doo Beograd (Serbia) or Direct Cursus Technology L.L.C. (Dubai).
  • Prices in Russian roubles are applicable to customers of Yandex.Cloud LLC.

All prices in RUB and KZT are inclusive of VAT; in USD, net of VAT.

The cost of using the models depends on the operating mode and the number of tokens for different consumption types:

  • Input query tokens.
  • Output model response tokens.
  • Cached tokens, when certain information, such as instructions for a model, is reused without additional computation.
  • Tool tokens provided to the model as a result of invoking any tool.

Caching is enabled automatically where possible and applicable. Caching is not guaranteed and does not apply to output tokens.

Tool tokens include all uncached tokens stored in the message history at the time the tool's results were transmitted. Tool tokens are calculated only for AI Studio built-in tools and do not apply to the results of custom functions. Use of tools is charged separately.

Synchronous mode

Prices are per 1,000 tokens, without VAT.

Model                     Input tokens    Cached tokens   Tool tokens     Output tokens
Alice AI LLM              $0.00409836     $0.00409836     $0.0010655736   $0.009836064
YandexGPT Pro 5.1         $0.006557376    $0.006557376    $0.001639344    $0.006557376
YandexGPT Pro 5           $0.009836064    $0.009836064    $0.009836064    $0.009836064
YandexGPT Lite            $0.001639344    $0.001639344    $0.001639344    $0.001639344
Alice AI LLM Flash        $0.000819672    $0.000204918    $0.000204918    $0.001639344
DeepSeek V3.2             $0.00409836     $0.0010655736   $0.0010655736   $0.006557376
Qwen3 235B                $0.00409836     $0.00409836     $0.00409836     $0.00409836
gpt-oss-120b              $0.002459016    $0.002459016    $0.002459016    $0.002459016
gpt-oss-20b               $0.000819672    $0.000819672    $0.000819672    $0.000819672
Qwen3.6 35B               $0.001639344    $0.000409836    $0.000409836    $0.002459016
Qwen3.5 35B               $0.001639344    $0.000409836    $0.000409836    $0.002459016
speech-realtime-250923    $0.006557376    $0.001639344    $0.001639344    $0.006557376
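
To illustrate how the four token types combine into the cost of one synchronous request, here is a minimal sketch. It assumes billing is strictly proportional to the token count (price × tokens / 1,000) with no rounding, which this page does not state explicitly. The function name `sync_request_cost` and the token counts are hypothetical; the prices are the Alice AI LLM Flash rates from the table above.

```python
from decimal import Decimal

# USD prices per 1,000 tokens for Alice AI LLM Flash in synchronous mode,
# copied from the table above (without VAT).
PRICES_PER_1K = {
    "input": Decimal("0.000819672"),
    "cached": Decimal("0.000204918"),
    "tool": Decimal("0.000204918"),
    "output": Decimal("0.001639344"),
}


def sync_request_cost(tokens: dict[str, int]) -> Decimal:
    """Sum price * tokens / 1000 over token types.

    Assumption: linear, unrounded billing per token type.
    """
    return sum(
        (PRICES_PER_1K[kind] * count / 1000 for kind, count in tokens.items()),
        Decimal("0"),
    )


# Hypothetical request: 1,200 input, 800 cached, 0 tool, and 500 output tokens.
example_cost = sync_request_cost(
    {"input": 1200, "cached": 800, "tool": 0, "output": 500}
)
print(f"Estimated cost: ${example_cost}")
```

Decimal arithmetic is used so that the small per-token prices are not distorted by binary floating point.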

Asynchronous mode

Prices are per 1,000 tokens, without VAT.

Model                     Input tokens    Output tokens
Alice AI LLM              $0.00204918     $0.0083606544
YandexGPT Pro 5.1         $0.0033606552   $0.0033606552
YandexGPT Pro 5           $0.0049999992   $0.0049999992
YandexGPT Lite            $0.000819672    $0.000819672

Batch mode

In batch mode, each run is billed for at least 200,000 tokens.

Prices are per 1,000 tokens, without VAT.

Model                           Input tokens    Output tokens
Qwen2.5 7B Instruct             $0.000819672    $0.000819672
Qwen2.5 72B Instruct            $0.0049999992   $0.0049999992
QwQ 32B Instruct                $0.0033606552   $0.0033606552
Llama-3.3-70B-Instruct          $0.0049999992   $0.0049999992
Llama-3.1-70B-Instruct          $0.0049999992   $0.0049999992
DeepSeek-R1-Distill-Llama-70B   $0.0049999992   $0.0049999992
Qwen2.5 32B Instruct            $0.0033606552   $0.0033606552
DeepSeek-R1-Distill-Qwen-32B    $0.0033606552   $0.0033606552
phi-4                           $0.001639344    $0.001639344
Qwen2 VL 7B                     $0.000819672    $0.000819672
Qwen2.5 VL 7B                   $0.000819672    $0.000819672
DeepSeek 2 VL                   $0.0033606552   $0.0033606552
DeepSeek 2 VL Tiny              $0.000819672    $0.000819672
Gemma3 1B it                    $0.000819672    $0.000819672
Gemma3 4B it                    $0.000819672    $0.000819672
Gemma3 12B it                   $0.001639344    $0.001639344
Gemma3 27B it                   $0.0033606552   $0.0033606552
Qwen 2.5 VL 32B Instruct        $0.0033606552   $0.0033606552
Qwen3-0.6B                      $0.000819672    $0.000819672
Qwen3-1.7B                      $0.000819672    $0.000819672
Qwen3-4B                        $0.000819672    $0.000819672
Qwen3-8B                        $0.000819672    $0.000819672
Qwen3-14B                       $0.001639344    $0.001639344
Qwen3-32B                       $0.0033606552   $0.0033606552
Qwen3-30B-A3B                   $0.0033606552   $0.0033606552
Qwen3-235B-A22B                 $0.049999992    $0.049999992
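
The 200,000-token minimum for a batch run can be sketched as follows. This is an illustration under two assumptions that this page does not confirm: the minimum applies to the combined total of input and output tokens, and billing above the minimum is strictly proportional. The function name `batch_run_cost` is hypothetical; the price is the Qwen2.5 7B Instruct rate, which is the same for input and output tokens.

```python
from decimal import Decimal

MIN_BILLED_TOKENS = 200_000  # minimum billed per batch run, per the text above

# USD per 1,000 tokens for Qwen2.5 7B Instruct (same for input and output).
QWEN25_7B_PRICE_PER_1K = Decimal("0.000819672")


def batch_run_cost(input_tokens: int, output_tokens: int,
                   price_per_1k: Decimal) -> Decimal:
    """Bill the run for its total tokens, but never for fewer than the minimum.

    Assumption: the minimum applies to input + output tokens combined.
    """
    billed_tokens = max(input_tokens + output_tokens, MIN_BILLED_TOKENS)
    return price_per_1k * billed_tokens / 1000


# A small run (50,000 tokens) is billed as 200,000 tokens.
small_run = batch_run_cost(40_000, 10_000, QWEN25_7B_PRICE_PER_1K)
# A larger run (350,000 tokens) is billed for its actual size.
large_run = batch_run_cost(300_000, 50_000, QWEN25_7B_PRICE_PER_1K)
print(f"Small run: ${small_run}, large run: ${large_run}")
```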

Dedicated instances

The cost of running a dedicated instance depends on the model and the selected configuration. Dedicated instances are billed per second, rounded up to a billing unit. Hardware maintenance and model deployment time are not charged.

Prices are shown for 1 hour of use. Billing occurs per second.

The price per 1 unit for a dedicated instance is $0.0083327856 without VAT.

Price per 1 hour, without VAT:

Model                      S configuration   M configuration   L configuration
Qwen 2.5 VL 32B Instruct   $6.70             $13.40            $20.10
Qwen 2.5 7B Instruct       $6.70             $13.40            $20.10
Gemma 3 4B it              $3.35             $6.70             $10.05
Gemma 3 12B it             $3.35             $6.70             $10.05
T-pro-it-2.0-FP8           $6.20             $12.40            $18.60
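
Because prices are quoted per hour but billing is per second, a rough estimate can be obtained by prorating the hourly price. The sketch below assumes simple linear proration and deliberately ignores the rounding up to a billing unit, since this page does not define how units map to seconds. The function name `dedicated_instance_cost` is hypothetical.

```python
from decimal import Decimal

SECONDS_PER_HOUR = 3600

# USD per hour for Qwen 2.5 7B Instruct, S configuration (without VAT).
HOURLY_PRICE_S = Decimal("6.70")


def dedicated_instance_cost(seconds_used: int, hourly_price: Decimal) -> Decimal:
    """Prorate the hourly price by the number of seconds used.

    Assumption: linear proration; rounding up to a billing unit is not modeled.
    """
    return hourly_price * seconds_used / SECONDS_PER_HOUR


# 90 minutes of use on the S configuration.
estimate = dedicated_instance_cost(90 * 60, HOURLY_PRICE_S)
print(f"Estimated cost: ${estimate}")
```

Treat the result as a lower-bound estimate, since rounding up to a billing unit can only increase the amount.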

Model fine-tuning

At the Preview stage, you can fine-tune models free of charge. A fine-tuned YandexGPT Lite model will cost the same as the basic YandexGPT Lite model.

Text tokenization

Using the tokenizer (TokenizerService calls and Tokenizer methods) is free of charge.

Text vectorization

The cost of text vectorization (getting text embeddings) depends on the size of the text submitted for vectorization. Yandex Cloud Billing measures the creation of embeddings in vectorization units. One unit equals one token.

Model        Price per 1,000 tokens, without VAT
Embeddings   $0.0000827869

Example of cost calculation for text vectorization

The cost of vectorizing a text of 2,000 tokens will be:

  • $0.0000827869: Cost of processing 1,000 tokens.
  • $0.0000827869 / 1,000: Cost of processing one token.

2,000 × ($0.0000827869 / 1,000) = $0.0001655738

Total: $0.0001655738.
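
The same calculation can be expressed as a short sketch for arbitrary text sizes. The function name `vectorization_cost` is hypothetical, and the code assumes unrounded, per-token billing as in the example above.

```python
from decimal import Decimal

# USD per 1,000 tokens for the Embeddings model (without VAT).
EMBEDDING_PRICE_PER_1K = Decimal("0.0000827869")


def vectorization_cost(tokens: int) -> Decimal:
    """Cost of vectorizing a text: tokens * (price per 1,000 tokens / 1,000)."""
    return EMBEDDING_PRICE_PER_1K * tokens / 1000


# The 2,000-token text from the example above.
print(f"Cost: ${vectorization_cost(2000)}")
```

For 2,000 tokens this yields the same amount as the worked example, $0.0001655738.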

Text classification

The cost of text classification depends on the classification model you use and the number of tokens you provide.

  • When classifying with YandexGPT Lite, a billing unit is a request of up to 1,000 tokens.
  • When classifying with YandexGPT Pro and fine-tuned classifiers, a billing unit is a request of up to 250 tokens.

A request smaller than one billing unit is billed as one full unit. Larger texts are billed as multiple requests, with the count rounded up.

For example, classifying a text of 770 tokens with YandexGPT Lite will be billed as a single request, i.e., as one billing unit.
The same 770-token text classified with YandexGPT Pro or a fine-tuned classifier will be billed as four requests.

Service                                                          Price, without VAT
1 request (1,000 tokens) to classifier based on YandexGPT Lite   $0.0012499998
1 request (250 tokens) to classifier based on YandexGPT Pro      $0.0012499998
1 request (250 tokens) to tuned classifier                       $0.0012499998
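
The rounding rule for classification requests can be sketched as a ceiling division of the text size by the billing unit size. The names `billing_units` and `classification_cost` and the classifier keys are hypothetical; the unit sizes and the price come from this section. The sketch assumes a text of at least one token.

```python
from decimal import Decimal

# USD per billed request; the same for all three classifier types (without VAT).
PRICE_PER_REQUEST = Decimal("0.0012499998")

# Billing unit size in tokens for each classifier type.
UNIT_TOKENS = {"lite": 1000, "pro": 250, "tuned": 250}


def billing_units(tokens: int, classifier: str) -> int:
    """Number of billed requests: tokens divided by unit size, rounded up."""
    unit = UNIT_TOKENS[classifier]
    return (tokens + unit - 1) // unit  # integer ceiling division


def classification_cost(tokens: int, classifier: str) -> Decimal:
    """Billed requests multiplied by the price per request."""
    return PRICE_PER_REQUEST * billing_units(tokens, classifier)


# The 770-token text from the example above.
print(billing_units(770, "lite"), billing_units(770, "pro"))
print(f"YandexGPT Pro cost: ${classification_cost(770, 'pro')}")
```

Integer ceiling division avoids floating-point edge cases when the text size is an exact multiple of the unit size.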

Image generation

You are charged for each generation request in YandexART. Requests are not idempotent; therefore, two requests with the same settings and generation prompt are considered as two separate requests.

Service                          Price, without VAT
1 request for image generation   $0.0182786856

Agent Atelier

Voice agents

The cost of using voice agents consists of the following:

Service                        Price per billing unit, without VAT
Incoming audio, per 1 second   $0.0002163934
Outgoing audio, per 1 second   $0.0001663934
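
As an illustration of how the two audio components add up for one conversation, here is a minimal sketch. It assumes durations are already expressed in whole seconds, since this page does not say how fractional seconds are rounded. The function name `voice_agent_cost` and the durations are hypothetical.

```python
from decimal import Decimal

# USD per second of audio (without VAT), from the table above.
INCOMING_AUDIO_PER_SECOND = Decimal("0.0002163934")
OUTGOING_AUDIO_PER_SECOND = Decimal("0.0001663934")


def voice_agent_cost(incoming_seconds: int, outgoing_seconds: int) -> Decimal:
    """Sum of incoming and outgoing audio charges.

    Assumption: durations are whole seconds; rounding rules are not modeled.
    """
    return (INCOMING_AUDIO_PER_SECOND * incoming_seconds
            + OUTGOING_AUDIO_PER_SECOND * outgoing_seconds)


# Hypothetical call: 60 seconds of incoming and 45 seconds of outgoing audio.
print(f"Estimated cost: ${voice_agent_cost(60, 45)}")
```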

Text-based agents

The cost of using text-based agents consists of the following:

Invoking tools in agents

Service                 Price per 1,000 requests, without VAT
Web Search tool         $7.4999988
File Search tool        $2.459016
Code Interpreter tool   Free of charge
MCP tool                Free of charge
Image Generation tool   $18.2786856

The search index size is rounded up to the nearest whole gigabyte.

Service                  Price per day per 1 GB, without VAT
Search index storage     $0.086885232
AI Studio file storage   Free of charge
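
Tool invocation and search index storage charges can be estimated with the sketch below. It assumes tool requests are billed proportionally (price × requests / 1,000) and that the index size is constant over the period, with the round-up to a whole gigabyte applied to that size. The names `tool_cost` and `index_storage_cost` and the dictionary keys are hypothetical.

```python
import math
from decimal import Decimal

# USD per 1,000 tool requests (without VAT), from the tool table above.
TOOL_PRICE_PER_1K_REQUESTS = {
    "web_search": Decimal("7.4999988"),
    "file_search": Decimal("2.459016"),
    "code_interpreter": Decimal("0"),  # free of charge
    "mcp": Decimal("0"),               # free of charge
    "image_generation": Decimal("18.2786856"),
}

# USD per day per 1 GB of search index storage (without VAT).
INDEX_STORAGE_PER_GB_DAY = Decimal("0.086885232")


def tool_cost(tool: str, requests: int) -> Decimal:
    """Assumption: requests are billed proportionally, without rounding."""
    return TOOL_PRICE_PER_1K_REQUESTS[tool] * requests / 1000


def index_storage_cost(size_gb: float, days: int) -> Decimal:
    """Round the index size up to a whole gigabyte, then bill per GB per day."""
    return INDEX_STORAGE_PER_GB_DAY * math.ceil(size_gb) * days


# Hypothetical usage: 250 Web Search requests and a 2.3 GB index kept 30 days.
print(f"Web Search: ${tool_cost('web_search', 250)}")
print(f"Index storage: ${index_storage_cost(2.3, 30)}")
```

In this example the 2.3 GB index is billed as 3 GB.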

MCP Hub

Storing MCP servers is free of charge. However, you may still be charged for tools created in MCP servers, such as Yandex Cloud Functions invocations.

When you use external APIs, such as Kontur.Focus or amoCRM, you are charged directly by the respective partner.

Internal server errors

You are not charged for a request that fails due to an internal server error.
