Yandex Cloud AI Studio pricing policy

Note

The currency of service rates (prices) depends on the company you have a contract with:

Prices in US dollars are applicable to customers of Iron Hive doo Beograd (Serbia) or Direct Cursus Technology L.L.C. (Dubai).
Prices in Russian roubles are applicable to customers of Yandex.Cloud LLC.

Prices in RUB and KZT include VAT; prices in USD do not include VAT.

Model Gallery

The cost of using the models depends on the operating mode and the number of tokens for different consumption types:

Input query tokens.
Output model response tokens.
Cached tokens: input that is reused without additional computation, such as instructions for a model.
Tool tokens provided to the model as a result of invoking any tool.

Caching is enabled automatically where possible and applicable. Caching is not guaranteed and does not apply to output tokens.

Tool tokens include all uncached tokens stored in the message history at the time the tool's results were transmitted. Tool tokens are calculated only for AI Studio built-in tools and do not apply to the results of custom functions. Use of tools is charged separately.

Synchronous mode

Model	Price per 1,000 input tokens, without VAT	Price per 1,000 cached tokens, without VAT	Price per 1,000 tool tokens, without VAT	Price per 1,000 output tokens, without VAT
Alice AI LLM	$0.00409836	$0.00409836	$0.0010655736	$0.009836064
YandexGPT Pro 5.1	$0.006557376	$0.006557376	$0.001639344	$0.006557376
YandexGPT Pro 5	$0.009836064	$0.009836064	$0.009836064	$0.009836064
YandexGPT Lite	$0.001639344	$0.001639344	$0.001639344	$0.001639344
Alice AI LLM Flash	$0.000819672	$0.000204918	$0.000204918	$0.001639344
DeepSeek V3.2	$0.00409836	$0.0010655736	$0.0010655736	$0.006557376
Qwen3 235B	$0.00409836	$0.00409836	$0.00409836	$0.00409836
gpt-oss-120b	$0.002459016	$0.002459016	$0.002459016	$0.002459016
gpt-oss-20b	$0.000819672	$0.000819672	$0.000819672	$0.000819672
Qwen3.6 35B	$0.001639344	$0.000409836	$0.000409836	$0.002459016
Qwen3.5 35B	$0.001639344	$0.000409836	$0.000409836	$0.002459016
speech-realtime-250923	$0.006557376	$0.001639344	$0.001639344	$0.006557376
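
The per-token pricing above can be sketched as a small calculation. This is a minimal illustration, not an official billing formula: the function name `sync_request_cost` and the dictionary layout are hypothetical, only two models from the table are included, and it assumes the four token types are counted separately (for example, that `input_tokens` excludes cached tokens).

```python
from decimal import Decimal

# Prices per 1,000 tokens in USD, without VAT, copied from the synchronous mode table.
# Only two models are listed here; extend the dictionary as needed.
SYNC_PRICES = {
    "YandexGPT Lite": {
        "input": Decimal("0.001639344"),
        "cached": Decimal("0.001639344"),
        "tool": Decimal("0.001639344"),
        "output": Decimal("0.001639344"),
    },
    "Alice AI LLM Flash": {
        "input": Decimal("0.000819672"),
        "cached": Decimal("0.000204918"),
        "tool": Decimal("0.000204918"),
        "output": Decimal("0.001639344"),
    },
}


def sync_request_cost(model, input_tokens=0, cached_tokens=0, tool_tokens=0, output_tokens=0):
    """Estimate the cost of one synchronous request in USD, without VAT.

    ASSUMPTION: the four token counts are disjoint, so each token is
    billed under exactly one consumption type.
    """
    prices = SYNC_PRICES[model]
    counts = {
        "input": input_tokens,
        "cached": cached_tokens,
        "tool": tool_tokens,
        "output": output_tokens,
    }
    # Each consumption type is priced per 1,000 tokens.
    return sum((prices[kind] * count / 1000 for kind, count in counts.items()), Decimal("0"))


estimate = sync_request_cost(
    "Alice AI LLM Flash", input_tokens=1000, cached_tokens=2000, output_tokens=500
)
print(estimate)
```

`Decimal` is used instead of `float` so that the small per-token prices are not distorted by binary rounding. Tool invocation fees are charged separately and are not part of this estimate.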

Asynchronous mode

Model	Price per 1,000 input tokens, without VAT	Price per 1,000 output tokens, without VAT
Alice AI LLM	$0.00204918	$0.0083606544
YandexGPT Pro 5.1	$0.0033606552	$0.0033606552
YandexGPT Pro 5	$0.0049999992	$0.0049999992
YandexGPT Lite	$0.000819672	$0.000819672

Batch mode

In batch mode, each run is billed for at least 200,000 tokens.

Model	Price per 1,000 input tokens, without VAT	Price per 1,000 output tokens, without VAT
Qwen2.5 7B Instruct	$0.000819672	$0.000819672
Qwen2.5 72B Instruct	$0.0049999992	$0.0049999992
QwQ 32B Instruct	$0.0033606552	$0.0033606552
Llama-3.3-70B-Instruct	$0.0049999992	$0.0049999992
Llama-3.1-70B-Instruct	$0.0049999992	$0.0049999992
DeepSeek-R1-Distill-Llama-70B	$0.0049999992	$0.0049999992
Qwen2.5 32B Instruct	$0.0033606552	$0.0033606552
DeepSeek-R1-Distill-Qwen-32B	$0.0033606552	$0.0033606552
phi-4	$0.001639344	$0.001639344
Qwen2 VL 7B	$0.000819672	$0.000819672
Qwen2.5 VL 7B	$0.000819672	$0.000819672
DeepSeek 2 VL	$0.0033606552	$0.0033606552
DeepSeek 2 VL Tiny	$0.000819672	$0.000819672
Gemma3 1B it	$0.000819672	$0.000819672
Gemma3 4B it	$0.000819672	$0.000819672
Gemma3 12B it	$0.001639344	$0.001639344
Gemma3 27B it	$0.0033606552	$0.0033606552
Qwen 2.5 VL 32B Instruct	$0.0033606552	$0.0033606552
Qwen3-0.6B	$0.000819672	$0.000819672
Qwen3-1.7B	$0.000819672	$0.000819672
Qwen3-4B	$0.000819672	$0.000819672
Qwen3-8B	$0.000819672	$0.000819672
Qwen3-14B	$0.001639344	$0.001639344
Qwen3-32B	$0.0033606552	$0.0033606552
Qwen3-30B-A3B	$0.0033606552	$0.0033606552
Qwen3-235B-A22B	$0.049999992	$0.049999992
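
The 200,000-token minimum for batch runs can be illustrated with a short sketch. The function name `batch_run_cost` is hypothetical, and the sketch assumes the minimum applies to the combined input and output token count of a run. Because every batch model in the table has the same price for input and output tokens, a single price is enough here.

```python
from decimal import Decimal

BATCH_MIN_TOKENS = 200_000  # minimum billed tokens per run, from the text above


def batch_run_cost(price_per_1000, input_tokens, output_tokens):
    """Estimate the cost of one batch run in USD, without VAT.

    ASSUMPTION: the minimum is applied to input + output tokens together.
    `price_per_1000` is the table price as a string, e.g. "0.000819672".
    """
    billed_tokens = max(input_tokens + output_tokens, BATCH_MIN_TOKENS)
    return Decimal(price_per_1000) * billed_tokens / 1000


# A small run on Qwen3-8B is billed as 200,000 tokens.
print(batch_run_cost("0.000819672", input_tokens=50_000, output_tokens=10_000))
# A larger run is billed for the tokens actually consumed.
print(batch_run_cost("0.000819672", input_tokens=300_000, output_tokens=100_000))
```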

Dedicated instances

The cost of running a dedicated instance depends on the model and the selected configuration. Dedicated instances are billed per second, rounded up to a billing unit. Hardware maintenance and model deployment time are not charged.

Prices are shown for 1 hour of use. Billing occurs per second.

The price per 1 unit for a dedicated instance is $0.0083327856 without VAT.

Model	Price per 1 hour, S configuration, without VAT	Price per 1 hour, M configuration, without VAT	Price per 1 hour, L configuration, without VAT
Qwen 2.5 VL 32B Instruct	$6.70	$13.40	$20.10
Qwen 2.5 7B Instruct	$6.70	$13.40	$20.10
Gemma 3 4B it	$3.35	$6.70	$10.05
Gemma 3 12B it	$3.35	$6.70	$10.05
T-pro-it-2.0-FP8	$6.20	$12.40	$18.60
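
Since the table lists hourly prices while billing is per second, a rough estimate can prorate the hourly price. The sketch below is an approximation under stated assumptions: the function name `dedicated_instance_cost` is hypothetical, partial seconds are rounded up to whole seconds, and the additional rounding up to a billing unit is not modeled, so the actual charge may be slightly higher.

```python
import math
from decimal import Decimal

SECONDS_PER_HOUR = 3600


def dedicated_instance_cost(hourly_price, seconds_running):
    """Estimate the cost of a dedicated instance in USD, without VAT.

    ASSUMPTION: the hourly table price is prorated linearly per second.
    Rounding up to a billing unit is NOT modeled here.
    """
    billed_seconds = math.ceil(seconds_running)
    return Decimal(hourly_price) * billed_seconds / SECONDS_PER_HOUR


# Qwen 2.5 7B Instruct, S configuration, running for 90 minutes.
print(dedicated_instance_cost("6.70", seconds_running=90 * 60))
```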

Model fine-tuning

At the Preview stage, you can fine-tune models free of charge. A fine-tuned YandexGPT Lite model will cost the same as the basic YandexGPT Lite model.

Text tokenization

Using the tokenizer (TokenizerService calls and Tokenizer methods) is free of charge.

Text vectorization

The cost of text vectorization (getting text embeddings) depends on the size of the text submitted for vectorization. Yandex Cloud Billing measures the creation of embeddings in vectorization units. One unit equals one token.

Model	Price per 1,000 tokens, without VAT
Embeddings	$0.0000827869

Example of cost calculation for text vectorization

The cost of vectorizing a text of 2,000 tokens will be:

$0.0000827869: Cost of processing 1,000 tokens.
$0.0000827869 / 1,000: Cost of processing one token.

2,000 × ($0.0000827869 / 1,000) = $0.0001655738

Total: $0.0001655738.

Text classifications

The cost of text classification depends on the classification model you use and the number of tokens you provide.

When classifying with YandexGPT Lite, a billing unit is a request of up to 1,000 tokens.
When classifying with YandexGPT Pro and fine-tuned classifiers, a billing unit is a request of up to 250 tokens.

A request smaller than one billing unit is billed as one full unit. Larger texts are billed as multiple requests, with the number of billing units rounded up to the next integer.

For example, classifying a text of 770 tokens with YandexGPT Lite will be billed as a single request, i.e., as one billing unit.
The same 770-token text classified with YandexGPT Pro or a fine-tuned classifier will be billed as four requests.

Service	Price, without VAT
1 request (1,000 tokens) to classifier based on YandexGPT Lite	$0.0012499998
1 request (250 tokens) to classifier based on YandexGPT Pro	$0.0012499998
1 request (250 tokens) to tuned classifier	$0.0012499998
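
The rounding rule for classification requests can be sketched as follows. The function name `classification_cost` and the keys `"lite"`, `"pro"`, and `"tuned"` are hypothetical labels chosen for this illustration; the unit sizes and the price are taken from the text and the table above. Billing an empty request as one unit is an assumption.

```python
from decimal import Decimal

PRICE_PER_UNIT = Decimal("0.0012499998")  # USD per billing unit, without VAT

# Tokens per billing unit for each classifier type (hypothetical keys).
UNIT_TOKENS = {"lite": 1000, "pro": 250, "tuned": 250}


def classification_cost(classifier, tokens):
    """Return (billing_units, cost) for classifying a text of `tokens` tokens."""
    unit = UNIT_TOKENS[classifier]
    # Integer ceiling division; ASSUMPTION: at least one unit is always billed.
    units = max(1, -(-tokens // unit))
    return units, PRICE_PER_UNIT * units


print(classification_cost("lite", 770))  # one billing unit
print(classification_cost("pro", 770))   # four billing units
```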

Image generation

You are charged for each generation request in YandexART. Requests are not idempotent; therefore, two requests with the same settings and generation prompt are billed as two separate requests.

Service	Price, without VAT
1 request for image generation	$0.0182786856

Agent Atelier

Voice agents

The cost of using voice agents consists of the following:

Cost of speech recognition (incoming audio).
Cost of speech synthesis (outgoing audio).
Cost of text generation using the speech-realtime-250923 model.
Cost of tool invocation.

Service	Price per billing unit, without VAT
Incoming audio, per 1 second	$0.0002163934
Outgoing audio, per 1 second	$0.0001663934
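
The audio part of a voice agent session can be estimated from the two per-second rates above. This is a minimal sketch: the function name `voice_audio_cost` is hypothetical, durations are assumed to be whole seconds, and the cost of text generation and tool invocation is not included.

```python
from decimal import Decimal

# USD per second of audio, without VAT, from the table above.
INCOMING_PER_SECOND = Decimal("0.0002163934")
OUTGOING_PER_SECOND = Decimal("0.0001663934")


def voice_audio_cost(incoming_seconds, outgoing_seconds):
    """Estimate the audio cost of a voice agent session.

    ASSUMPTION: both durations are integers (whole seconds).
    Text generation and tool invocation are billed separately.
    """
    return INCOMING_PER_SECOND * incoming_seconds + OUTGOING_PER_SECOND * outgoing_seconds


# 60 seconds of incoming audio and 30 seconds of outgoing audio.
print(voice_audio_cost(incoming_seconds=60, outgoing_seconds=30))
```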

Text-based agents

The cost of using text-based agents consists of the following:

Consumption of tokens as per the pricing plans of the Model Gallery models.
Cost of tool invocation.

Invoking tools in agents

Service	Price per 1,000 requests, without VAT
Web Search tool	$7.4999988
File Search tool	$2.459016
Code Interpreter tool	Free of charge
MCP tool	Free of charge
Image Generation tool	$18.2786856
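
Because tool prices are quoted per 1,000 requests, the cost of a smaller number of invocations can be estimated by prorating. The sketch below assumes linear proration per request; the function name `tool_invocation_cost` is hypothetical, and free tools are represented with a price of zero.

```python
from decimal import Decimal

# USD per 1,000 requests, without VAT, from the table above.
TOOL_PRICES_PER_1000 = {
    "Web Search": Decimal("7.4999988"),
    "File Search": Decimal("2.459016"),
    "Code Interpreter": Decimal("0"),
    "MCP": Decimal("0"),
    "Image Generation": Decimal("18.2786856"),
}


def tool_invocation_cost(tool, requests):
    """Estimate the cost of `requests` invocations of a built-in tool.

    ASSUMPTION: the per-1,000 price is prorated linearly per request.
    """
    return TOOL_PRICES_PER_1000[tool] * requests / 1000


print(tool_invocation_cost("Web Search", 40))
print(tool_invocation_cost("File Search", 100))
```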

AI Search

The search index size is rounded up to the nearest whole gigabyte.

Service	Price per day per 1 GB, without VAT
Search index storage	$0.086885232
AI Studio file storage	Free of charge
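
The rounding of the search index size can be illustrated with a short estimate. The function name `index_storage_cost` is hypothetical, and the sketch assumes that each stored day is billed in full at the daily price; whether partial days are prorated is not stated above.

```python
import math
from decimal import Decimal

PRICE_PER_GB_DAY = Decimal("0.086885232")  # USD per 1 GB per day, without VAT


def index_storage_cost(index_size_gb, days):
    """Estimate search index storage cost in USD, without VAT.

    The index size is rounded up to the nearest whole gigabyte.
    ASSUMPTION: `days` is a whole number of fully billed days.
    """
    billed_gb = math.ceil(index_size_gb)
    return PRICE_PER_GB_DAY * billed_gb * days


# A 2.3 GB index is billed as 3 GB.
print(index_storage_cost(2.3, days=30))
```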

MCP Hub

Storing MCP servers is free of charge. However, you may still be charged for tools created in MCP servers, such as Yandex Cloud Functions invocations.

When using external APIs, such as Kontur.Focus or amoCRM, you are charged directly by our respective partner.

Internal server errors

You are not charged for a request that fails due to an internal server error.
