Sending an asynchronous request

You can send requests to proprietary Yandex models in asynchronous mode. In response to an asynchronous request, the model returns an operation object containing an operation ID, which you can use to track the operation's progress and get the result once generation is complete. Use this mode if you do not need an urgent response, since asynchronous requests take longer to complete than synchronous ones.

Getting started

To run the SDK request examples:

  1. Create a service account and assign it the ai.languageModels.user role.
  2. Create an API key for the service account with the yc.ai.foundationModels.execute scope and save it.

    The following examples use API key authentication. Yandex Cloud ML SDK also supports IAM token and OAuth token authentication. For more information, see Authentication in Yandex Cloud ML SDK.

    Note

    If you are using Windows, we recommend installing the WSL shell first and using it to proceed.

  3. Install Python 3.10 or higher.

  4. Optionally, install Python venv to create isolated virtual environments in Python.

  5. Optionally, create a new Python virtual environment and activate it:

    python3 -m venv new-env
    source new-env/bin/activate
            
  6. Use the pip package manager to install the ML SDK library:

    pip install yandex-ai-studio-sdk
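
If you prefer not to hardcode credentials in the example scripts, you can read them from environment variables. The helper below is a minimal sketch: the function name load_credentials and the variable names AI_STUDIO_FOLDER_ID and AI_STUDIO_API_KEY are hypothetical and chosen for illustration only; they are not defined by the SDK.

```python
import os


def load_credentials(env=os.environ):
    """Return the folder ID and API key read from environment variables.

    The variable names are hypothetical and used only in this sketch.
    """
    names = {"folder_id": "AI_STUDIO_FOLDER_ID", "auth": "AI_STUDIO_API_KEY"}
    values = {key: env.get(name) for key, name in names.items()}

    # Fail early with a clear message if anything is missing or empty.
    missing = [names[key] for key, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values
```

The returned dictionary uses the same keyword names as the AIStudio constructor in the examples, so it could be passed as AIStudio(**load_credentials()).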
            

Get API authentication credentials as described in Authentication with the Yandex Cloud AI Studio API.

To use the examples, install cURL.

Send a request to the model

When using Yandex Cloud ML SDK, you can configure your code to wait for the operation to complete and return the response. To do this, either poll the operation status in a loop, pausing between checks with the sleep function of the time module, or call the wait method. The example below demonstrates both approaches in turn.

  1. Create a file named generate-deferred.py and paste the following code into it:

    #!/usr/bin/env python3

    from __future__ import annotations
    import time
    from yandex_ai_studio_sdk import AIStudio

    messages_1 = [
        {
            "role": "system",
            "text": "Find errors in the text and correct them",
        },
        {
            "role": "user",
            "text": """Laminate flooring is sutiable for instalation in the kitchen or in a child's
    room. It withsatnds moisturre and mechanical dammage thanks to
    a 0.2 mm thick proctive layer of melamine films and
    a wax-treated interlocking system.""",
        },
    ]

    messages_2 = [
        {"role": "system", "text": "Find errors in the text and correct them"},
        {"role": "user", "text": "Erors wyll not corrct themselfs."},
    ]


    def main():

        sdk = AIStudio(
            folder_id="<folder_ID>",
            auth="<API_key>",
        )

        model = sdk.models.completions("yandexgpt")

        # Variant 1: wait for the operation to complete using 5-second sleep periods

        print("Variant 1:")

        operation = model.configure(temperature=0.5).run_deferred(messages_1)

        status = operation.get_status()
        while status.is_running:
            time.sleep(5)
            status = operation.get_status()

        result = operation.get_result()
        print(result)

        # Variant 2: wait for the operation to complete using the wait method

        print("Variant 2:")

        operation = model.run_deferred(messages_2)

        result = operation.wait()
        print(result)


    if __name__ == "__main__":
        main()
            

    Where:

    • messages_1 and messages_2: Arrays of messages providing the context for the model. Each array is used with a different method of getting the asynchronous request result:

      • role: Message sender's role:

        • user: To send user messages to the model.
        • system: To set the request context and define the model's behavior.
        • assistant: For responses generated by the model. In chat mode, the model's responses tagged with the assistant role are included in the message list to preserve the conversation context. Do not send user messages with this role.
      • text: Message text.

    Note

    As input data for a request, Yandex Cloud ML SDK can accept a string, a dictionary, an object of the TextMessage class, or an array containing any combination of these data types. For more information, see Yandex Cloud ML SDK usage.

    For more information about accessing a specific model version, see {#T}.

  2. Run the file you created:

    python3 generate-deferred.py
            

    Result:

    Variant 1:
    GPTModelResult(alternatives=(Alternative(role='assistant', text='Laminate flooring is suitable for installation in the kitchen or in a child's room. It withstands moisture and mechanical damage thanks to a 0.2 mm thick protective layer of melamine films and a wax-treated interlocking system.', status=<AlternativeStatus.FINAL: 3>),), usage=Usage(input_text_tokens=74, completion_tokens=46, total_tokens=120), model_version='23.10.2024')
    Variant 2:
    GPTModelResult(alternatives=(Alternative(role='assistant', text='Errors will not correct themselves.\n\nErors → errors.', status=<AlternativeStatus.FINAL: 3>),), usage=Usage(input_text_tokens=32, completion_tokens=16, total_tokens=48), model_version='23.10.2024')
            

    The code first waits for the result of Variant 1 and then for the result of Variant 2.
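
A fixed 5-second sleep loop runs indefinitely if the operation never leaves the running state. The sketch below shows one way to bound the wait with a timeout and to increase the pause between status checks. It relies only on the get_status, is_running, and get_result names used in the example above; the function name wait_for_result and all of its parameters are hypothetical and are not part of the SDK.

```python
import time


def wait_for_result(operation, timeout=300.0, first_delay=1.0, max_delay=30.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll an operation until it stops running, then return its result.

    `operation` is any object exposing get_status() (returning an object
    with an is_running attribute) and get_result(), as in the example above.
    `sleep` and `clock` are injectable so the helper can be tested offline.
    """
    deadline = clock() + timeout
    delay = first_delay
    while operation.get_status().is_running:
        if clock() >= deadline:
            raise TimeoutError(f"operation still running after {timeout} s")
        sleep(delay)
        # Double the pause after each check, up to max_delay.
        delay = min(delay * 2, max_delay)
    return operation.get_result()
```

With this helper, Variant 1 could be reduced to result = wait_for_result(operation). Doubling the delay keeps short operations responsive while reducing the number of status requests for long ones.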


The example below is for macOS and Linux. To run it on Windows, see the details on working with Bash in Microsoft Windows.

  1. Create a file with the request body, e.g., body.json:

    {
      "modelUri": "gpt://<folder_ID>/yandexgpt",
      "completionOptions": {
        "stream": false,
        "temperature": 0.1,
        "maxTokens": "2000",
        "reasoningOptions": {
          "mode": "DISABLED"
        }
      },
      "messages": [
        {
          "role": "system",
          "text": "Translate the text"
        },
        {
          "role": "user",
          "text": "To be, or not to be: that is the question."
        }
      ]
    }
            
    • modelUri: ID of the model that will be used to generate the response. The parameter contains the Yandex Cloud folder ID or the tuned model's ID.

    • completionOptions: Request configuration options:

      • stream: Enables streaming of partially generated text. It can either be true or false.

      • temperature: With a higher temperature, you get more creative and randomized responses from the model. Its values range from 0 to 1, inclusive. The default value is 0.3.

      • maxTokens: Sets a limit on the model's output in tokens. The maximum number of tokens per generation depends on the model. For more information, see Yandex Cloud AI Studio quotas and limits.

      • reasoningOptions.mode: Reasoning mode parameters. This is an optional setting. The default value is DISABLED. The possible values are:

        • DISABLED: Reasoning mode is disabled.
        • ENABLED_HIDDEN: Reasoning mode is enabled. The model will decide by itself whether or not to use this mode for each particular request.
    • messages: List of messages that set the context for the model:

      • role: Message sender's role:

        • user: To send user messages to the model.
        • system: To set the request context and define the model's behavior.
        • assistant: For responses generated by the model. In chat mode, the model's responses tagged with the assistant role are included in the message list to preserve the conversation context. Do not send user messages with this role.
      • text: Message text.

  2. Send a request to the model by running this command:

    export FOLDER_ID=<folder_ID>
    export IAM_TOKEN=<IAM_token>
    curl \
      --request POST \
      --header "Content-Type: application/json" \
      --header "Authorization: Bearer ${IAM_TOKEN}" \
      --header "x-folder-id: ${FOLDER_ID}" \
      --data "@<path_to_JSON_file>" \
      "https://ai.api.cloud.yandex.net/foundationModels/v1/completionAsync"
            

    Where:

    • FOLDER_ID: ID of the folder for which your account has the ai.languageModels.user role or higher.
    • IAM_TOKEN: IAM token you obtained in the Getting started section.

    In the response, the service will return the operation object:

    {
      "id": "d7qi6shlbvo5********",
      "description": "Async GPT Completion",
      "createdAt": "2023-11-30T18:31:32Z",
      "createdBy": "aje2stn6id9k********",
      "modifiedAt": "2023-11-30T18:31:33Z",
      "done": false,
      "metadata": null
    }
            

    Save the operation ID (the id field) returned in the response.

  3. Send a request to get the operation result:

    curl \
      --request GET \
      --header "Authorization: Bearer ${IAM_TOKEN}" \
      https://operation.api.cloud.yandex.net/operations/<operation_ID>
            

    Result example:

    {
      "done": true,
      "response": {
        "@type": "type.googleapis.com/yandex.cloud.ai.foundation_models.v1.CompletionResponse",
        "alternatives": [
          {
            "message": {
              "role": "assistant",
              "text": "To be, or not to be, that is the question."
            },
            "status": "ALTERNATIVE_STATUS_FINAL"
          }
        ],
        "usage": {
          "inputTextTokens": "31",
          "completionTokens": "10",
          "totalTokens": "41"
        },
        "modelVersion": "18.01.2024"
      },
      "id": "d7qo21o5fj1u********",
      "description": "Async GPT Completion",
      "createdAt": "2024-05-12T18:46:54Z",
      "createdBy": "ajes08feato8********",
      "modifiedAt": "2024-05-12T18:46:55Z"
    }
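
The same three steps can also be scripted without cURL. The sketch below uses only the Python standard library and mirrors the request body, headers, and endpoints shown above. The helper names build_body, call_api, and extract_text are hypothetical, and the sketch assumes the response layout from the result example; it does not handle failed operations or HTTP errors.

```python
import json
import urllib.request

COMPLETION_URL = "https://ai.api.cloud.yandex.net/foundationModels/v1/completionAsync"
OPERATION_URL = "https://operation.api.cloud.yandex.net/operations/"


def build_body(folder_id, system_text, user_text):
    """Build the same request body as body.json in step 1."""
    return {
        "modelUri": f"gpt://{folder_id}/yandexgpt",
        "completionOptions": {
            "stream": False,
            "temperature": 0.1,
            "maxTokens": "2000",
            "reasoningOptions": {"mode": "DISABLED"},
        },
        "messages": [
            {"role": "system", "text": system_text},
            {"role": "user", "text": user_text},
        ],
    }


def call_api(url, iam_token, folder_id=None, body=None):
    """Send a POST request if a body is given, otherwise a GET request."""
    headers = {"Authorization": f"Bearer {iam_token}"}
    data = None
    if folder_id:
        headers["x-folder-id"] = folder_id
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    method = "POST" if body is not None else "GET"
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_text(operation):
    """Return the first alternative's text, or None while the operation runs."""
    if not operation.get("done"):
        return None
    return operation["response"]["alternatives"][0]["message"]["text"]


# Usage (requires network access and valid credentials):
# body = build_body("<folder_ID>", "Translate the text", "To be, or not to be")
# operation = call_api(COMPLETION_URL, "<IAM_token>", "<folder_ID>", body)
# result = call_api(OPERATION_URL + operation["id"], "<IAM_token>")
# print(extract_text(result))
```

If extract_text returns None, the operation is still in progress, so repeat the GET request after a pause.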
            

See also