Getting started with Vision OCR

This section describes how to recognize text in an image or file using the Vision OCR API.

Getting started

To use the examples, install cURL.

  1. On the Yandex Cloud Billing page, make sure your billing account is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one.
  2. Get an IAM token, which is required for authentication.
  3. Get the ID of any folder for which your account has the ai.vision.user role or higher.
  4. Specify the ID in the x-folder-id header.
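
The IAM token and folder ID from the steps above are sent as HTTP headers with every request. Below is a minimal Python sketch of assembling them, assuming the values are stored in environment variables named IAM_TOKEN and FOLDER_ID. The variable names and the build_headers helper are illustrative and not part of the API.

```python
import os


def build_headers(env=os.environ):
    """Assemble request headers from the IAM token and folder ID.

    IAM_TOKEN and FOLDER_ID are assumed environment variable names.
    """
    token = env.get("IAM_TOKEN")
    folder_id = env.get("FOLDER_ID")
    if not token or not folder_id:
        raise ValueError("Set both IAM_TOKEN and FOLDER_ID before sending requests")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
        "x-folder-id": folder_id,
    }
```

Failing early on a missing value makes a misconfigured environment easier to spot than an authentication error returned by the service.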

Recognize text

You can use any of the available recognition models. As an example, we will use the page model, which can recognize any amount of text in an image:

  1. Prepare an image file that meets the requirements:

    • Supported file formats: JPEG, PNG, PDF. Specify the file’s MIME type in the mimeType property of the request body. The default value is image.
    • The maximum file size is 10 MB.
    • The image size should not exceed 20 megapixels (width × height).

    Tip

    If you need an example, download an image of a penguin warning road sign.
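
Some of the requirements above can be checked locally before the file is sent. The following is a rough Python sketch, assuming that 10 MB means 10 × 1024 × 1024 bytes and that the format can be judged by the file extension. The check_file helper is hypothetical, and the 20-megapixel limit is not checked because that would require decoding the image.

```python
import os

# Assumed interpretation of the 10 MB limit from the requirements above.
MAX_FILE_SIZE = 10 * 1024 * 1024
# Extensions assumed to correspond to the supported JPEG, PNG, and PDF formats.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}


def check_file(path):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file extension: {extension or '(none)'}")
    if os.path.getsize(path) > MAX_FILE_SIZE:
        problems.append("file is larger than 10 MB")
    return problems
```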

  2. Encode the image file as Base64:

    UNIX:

    base64 -i input.jpg > output.txt

    Windows:

    C:> Base64.exe -e input.jpg > output.txt

    PowerShell:

    [Convert]::ToBase64String([IO.File]::ReadAllBytes("./input.jpg")) > output.txt

    Python:

    # Import a library for Base64 encoding.
    import base64

    # Create a function that encodes a file and returns the encoded result.
    def encode_file(file_path):
      with open(file_path, "rb") as fid:
        file_content = fid.read()
      return base64.b64encode(file_content).decode("utf-8")

    Node.js:

    // Read the file contents into memory.
    var fs = require('fs');
    var file = fs.readFileSync('/path/to/file');

    // Get the file contents encoded in Base64.
    var encoded = Buffer.from(file).toString('base64');

    Java:

    // Import a library for Base64 encoding.
    import org.apache.commons.codec.binary.Base64;

    // Get the file contents encoded in Base64.
    byte[] fileData = Base64.encodeBase64(yourFile.getBytes());
            
    Go:

    package main

    import (
      "encoding/base64"
      "fmt"
      "log"
      "os"
    )

    func main() {
      // Read the file contents.
      content, err := os.ReadFile("/path/to/file")
      if err != nil {
        log.Fatal(err)
      }

      // Get the file contents encoded in Base64.
      encoded := base64.StdEncoding.EncodeToString(content)
      fmt.Println(encoded)
    }
            
  3. Create a file with the request body, e.g., body.json.

    body.json:

    {
      "mimeType": "JPEG",
      "languageCodes": ["*"],
      "model": "page",
      "content": "<base64_encoded_image>"
    }
            

    In the content property, specify the image file contents encoded as Base64.

    To automatically detect the text language, specify the "languageCodes": ["*"] property in the configuration.
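
The encoding and request body steps can also be combined in a script. Below is a minimal Python sketch that builds the same body as the body.json example for a local JPEG file. The build_request_body and write_request_body helpers and their default arguments are illustrative assumptions, not part of the API.

```python
import base64
import json


def build_request_body(image_path, mime_type="JPEG", model="page"):
    """Build the request body dictionary for a single image file."""
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return {
        "mimeType": mime_type,
        "languageCodes": ["*"],  # let the service detect the language
        "model": model,
        "content": encoded,
    }


def write_request_body(image_path, body_path="body.json"):
    """Save the request body as a JSON file for use in the request."""
    with open(body_path, "w", encoding="utf-8") as body_file:
        json.dump(build_request_body(image_path), body_file)
```

For example, write_request_body("input.jpg") would produce a body.json file in the current directory.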

  4. Make a request via the recognize method and save the response to a file, e.g., output.json:

    cURL:

    export IAM_TOKEN=<IAM_token>
    curl \
      --request POST \
      --header "Content-Type: application/json" \
      --header "Authorization: Bearer ${IAM_TOKEN}" \
      --header "x-folder-id: <folder_ID>" \
      --header "x-data-logging-enabled: true" \
      --data "@body.json" \
      https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText \
      --output output.json
            

    Where:

    • <IAM_token>: IAM token you got earlier.
    • <folder_ID>: Folder ID you got earlier.

    Python:

    import json

    import requests

    # Base64-encoded file contents, e.g., from the encode_file function in step 2.
    content = encode_file("input.jpg")

    data = {"mimeType": "<mime_type>",
            "languageCodes": ["ru", "en"],
            "content": content}

    url = "https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText"

    headers = {"Content-Type": "application/json",
               "Authorization": "Bearer {:s}".format("<IAM_token>"),
               "x-folder-id": "<folder_ID>",
               "x-data-logging-enabled": "true"}

    w = requests.post(url=url, headers=headers, data=json.dumps(data))
            

    The result will consist of recognized blocks of text, lines, and words with their position on the image:

    {
                "result":
                {
                    "textAnnotation":
                    {
                        "width": "1920",
                        "height": "1280",
                        "blocks":
                        [
                            {
                                "boundingBox":
                                {
                                    "vertices":
                                    [
                                        {
                                            "x": "460",
                                            "y": "777"
                                        },
                                        {
                                            "x": "460",
                                            "y": "906"
                                        },
                                        {
                                            "x": "810",
                                            "y": "906"
                                        },
                                        {
                                            "x": "810",
                                            "y": "777"
                                        }
                                    ]
                                },
                                "lines":
                                [
                                    {
                                        "boundingBox":
                                        {
                                            "vertices":
                                            [
                                                {
                                                    "x": "460",
                                                    "y": "777"
                                                },
                                                {
                                                    "x": "460",
                                                    "y": "820"
                                                },
                                                {
                                                    "x": "802",
                                                    "y": "820"
                                                },
                                                {
                                                    "x": "802",
                                                    "y": "777"
                                                }
                                            ]
                                        },
                                        "text": "PENGUINS",
                                        "words":
                                        [
                                            {
                                                "boundingBox":
                                                {
                                                    "vertices":
                                                    [
                                                        {
                                                            "x": "460",
                                                            "y": "768"
                                                        },
                                                        {
                                                            "x": "460",
                                                            "y": "830"
                                                        },
                                                        {
                                                            "x": "802",
                                                            "y": "830"
                                                        },
                                                        {
                                                            "x": "802",
                                                            "y": "768"
                                                        }
                                                    ]
                                                },
                                                "text": "PENGUINS",
                                                "entityIndex": "-1",
                                                "textSegments":
                                                [
                                                    {
                                                        "startIndex": "0",
                                                        "length": "8"
                                                    }
                                                ]
                                            }
                                        ],
                                        "textSegments":
                                        [
                                            {
                                                "startIndex": "0",
                                                "length": "8"
                                            }
                                        ],
                                        "orientation": "ANGLE_0"
                                    },
                                    {
                                        "boundingBox":
                                        {
                                            "vertices":
                                            [
                                                {
                                                    "x": "489",
                                                    "y": "861"
                                                },
                                                {
                                                    "x": "489",
                                                    "y": "906"
                                                },
                                                {
                                                    "x": "810",
                                                    "y": "906"
                                                },
                                                {
                                                    "x": "810",
                                                    "y": "861"
                                                }
                                            ]
                                        },
                                        "text": "CROSSING",
                                        "words":
                                        [
                                            {
                                                "boundingBox":
                                                {
                                                    "vertices":
                                                    [
                                                        {
                                                            "x": "489",
                                                            "y": "852"
                                                        },
                                                        {
                                                            "x": "489",
                                                            "y": "916"
                                                        },
                                                        {
                                                            "x": "810",
                                                            "y": "916"
                                                        },
                                                        {
                                                            "x": "810",
                                                            "y": "852"
                                                        }
                                                    ]
                                                },
                                                "text": "CROSSING",
                                                "entityIndex": "-1",
                                                "textSegments":
                                                [
                                                    {
                                                        "startIndex": "9",
                                                        "length": "8"
                                                    }
                                                ]
                                            }
                                        ],
                                        "textSegments":
                                        [
                                            {
                                                "startIndex": "9",
                                                "length": "8"
                                            }
                                        ],
                                        "orientation": "ANGLE_0"
                                    }
                                ],
                                "languages":
                                [
                                    {
                                        "languageCode": "en"
                                    }
                                ],
                                "textSegments":
                                [
                                    {
                                        "startIndex": "0",
                                        "length": "17"
                                    }
                                ],
                                "layoutType": "LAYOUT_TYPE_TEXT"
                            },
                            {
                                "boundingBox":
                                {
                                    "vertices":
                                    [
                                        {
                                            "x": "547",
                                            "y": "989"
                                        },
                                        {
                                            "x": "547",
                                            "y": "1046"
                                        },
                                        {
                                            "x": "748",
                                            "y": "1046"
                                        },
                                        {
                                            "x": "748",
                                            "y": "989"
                                        }
                                    ]
                                },
                                "lines":
                                [
                                    {
                                        "boundingBox":
                                        {
                                            "vertices":
                                            [
                                                {
                                                    "x": "547",
                                                    "y": "989"
                                                },
                                                {
                                                    "x": "547",
                                                    "y": "1046"
                                                },
                                                {
                                                    "x": "748",
                                                    "y": "1046"
                                                },
                                                {
                                                    "x": "748",
                                                    "y": "989"
                                                }
                                            ]
                                        },
                                        "text": "SLOW",
                                        "words":
                                        [
                                            {
                                                "boundingBox":
                                                {
                                                    "vertices":
                                                    [
                                                        {
                                                            "x": "547",
                                                            "y": "983"
                                                        },
                                                        {
                                                            "x": "547",
                                                            "y": "1054"
                                                        },
                                                        {
                                                            "x": "748",
                                                            "y": "1054"
                                                        },
                                                        {
                                                            "x": "748",
                                                            "y": "983"
                                                        }
                                                    ]
                                                },
                                                "text": "SLOW",
                                                "entityIndex": "-1",
                                                "textSegments":
                                                [
                                                    {
                                                        "startIndex": "18",
                                                        "length": "4"
                                                    }
                                                ]
                                            }
                                        ],
                                        "textSegments":
                                        [
                                            {
                                                "startIndex": "18",
                                                "length": "4"
                                            }
                                        ],
                                        "orientation": "ANGLE_0"
                                    }
                                ],
                                "languages":
                                [
                                    {
                                        "languageCode": "en"
                                    }
                                ],
                                "textSegments":
                                [
                                    {
                                        "startIndex": "18",
                                        "length": "4"
                                    }
                                ],
                                "layoutType": "LAYOUT_TYPE_TEXT"
                            }
                        ],
                        "entities":
                        [],
                        "tables":
                        [],
                        "fullText": "PENGUINS\nCROSSING\nSLOW\n",
                        "rotate": "ANGLE_0",
                        "markdown": " ",
                        "pictures":
                        []
                    },
                    "page": "0"
                }
            }
            
  5. To get all the words recognized in the image, find all values with the text property.
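
As an illustration of the last step, the sketch below walks the blocks, lines, and words of a saved response and collects the text of each word. It assumes the response has the same structure as the sample above. The collect_words and load_words helpers are hypothetical names.

```python
import json


def collect_words(response):
    """Return the text of every word in a parsed recognition response."""
    annotation = response["result"]["textAnnotation"]
    words = []
    for block in annotation.get("blocks", []):
        for line in block.get("lines", []):
            for word in line.get("words", []):
                words.append(word["text"])
    return words


def load_words(path="output.json"):
    """Read a saved response file and return the recognized words."""
    with open(path, encoding="utf-8") as response_file:
        return collect_words(json.load(response_file))
```

For the sample response above, collect_words returns ["PENGUINS", "CROSSING", "SLOW"]. Lines also carry a text property with the full line text, which this sketch deliberately skips to avoid listing the same words twice.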

Note

If the coordinates you received do not match the displayed positions of elements, either enable EXIF metadata support in your image viewer or strip the Orientation attribute from the image's EXIF section before sending the image to the service.