Installing Tesseract-OCR on Windows devices

Tesseract-OCR is an open-source optical character recognition (OCR) engine that converts text within images into machine-readable text. Coro uses Tesseract-OCR to detect sensitive information in image files during data scans on Windows endpoint devices.

Installing Tesseract-OCR

To install Tesseract-OCR on a Windows device:

  1. Download and run the Tesseract-OCR installation file.
  2. Select a language from the Installer Language dialog dropdown, and then select OK:

    Install wizard choose language

  3. Select Next >:

    Welcome

  4. Review the agreement terms, and then select I Agree to continue:

    Welcome

  5. Select a user installation option, and then select Next >:

    Users

  6. Select the components to install. Make sure English is selected in Language data:

    Language

  7. Select Next >:

    Components

  8. Enter the Tesseract-OCR installation directory, or use the default. Select Next > to continue:

    Install wizard choose location

    Important

    Record the Tesseract-OCR installation directory. You need it to configure the TESSDATA_PREFIX environment variable. If that variable is not configured correctly, Tesseract-OCR might not work properly.

    Important

    If you enter a custom Tesseract-OCR installation directory, you must add this directory to the PATH environment variable so that Tesseract-OCR is accessible from Windows Command Prompt.

  9. Select the Start menu folder in which to create the Tesseract-OCR shortcuts, or select Do not create shortcuts.
  10. Select Install:

    Shortcuts

    Tesseract-OCR starts the installation.

  11. After the installation completes, select Next >:

    Complete

  12. Select Finish:

    Finish

  13. Verify the Tesseract-OCR installation by opening Windows Command Prompt and entering:
    tesseract -v

    Windows Command Prompt displays the details of the Tesseract-OCR installation found on the device:

    Verify

    Important

    If Windows Command Prompt does not recognize the command, you must add the Tesseract-OCR installation directory to the PATH environment variable.
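The verification step above can also be scripted. The following is a minimal Python sketch, not part of the Coro or Tesseract-OCR tooling: it looks for the tesseract executable on PATH and, if it finds it, runs the same `tesseract -v` command. The function names find_tesseract and tesseract_version are hypothetical helpers introduced for illustration, and the sketch assumes Python 3 is available on the device.

```python
import shutil
import subprocess


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(executable)


def tesseract_version(executable_path):
    """Run `<executable> -v` and return the first line of its output."""
    result = subprocess.run(
        [executable_path, "-v"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some Tesseract builds print the version banner to stderr, not stdout.
    output = result.stdout or result.stderr
    lines = output.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH; add the installation directory to PATH.")
    else:
        print(f"Found {path}: {tesseract_version(path)}")
```

If the script reports that tesseract was not found, the cause is the same as an unrecognized command in Windows Command Prompt: the installation directory is not listed in the PATH environment variable.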

Creating the TESSDATA_PREFIX environment variable

Note

If you installed Tesseract-OCR in a custom directory (different from the default C:\Program Files\Tesseract-OCR), you must perform this procedure.

Tesseract-OCR uses language data files (.traineddata) in the Tesseract-OCR\tessdata folder for OCR. If these files are missing from the default location, Tesseract-OCR might fail to process text correctly. To ensure Tesseract-OCR finds them, set the TESSDATA_PREFIX environment variable after installing it on your Windows device.

To add the TESSDATA_PREFIX environment variable:

Important

These instructions apply to Windows 10 and Windows 11.

  1. Select Search and enter Environment Variables.
  2. Select Edit the system environment variables:

    System Variable

  3. Select Environment Variables... from the System Properties dialog:

    Environment Variables

  4. Under System variables, select New...:

    Environment Variables

  5. Enter the following configuration:
    • Variable name: TESSDATA_PREFIX
    • Variable value: Enter the full path to the tessdata folder inside your Tesseract-OCR installation directory. For example, C:\Program Files\Tesseract-OCR\tessdata.
  6. Select OK:

    Complete

    Windows creates the TESSDATA_PREFIX environment variable.

  7. Verify the TESSDATA_PREFIX environment variable by opening a new Windows Command Prompt window and entering:
    echo %TESSDATA_PREFIX%

    Windows Command Prompt displays the TESSDATA_PREFIX environment variable:

    Verify
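Beyond echoing the variable, it can help to confirm that the folder it points to actually contains language data. The following is a minimal Python sketch under stated assumptions: check_tessdata is a hypothetical helper, not part of Tesseract-OCR, and it assumes the English data file is named eng.traineddata, which is the name Tesseract-OCR uses for English language data.

```python
import os
from pathlib import Path


def check_tessdata(prefix, language="eng"):
    """Return (ok, message) for a candidate TESSDATA_PREFIX value."""
    if not prefix:
        return False, "TESSDATA_PREFIX is not set"
    folder = Path(prefix)
    if not folder.is_dir():
        return False, f"{folder} is not a directory"
    # Language data files are named <language>.traineddata.
    data_file = folder / f"{language}.traineddata"
    if not data_file.is_file():
        return False, f"{data_file.name} not found in {folder}"
    return True, f"found {data_file}"


if __name__ == "__main__":
    ok, message = check_tessdata(os.environ.get("TESSDATA_PREFIX"))
    print("OK:" if ok else "Problem:", message)
```

Run the script from a Command Prompt window opened after you created the variable, because processes that were already running do not see the new value.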

Adding the Tesseract-OCR installation directory to the PATH environment variable

Note

If you installed Tesseract-OCR in a custom directory (different from the default C:\Program Files\Tesseract-OCR), you must perform this procedure.

When you add your Tesseract-OCR installation directory to the PATH environment variable, the operating system (OS) can locate and run Tesseract-OCR from Windows Command Prompt without needing the full path to the Tesseract-OCR executable file.

To add the Tesseract-OCR installation directory to the PATH environment variable:

Important

These instructions apply to Windows 10 and Windows 11.

  1. Select Search and enter Environment Variables.
  2. Select Edit the system environment variables:

    System Variable

  3. Select Environment Variables... from the System Properties dialog:

    Environment Variables

  4. Locate the Path variable in the System variables list, and then select Edit...:

    Edit Variables

  5. Select New, paste your Tesseract-OCR installation directory, and then select OK:

    Add path

    Windows adds the Tesseract-OCR installation directory to the PATH environment variable.

  6. Verify the PATH environment variable by opening a new Windows Command Prompt window and entering:
    tesseract -v

    Windows Command Prompt displays the details of the Tesseract-OCR installation found on the device:

    Verify
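If tesseract -v still fails, one thing to check is whether the directory really appears as an entry in PATH. The following is a minimal Python sketch of that check, assuming Python 3 is available. The helper is_on_path is hypothetical and introduced only for illustration. It compares entries case-insensitively and ignores trailing slashes, which matches how Windows treats paths.

```python
import os


def is_on_path(directory, path_value=None, sep=os.pathsep):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def normalize(entry):
        # Drop surrounding quotes and trailing slashes, then lowercase,
        # because Windows paths are not case-sensitive.
        return entry.strip().strip('"').rstrip("\\/").lower()

    target = normalize(directory)
    return any(
        normalize(entry) == target
        for entry in path_value.split(sep)
        if entry.strip()
    )


if __name__ == "__main__":
    # The default installation directory named in this document.
    install_dir = r"C:\Program Files\Tesseract-OCR"
    print(f"{install_dir} on PATH: {is_on_path(install_dir)}")
```

Replace install_dir with your own installation directory if you chose a custom location during installation.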