Installing Tesseract-OCR on Windows devices

Tesseract-OCR is an open-source optical character recognition (OCR) engine that converts text within images into machine-readable text. Coro uses Tesseract-OCR to detect sensitive information in image files during data scans on Windows endpoint devices.

Installing Tesseract-OCR

To install Tesseract-OCR on a Windows device:

  1. Download and execute the Tesseract-OCR installation file.

  2. Select a language from the Installer Language dialog dropdown, and then select OK.

  3. Select Next >.

  4. Review the agreement terms, and then select I Agree to continue.

  5. Select a user installation option, and then select Next >.

  6. Select the components to install. Make sure English is selected in Language data.

  7. Select Next >.

  8. Enter the Tesseract-OCR installation directory, or use the default. Select Next > to continue.

    Important

    Record the Tesseract-OCR installation directory. It is required to configure the TESSDATA_PREFIX environment variable. Without this, Tesseract-OCR might not work properly.

    Important

    If you enter a custom Tesseract-OCR installation directory, you must add this directory to the PATH environment variable so that Tesseract-OCR is accessible from Windows Command Prompt.

  9. Select the start menu folder in which to create the Tesseract-OCR shortcuts, or select Do not create shortcuts.

  10. Select Install.

    The Tesseract-OCR installation starts.

  11. After the installation completes, select Next >.

  12. Select Finish.

  13. Verify the Tesseract-OCR installation by opening Windows Command Prompt and entering:

    tesseract -v

    Windows Command Prompt displays the details of the Tesseract-OCR installation found on the device.

    Important

    If Windows Command Prompt does not recognize the command, you must add the Tesseract-OCR installation directory to the PATH environment variable.
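
As an optional, scripted alternative to the manual check in step 13, the following Python sketch looks up tesseract on PATH and runs tesseract -v. It is an illustration only and is not part of Coro or the Tesseract-OCR installer. The function name tesseract_version is hypothetical, and the sketch assumes Python 3.7 or later is available on the device.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line printed by `<executable> -v`, or None if the
    executable cannot be found on PATH."""
    path = shutil.which(executable)
    if path is None:
        # Same situation as Command Prompt not recognizing the command.
        return None
    result = subprocess.run([path, "-v"], capture_output=True, text=True, check=False)
    # Depending on the build, the version text may arrive on stdout or stderr.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract was not found on PATH")
    else:
        print("Found:", version)
```

A result of None corresponds to the case described in the note above, where the installation directory still needs to be added to the PATH environment variable.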

Creating the TESSDATA_PREFIX environment variable

If you installed Tesseract-OCR in a custom directory (different from the default C:\Program Files\Tesseract-OCR), you must perform this procedure.

Tesseract-OCR uses language data files (.traineddata) in the Tesseract-OCR\tessdata folder for OCR. If Tesseract-OCR cannot find these files in the default location, it might fail to process text correctly. To ensure Tesseract-OCR finds them, set the TESSDATA_PREFIX environment variable after installing Tesseract-OCR on your Windows device.

To add the TESSDATA_PREFIX environment variable:

Important

These instructions apply to Windows 10 and Windows 11.

  1. Select Search and enter Environment Variables.

  2. Select Edit the system environment variables.

  3. Select Environment Variables... from the System Properties dialog.

  4. Under System variables, select New... to create a variable.

  5. Enter the following configuration:

    • Variable name: TESSDATA_PREFIX
    • Variable value: Enter the full path to the tessdata folder inside your Tesseract-OCR installation directory. For example, C:\Program Files\Tesseract-OCR\tessdata.

  6. Select OK.

    Windows creates the TESSDATA_PREFIX environment variable.

  7. Verify the TESSDATA_PREFIX environment variable by opening a new Windows Command Prompt window and entering:

    echo %TESSDATA_PREFIX%

    Windows Command Prompt displays the value of the TESSDATA_PREFIX environment variable.
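
The check in the final step above only confirms that the variable is set. As a hedged sketch, the following Python function also confirms that the value points to an existing folder containing the English language data file. The function name check_tessdata_prefix is hypothetical. The sketch assumes the English file uses the standard name eng.traineddata and that the script runs in a process started after the variable was created.

```python
import os
from pathlib import Path


def check_tessdata_prefix(env=None, language="eng"):
    """Return (ok, message) describing whether TESSDATA_PREFIX looks usable.

    `env` defaults to the current process environment; passing a dict
    makes the function easy to try out without changing system settings.
    """
    env = os.environ if env is None else env
    value = env.get("TESSDATA_PREFIX")
    if not value:
        return False, "TESSDATA_PREFIX is not set"
    folder = Path(value)
    if not folder.is_dir():
        return False, f"{folder} is not a directory"
    data_file = folder / f"{language}.traineddata"
    if not data_file.is_file():
        return False, f"{data_file.name} not found in {folder}"
    return True, f"found {data_file.name} in {folder}"


if __name__ == "__main__":
    ok, message = check_tessdata_prefix()
    print("OK:" if ok else "Problem:", message)
```

If the function reports a problem, recheck the variable value entered in the procedure above against the actual location of the tessdata folder.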

Adding the Tesseract-OCR installation directory to the PATH environment variable

If you installed Tesseract-OCR in a custom directory (different from the default C:\Program Files\Tesseract-OCR), you must perform this procedure.

When you add your Tesseract-OCR installation directory to the PATH environment variable, Windows can locate and run Tesseract-OCR from Windows Command Prompt without the full path to the Tesseract-OCR executable file.

To add the Tesseract-OCR installation directory to the PATH environment variable:

Important

These instructions apply to Windows 10 and Windows 11.

  1. Select Search and enter Environment Variables.

  2. Select Edit the system environment variables.

  3. Select Environment Variables... from the System Properties dialog.

  4. In the System variables list, locate the Path variable, and then select Edit... for that variable.

  5. Select New, paste your Tesseract-OCR installation directory, and then select OK.

    Windows adds the Tesseract-OCR installation directory to the PATH environment variable.

  6. Verify the PATH environment variable by opening a new Windows Command Prompt window and entering:

    tesseract -v

    Windows Command Prompt displays the details of the Tesseract-OCR installation found on the device.
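
If tesseract -v still fails after this procedure, it can help to confirm that the directory is actually listed in PATH. The following Python sketch does this by comparing normalized PATH entries against a given directory. The function name is_on_path and the directory D:\Tools\Tesseract-OCR in the usage example are hypothetical; substitute the installation directory you recorded during installation.

```python
import os


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH value.

    `path_value` defaults to the PATH of the current process. Entries are
    compared after stripping quotes and normalizing case and separators.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def normalize(entry):
        return os.path.normcase(os.path.normpath(entry.strip().strip('"')))

    target = normalize(directory)
    return any(
        normalize(entry) == target
        for entry in path_value.split(os.pathsep)
        if entry.strip()
    )


if __name__ == "__main__":
    # Hypothetical custom installation directory, for illustration only.
    install_dir = r"D:\Tools\Tesseract-OCR"
    print(install_dir, "is on PATH:", is_on_path(install_dir))
```

Run the script from a Command Prompt window opened after the change, because processes that were already running keep the PATH value they started with.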