Installing Tesseract-OCR on Windows devices
Tesseract-OCR is an open-source optical character recognition (OCR) engine that converts text within images into machine-readable text. Coro uses Tesseract-OCR to detect sensitive information in image files during data scans on Windows endpoint devices.
Installing Tesseract-OCR
To install Tesseract-OCR on a Windows device:
- Download and execute the Tesseract-OCR installation file.
- Select a language from the Installer Language dialog dropdown, and then select OK.
- Select Next >.
- Review the agreement terms, and then select I Agree to continue.
- Select a user installation option, and then select Next >.
- Select the components to install. Make sure English is selected in Language data.
- Select Next >.
- Enter the Tesseract-OCR installation directory, or use the default. Select Next > to continue.
Important
Record the Tesseract-OCR installation directory. It is required to configure the TESSDATA_PREFIX environment variable. Without this, Tesseract-OCR might not work properly.
Important
If you enter a custom Tesseract-OCR installation directory, you must add this directory to the PATH environment variable so that Tesseract-OCR is accessible from Windows Command Prompt.
- Select the start menu folder in which to create the Tesseract-OCR shortcuts, or select Do not create shortcuts.
- Select Install.
Tesseract-OCR starts the installation.
- After the installation completes, select Next >.
- Select Finish.
- Verify the Tesseract-OCR installation by opening Windows Command Prompt and entering:
tesseract -v
Windows Command Prompt displays the details of the Tesseract-OCR installation found on the device.
Important
If Windows Command Prompt does not recognize the command, you must add the Tesseract-OCR installation directory to the PATH environment variable.
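The verification step can also be scripted, for example when checking several devices. The following is a minimal Python sketch, not part of the Coro or Tesseract-OCR tooling: the function name tesseract_version is hypothetical, and it assumes that the executable is named tesseract and that a Python interpreter is available on the device. It looks for the executable on PATH and, if found, runs the same tesseract -v command.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """Return the first line printed by `<executable> -v`.

    Returns None when the executable cannot be found on PATH, which is the
    same situation in which Command Prompt does not recognize the command.
    """
    path = shutil.which(executable)  # searches the directories listed in PATH
    if path is None:
        return None

    result = subprocess.run(
        [path, "-v"], capture_output=True, text=True, check=False
    )
    # Depending on the build, version details may go to stdout or stderr.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract was not found on PATH")
    else:
        print(version)
```

A result of None suggests that the installation directory still needs to be added to the PATH environment variable.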
Creating the TESSDATA_PREFIX environment variable
Note
If you installed Tesseract-OCR in a custom directory (different from the default C:\Program Files\Tesseract-OCR), you must perform this procedure.
Tesseract-OCR uses language data files (.traineddata) in the Tesseract-OCR\tessdata folder for OCR. If these files are missing from the default location, Tesseract-OCR might fail to process text correctly. To ensure Tesseract-OCR finds them, set the TESSDATA_PREFIX environment variable after installing Tesseract-OCR on your Windows device.
To add the TESSDATA_PREFIX environment variable:
Important
These instructions apply to Windows 10 and Windows 11.
- Select Search and enter Environment Variables.
- Select Edit the system environment variables.
- Select Environment Variables... from the System Properties dialog.
- Under System variables, select New...
- Enter the following configuration:
  - Variable name: TESSDATA_PREFIX
  - Variable value: Enter the full path to the tessdata folder inside your Tesseract-OCR installation directory. For example, C:\Program Files\Tesseract-OCR\tessdata.
- Select OK.
Windows creates the TESSDATA_PREFIX environment variable.
- Verify the TESSDATA_PREFIX environment variable by opening Windows Command Prompt and entering:
echo %TESSDATA_PREFIX%
Windows Command Prompt displays the TESSDATA_PREFIX environment variable.
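Displaying the variable confirms that it is set, but not that it points to a usable folder. As an optional extra check, the following Python sketch reports common problems with the value. The function name check_tessdata_prefix is hypothetical, and the sketch assumes that the English language data file selected during installation is named eng.traineddata and sits directly inside the tessdata folder.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def check_tessdata_prefix(
    env: Optional[Mapping[str, str]] = None, language: str = "eng"
) -> List[str]:
    """Return a list of problems found with TESSDATA_PREFIX (empty if none)."""
    env = os.environ if env is None else env

    value = env.get("TESSDATA_PREFIX")
    if not value:
        return ["TESSDATA_PREFIX is not set"]

    folder = Path(value)
    if not folder.is_dir():
        return [f"{value} is not an existing directory"]

    # Assumption: language data files are named <language>.traineddata.
    data_file = folder / f"{language}.traineddata"
    if not data_file.is_file():
        return [f"{data_file.name} was not found in {value}"]

    return []


if __name__ == "__main__":
    problems = check_tessdata_prefix()
    print("TESSDATA_PREFIX looks usable" if not problems else "\n".join(problems))
```

Run the script from a Command Prompt window opened after the variable was created, because windows that were already open keep the previous environment.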
Adding the Tesseract-OCR installation directory to the PATH environment variable
Note
If you installed Tesseract-OCR in a custom directory (different from the default C:\Program Files\Tesseract-OCR), you must perform this procedure.
When you add your Tesseract-OCR installation directory to the PATH environment variable, the operating system (OS) can locate and run Tesseract-OCR from Windows Command Prompt without needing the full path to the Tesseract-OCR executable file.
To add the Tesseract-OCR installation directory to the PATH environment variable:
Important
These instructions apply to Windows 10 and Windows 11.
- Select Search and enter Environment Variables.
- Select Edit the system environment variables.
- Select Environment Variables... from the System Properties dialog.
- Locate the Path variable in the System variables list, and then select Edit...
- Select New, paste your Tesseract-OCR installation directory, and then select OK.
Windows adds the Tesseract-OCR installation directory to the PATH environment variable.
- Verify the PATH environment variable by opening Windows Command Prompt and entering:
tesseract -v
Windows Command Prompt displays the details of the Tesseract-OCR installation found on the device.
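If the command is still not recognized, it can help to confirm that the directory actually appears in PATH. The following Python sketch is one way to do that; the function name directory_on_path and the example directory C:\Tools\Tesseract-OCR are hypothetical. It compares entries case-insensitively and ignores surrounding quotes and trailing slashes, which is an assumption about how entries are commonly written on Windows rather than a complete path comparison.

```python
import os
from typing import Optional


def _normalize(entry: str) -> str:
    """Normalize a PATH entry for a simple, case-insensitive comparison."""
    return entry.strip().strip('"').rstrip("\\/").lower()


def directory_on_path(
    directory: str, path_value: Optional[str] = None, sep: str = os.pathsep
) -> bool:
    """Return True if `directory` is one of the entries in `path_value`.

    `path_value` defaults to the current PATH; `sep` defaults to the
    platform separator (a semicolon on Windows).
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    target = _normalize(directory)
    entries = (_normalize(e) for e in path_value.split(sep) if e.strip())
    return any(entry == target for entry in entries)


if __name__ == "__main__":
    # Hypothetical custom installation directory, for illustration only.
    custom_dir = r"C:\Tools\Tesseract-OCR"
    print(directory_on_path(custom_dir))
```

As with any environment variable change, only Command Prompt windows opened after the edit see the updated PATH.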