This project shows how to extract text from images using Python and Tesseract OCR. In just a few steps, you can convert images into editable text!
- pytesseract: Python wrapper for Tesseract
- Pillow: for image processing
- Tesseract-OCR: the OCR engine (must be installed separately)
Download the latest stable version of Tesseract-OCR from this direct link:
Download Tesseract-OCR for Windows
After downloading:
- Run the setup file to install Tesseract-OCR
- By default, it's installed at:
C:\Program Files\Tesseract-OCR\tesseract.exe
- If you want a different location, choose it during installation and note the path, because your Python script will need it
If you're unsure where Tesseract was installed, here's how to find the path:
- Open the folder:
C:\Program Files\Tesseract-OCR
- Look for the file:
tesseract.exe
- Copy the folder path from the address bar and add \tesseract.exe to the end to get the full path to the executable
Example path to use in your Python script:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
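If you would rather not hard-code the path, a minimal sketch like the following looks for the executable on the PATH first and then in the default install location mentioned above. The helper name find_tesseract and the candidate list are hypothetical, not part of pytesseract.

```python
import os
import shutil

# Hypothetical candidate locations; the first is the default Windows install path
DEFAULT_CANDIDATES = [r"C:\Program Files\Tesseract-OCR\tesseract.exe"]


def find_tesseract(candidates=DEFAULT_CANDIDATES):
    """Return a usable path to the tesseract executable, or None if none is found."""
    # shutil.which searches the PATH (and resolves .exe on Windows)
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Fall back to explicit install locations
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

If it returns a path, assign that value to pytesseract.pytesseract.tesseract_cmd; if it returns None, Tesseract still needs to be installed or added to the PATH.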
Install the Python packages with pip:
pip install pytesseract
pip install pillow
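Once both packages are installed, a quick check can confirm that Python can import pytesseract and reach the Tesseract executable. This is a sketch: check_tesseract is a hypothetical helper, while get_tesseract_version and TesseractNotFoundError come from pytesseract itself.

```python
def check_tesseract(cmd=None):
    """Return (ok, message) describing whether pytesseract can reach Tesseract."""
    try:
        import pytesseract
    except ImportError:
        return False, "pytesseract is not installed (pip install pytesseract)"
    if cmd is not None:
        # Same setting as tesseract_cmd in the script; only needed if tesseract is not on PATH
        pytesseract.pytesseract.tesseract_cmd = cmd
    try:
        version = pytesseract.get_tesseract_version()
    except pytesseract.TesseractNotFoundError:
        return False, "Tesseract not found; check tesseract_cmd or your PATH"
    return True, f"Tesseract {version} found"
```

For example, print(check_tesseract(r'C:\Program Files\Tesseract-OCR\tesseract.exe')) reports whether that path works.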
Make sure the image (e.g. quote1.png) is in the same folder as your script, and run the script from that folder.
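Image.open("quote1.png") resolves a relative name against the current working directory, not the script's folder. If you want the script to find the image no matter where it is run from, a minimal sketch is to build the path from the script's own location (the helper name path_next_to is hypothetical):

```python
from pathlib import Path


def path_next_to(script_file, filename):
    """Build an absolute path to `filename` in the same folder as `script_file`."""
    return Path(script_file).resolve().parent / filename


# Inside the OCR script you would pass the script's own __file__, e.g.:
# image = Image.open(path_next_to(__file__, "quote1.png"))
```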
import pytesseract
from PIL import Image
# Set the full path to tesseract.exe (needed on Windows if Tesseract is not on PATH)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
# Load and process image
image = Image.open("quote1.png")
text = pytesseract.image_to_string(image)
# Display the extracted text
print(text)
- Use high-quality images with clear text for best results
- Preprocess the image (grayscale, resize, sharpen, etc.) for better OCR accuracy
- For other languages, pass lang='your_lang_code' to image_to_string; the matching Tesseract language data must be installed
- To process multiple images automatically, loop over the files in a folder
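The preprocessing and language tips can be combined into a small helper. This is a sketch rather than a tuned pipeline: the scale factor of 2 and Pillow's SHARPEN filter are assumptions, and preprocess and extract_text are hypothetical names. lang='eng' is Tesseract's English code; other codes (for example 'deu' for German) need their language data installed.

```python
from PIL import Image, ImageFilter, ImageOps

# Pillow >= 9.1 exposes resampling filters under Image.Resampling; older versions on Image
LANCZOS = getattr(Image, "Resampling", Image).LANCZOS


def preprocess(image, scale=2):
    """Grayscale, upscale and sharpen an image before OCR (a simple starting point)."""
    gray = ImageOps.grayscale(image)  # drop color; OCR works on intensity
    width, height = gray.size
    enlarged = gray.resize((width * scale, height * scale), LANCZOS)  # small text reads better larger
    return enlarged.filter(ImageFilter.SHARPEN)  # crisp up character edges


def extract_text(path, lang="eng"):
    """Run OCR on a preprocessed image; `lang` must match an installed language pack."""
    import pytesseract  # imported here so preprocess() can be used without it

    with Image.open(path) as image:
        return pytesseract.image_to_string(preprocess(image), lang=lang)
```

Whether each step helps depends on the image; for already sharp, high-resolution scans, upscaling may add little, so compare results with and without it.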
The script will print the extracted text directly to the terminal. You can also save it to a file if needed.
with open("output.txt", "w", encoding="utf-8") as file:
    file.write(text)
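To run OCR on a whole folder and collect all results in one file, a sketch along these lines could work. ocr_folder, default_ocr, the suffix list and the separator format are illustrative assumptions; the ocr parameter lets you swap in a preprocessing step or a different lang.

```python
from pathlib import Path

# File types treated as images here (an assumption; adjust to your files)
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def default_ocr(path):
    """OCR a single image file with pytesseract (imported lazily)."""
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image)


def ocr_folder(folder, output_file="output.txt", ocr=default_ocr):
    """OCR every image in `folder` and write all results to one UTF-8 text file."""
    images = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    with open(output_file, "w", encoding="utf-8") as file:
        for path in images:
            file.write(f"===== {path.name} =====\n")  # label each image's text
            file.write(ocr(path).strip() + "\n\n")
    return [p.name for p in images]  # names of the files that were processed
```

For example, ocr_folder("images") (a hypothetical folder name) would write the text of every image in that folder to output.txt.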