atulkushwaha0112-py/Image-Text-Extractor

This project shows how to extract text from images using Python and Tesseract OCR. A few lines of code are enough to convert an image into editable text.

📂 Project Requirements

  • pytesseract – Python wrapper for Tesseract
  • Pillow – For image processing
  • Tesseract-OCR – The OCR engine (must be installed separately)

🔗 Download Tesseract-OCR

👉 Download the latest stable version from this direct link:

🔗 Download Tesseract-OCR for Windows

After downloading:

  • Run the setup file to install Tesseract-OCR
  • By default, it's installed at:
    C:\Program Files\Tesseract-OCR\tesseract.exe
  • To install to a different location, choose it during setup and note the path, because the Python script needs it later

🔍 How to Find the Path to tesseract.exe

If you're unsure where Tesseract was installed, here's how to find the path:

  1. Open the folder: C:\Program Files\Tesseract-OCR
  2. Look for the file: tesseract.exe
  3. Copy the full path from the address bar

Example path to use in your Python script:

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
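If you would rather not hard-code the path, the lookup can be done from Python. The following is a minimal sketch, not part of the original project: `find_tesseract` and `DEFAULT_WINDOWS_PATH` are hypothetical names, and it assumes Tesseract is either on your PATH or at the default Windows location mentioned above.

```python
import shutil
from pathlib import Path

# Default install location from this README (assumed; adjust if you changed it).
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(extra_candidates=(DEFAULT_WINDOWS_PATH,)):
    """Return a path to the tesseract executable, or None if none is found."""
    # First check whether the executable is reachable through PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the known install locations.
    for candidate in extra_candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None
```

If the function returns a path, assign it to `pytesseract.pytesseract.tesseract_cmd`; if it returns `None`, Tesseract is probably not installed or was installed somewhere else.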

📦 Install Required Python Libraries

pip install pytesseract
pip install pillow
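To confirm that both libraries installed correctly, you can check that they are importable. This is a small sketch with hypothetical names (`REQUIRED`, `missing_packages`). Note that the pillow package is imported as `PIL`.

```python
import importlib.util

# Maps the pip package name to the module name you actually import.
REQUIRED = {"pytesseract": "pytesseract", "pillow": "PIL"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required Python libraries are installed.")
```

This only checks the Python side. The Tesseract-OCR engine itself still has to be installed separately.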

📌 Python Code to Extract Text from Image

Make sure the image (e.g. quote1.png) is in the same folder as your script.

import pytesseract
from PIL import Image

# Set the full path to tesseract.exe (only needed if it is not on your PATH)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Load and process image
image = Image.open("quote1.png")
text = pytesseract.image_to_string(image)

# Display the extracted text
print(text)
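The script above assumes the image exists and Tesseract is installed. A slightly more defensive version might look like the sketch below. The function name `extract_text` and its `tesseract_cmd` parameter are illustrative and not part of the original project; `pytesseract.TesseractNotFoundError` is the exception pytesseract raises when it cannot find the executable.

```python
from pathlib import Path


def extract_text(image_path, tesseract_cmd=None):
    """Run OCR on one image and return the extracted text."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"Image not found: {path}")

    # Imported here so the file check above runs even without the OCR stack.
    import pytesseract
    from PIL import Image

    if tesseract_cmd:
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    try:
        with Image.open(path) as image:
            return pytesseract.image_to_string(image)
    except pytesseract.TesseractNotFoundError as exc:
        raise RuntimeError(
            "Tesseract executable not found; check the install path or PATH."
        ) from exc
```

For example, `print(extract_text("quote1.png"))` behaves like the original script but fails with a clearer message when something is missing.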

⚑ Pro Tips & Hacks

  • 🔍 Use high-quality images with clear text for best results
  • 🧽 Preprocess the image (grayscale, resize, sharpen, etc.) for better OCR accuracy
  • 🌐 For other languages, pass lang='your_lang_code' (for example lang='deu') to image_to_string; the matching language data must be installed with Tesseract
  • 📁 Automate multiple images by looping over the files in a folder
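The preprocessing, language, and folder tips can be combined into one short script. The sketch below is one possible approach, not the project's own code: the helper names (`list_images`, `preprocess`, `ocr_folder`), the suffix list, and the 2x upscale factor are all assumptions you should tune for your images.

```python
from pathlib import Path

# File extensions treated as images (an assumed list; extend as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def list_images(folder):
    """Return the image files in `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def preprocess(image, scale=2):
    """Grayscale, upscale, and sharpen a Pillow image before OCR."""
    from PIL import ImageFilter, ImageOps  # lazy import: list_images works without Pillow

    gray = ImageOps.grayscale(image)
    enlarged = gray.resize((gray.width * scale, gray.height * scale))
    return enlarged.filter(ImageFilter.SHARPEN)


def ocr_folder(folder, lang="eng"):
    """Run OCR on every image in `folder`; return {filename: text}."""
    import pytesseract
    from PIL import Image

    results = {}
    for path in list_images(folder):
        with Image.open(path) as image:
            results[path.name] = pytesseract.image_to_string(
                preprocess(image), lang=lang
            )
    return results
```

Calling `ocr_folder("images", lang="eng")` would return a dictionary of extracted text keyed by file name. Whether preprocessing helps depends on the input, so it is worth comparing results with and without it.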

✅ Example Output

The script will print the extracted text directly to the terminal. You can also save it to a file if needed.

with open("output.txt", "w", encoding="utf-8") as file:
    file.write(text)
