
An automated tool that extracts Terminal IDs (TIDs) from PDF-embedded images using OCR and exports the structured data to Excel.


Amer-css/TID-Extractor-System


🔍 TID Smart Extractor (PDF to Excel)

A desktop automation application built with C# and .NET that streamlines the extraction of Terminal IDs (TIDs) from POS receipts embedded in PDF documents.

🌟 Key Features

  • Batch PDF Processing: Select a folder and process multiple PDFs at once.
  • Real-time Progress Tracking: A visual Progress Bar (0-100%) to keep you informed.
  • Intelligent OCR: Powered by Tesseract to recognize text patterns within images.
  • Direct Excel Export: Generates an organized report including Filename, Page Number, and TID Status.
  • Ready-to-Download: Instant access to the results file upon completion.

📸 Application Preview

1. Launch & Setup

Upon starting the app, you are greeted with a clean interface to select your source directory.

[Screenshot: Launch Screen]

2. Processing Data

The system extracts pages, converts them to images, and runs the OCR engine. You can monitor the status via the progress bar.

[Screenshot: Processing Screen]
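The progress reporting described above could be sketched as follows. This is a minimal illustration, not the application's actual code: `ProgressSketch`, `ToPercent`, and `ProcessPages` are hypothetical names, and the per-page work is passed in as a delegate so the sketch stays independent of any OCR library.

```csharp
using System;
using System.Collections.Generic;

public static class ProgressSketch
{
    // Converts "pages completed so far" into a whole percentage in the 0-100 range.
    public static int ToPercent(int completed, int total)
    {
        if (total <= 0) return 0;
        int clamped = Math.Max(0, Math.Min(completed, total));
        return (int)(clamped * 100L / total);
    }

    // Runs a per-page action over every page image and reports progress after each one.
    // The processPage delegate stands in for the render-and-OCR step (an assumption).
    public static void ProcessPages(
        IReadOnlyList<string> pageImages,
        Action<string> processPage,
        IProgress<int> progress)
    {
        for (int i = 0; i < pageImages.Count; i++)
        {
            processPage(pageImages[i]);
            progress.Report(ToPercent(i + 1, pageImages.Count));
        }
    }
}
```

In a Windows Forms app, a reporter such as `new Progress<int>(p => progressBar.Value = p)` created on the UI thread would deliver these updates back to the UI thread; `progressBar` is a hypothetical control name here.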

3. Completion & Export

Once processing finishes, the app notifies you and shows a download button for the generated Excel report.

[Screenshot: Completion Screen]


🚀 How It Works (The Logic)

  1. Scanning: The app identifies all .pdf files in the selected folder.
  2. Conversion: Each PDF page is rendered into a high-resolution image.
  3. Extraction: The OCR engine reads the text on each page image, and the app searches that text for the keyword "TID" followed by a numeric pattern.
  4. Logging: Results are stored in memory and then written to an Excel sheet using ClosedXML.
  5. Status Reporting: If a TID is unreadable, it is marked as Not Found for manual review.
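The scanning, extraction, and status steps above might look roughly like the sketch below. It uses only the .NET base class library and takes the OCR text as a plain string, so the rendering and OCR calls are left out. The type and method names are hypothetical, and the pattern (the word "TID", optional separators, then 6 to 10 digits) is an assumption; the actual TID format and matching rules are not stated in this README.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// One row of the report: which file and page, and what was found there.
public sealed class TidResult
{
    public string FileName { get; set; }
    public int PageNumber { get; set; }
    public string Tid { get; set; }   // null when no TID was matched
    public string Status => Tid == null ? "Not Found" : "Found";
}

public static class TidPipelineSketch
{
    // Assumed pattern: "TID", optional separator characters, then 6 to 10 digits.
    private static readonly Regex TidPattern = new Regex(
        @"\bTID[\s:#.\-]*(\d{6,10})",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Step 1 (Scanning): list the PDF files directly inside the selected folder.
    public static IEnumerable<string> FindPdfFiles(string folder) =>
        Directory.EnumerateFiles(folder, "*.pdf", SearchOption.TopDirectoryOnly)
                 .OrderBy(path => path, StringComparer.OrdinalIgnoreCase);

    // Steps 3 and 5 (Extraction, Status Reporting): search the OCR text of one page
    // and record either the matched digits or a "Not Found" status.
    public static TidResult ExtractTid(string fileName, int pageNumber, string ocrText)
    {
        Match match = TidPattern.Match(ocrText ?? string.Empty);
        return new TidResult
        {
            FileName = fileName,
            PageNumber = pageNumber,
            Tid = match.Success ? match.Groups[1].Value : null
        };
    }
}
```

Keeping the match logic separate from the OCR call makes it easy to try the pattern against sample receipt text before running a full batch.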

🛠️ Built With

  • C# / Windows Forms - User Interface
  • Tesseract OCR - Text Recognition Engine
  • iText7 / Magick.NET - PDF Image Extraction
  • ClosedXML - Excel Report Generation
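As an illustration of the ClosedXML part of this stack, the report export could be sketched as below. This is an assumption-laden example rather than the project's actual code: `ExcelExportSketch`, `WriteReport`, and the sheet name are hypothetical, and the column headers simply mirror the Filename, Page Number, and TID Status fields listed under Key Features. It requires the ClosedXML NuGet package.

```csharp
using System.Collections.Generic;
using ClosedXML.Excel;

public static class ExcelExportSketch
{
    // Writes one header row plus one row per result to a new .xlsx file.
    public static void WriteReport(
        string outputPath,
        IEnumerable<(string FileName, int PageNumber, string Status)> rows)
    {
        using (var workbook = new XLWorkbook())
        {
            IXLWorksheet sheet = workbook.Worksheets.Add("TID Report"); // hypothetical sheet name

            // Header row, matching the report fields named in the README.
            sheet.Cell(1, 1).Value = "Filename";
            sheet.Cell(1, 2).Value = "Page Number";
            sheet.Cell(1, 3).Value = "TID Status";
            sheet.Row(1).Style.Font.Bold = true;

            int rowIndex = 2;
            foreach (var row in rows)
            {
                sheet.Cell(rowIndex, 1).Value = row.FileName;
                sheet.Cell(rowIndex, 2).Value = row.PageNumber;
                sheet.Cell(rowIndex, 3).Value = row.Status;
                rowIndex++;
            }

            sheet.Columns().AdjustToContents();
            workbook.SaveAs(outputPath);
        }
    }
}
```

Collecting all results first and writing the workbook once at the end matches the "stored in memory and then written" flow described under How It Works.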

👨‍💻 Author

Amer-css - GitHub Profile
