Extract key-value pairs, tables, and paragraphs from your scanned PDF, JPG, and PNG documents and download them as CSV files.
| Technology | Used for |
|---|---|
| Flask | Backend |
| React + Tailwind + DaisyUI | Frontend |
| Azure FormRecognizer | Extracting data from documents |
| Azure BlobStorage | Storing uploaded documents |
- Run `npm i` in the `frontend` folder, followed by `npm run build`
- Run `pip install -r requirements.txt` in the root folder
- Create a `.env` file with the below content:
Create an Azure FormRecognizer service and copy the Endpoint and KEY1 from Keys and Endpoint. These are the `ENDPOINT` and `KEY` values respectively. Next, create an Azure storage account and create a container in it. Go to Shared access tokens and click Generate SAS token and URL. Copy the Blob SAS URL. The part to the left of `?` goes in `BLOB_ENDPOINT` and the part to the right goes in `BLOB_QUERY`.
ENDPOINT = "https://xyz.cognitiveservices.azure.com"
KEY = "12345something"
BLOB_ENDPOINT = "https://xyz.blob.core.windows.net/containerName/"
BLOB_QUERY = "?xyz=xyz&xyz=xyz..."
- Run with `py main.py`
Run `py extract.py -i "input/file/path.pdf" -o "output/file/path.csv"`
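
The split of the Blob SAS URL into `BLOB_ENDPOINT` and `BLOB_QUERY` described in the setup steps can be sketched as a small helper. This is only an illustration, not code from this repo: `split_sas_url` and `blob_url` are hypothetical names, and it assumes the app builds a blob's address by concatenating endpoint, blob name, and query. The trailing `/` and leading `?` follow the sample `.env` values above.

```python
from __future__ import annotations

from urllib.parse import quote


def split_sas_url(sas_url: str) -> tuple[str, str]:
    """Split a Blob SAS URL into (BLOB_ENDPOINT, BLOB_QUERY).

    Hypothetical helper: mirrors the manual copy-paste step in the setup notes.
    """
    endpoint, sep, query = sas_url.partition("?")
    if not sep or not query:
        raise ValueError("expected a SAS URL containing '?' followed by a token")
    # The sample .env keeps a trailing slash on the endpoint ...
    if not endpoint.endswith("/"):
        endpoint += "/"
    # ... and a leading '?' on the query.
    return endpoint, "?" + query


def blob_url(endpoint: str, query: str, blob_name: str) -> str:
    """Assumed layout of a single blob's URL: endpoint + name + SAS query."""
    return f"{endpoint}{quote(blob_name)}{query}"


if __name__ == "__main__":
    # Placeholder SAS URL; substitute the one copied from the Azure portal.
    sas = "https://xyz.blob.core.windows.net/containerName?xyz=xyz&xyz=xyz"
    endpoint, query = split_sas_url(sas)
    print(f'BLOB_ENDPOINT = "{endpoint}"')
    print(f'BLOB_QUERY = "{query}"')
```

Running the script prints the two lines in the same shape as the `.env` entries, which can help avoid copying the `?` into the wrong half.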