This repository extracts the markdown layout from various documents using Document Intelligence, then uses GPT-4o to convert that markdown into structured outputs.
To set up the project, follow these steps:
- Clone the repository:

      git clone https://github.com/szetinglau/DocIntel-Extraction-Structured.git
      cd DocIntel-Extraction-Structured

- Create a virtual environment and activate it:

      python3 -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`

- Install the required dependencies:

      pip install -r requirements.txt

- Set your credentials in config.json.
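Before opening a notebook, it can help to confirm that config.json is readable and fully filled in. The following is a minimal sketch of such a check, not code from this repository: the `load_config` helper and every key name in `REQUIRED_KEYS` are hypothetical placeholders, so replace them with the keys that actually appear in the repository's config.json.

```python
import json
from pathlib import Path

# Hypothetical key names, used only for illustration.
# Replace them with the keys found in the repository's config.json.
REQUIRED_KEYS = (
    "doc_intelligence_endpoint",
    "doc_intelligence_key",
    "openai_endpoint",
    "openai_key",
)


def load_config(path="config.json"):
    """Read the JSON config and fail early if a required value is missing or empty."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(f"config.json is missing values for: {', '.join(missing)}")
    return config
```

Calling `load_config()` from the repository root returns the parsed settings as a dictionary, or raises a `KeyError` naming the empty or absent entries, so a credential problem surfaces before any document is processed.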
To use the notebooks, follow these steps:
- Launch Jupyter Notebook:

      jupyter notebook

- Open the desired notebook from the Jupyter interface and follow the instructions within the notebook to perform document extraction tasks.
This project is licensed under the MIT License. See the LICENSE file for more details.