- Answers specific questions on scientific publications (~2500–3000 articles in total).
- Automates the question-answering process without sacrificing answer quality.
- Addresses optimization goals:
  - Reduce tokens used
  - Provide clearer, better responses
  - Lower the hallucination rate
  - Get the most from AI
- Returns a pre-defined Excel document with Q/A columns.
- Used in a pharma research lab at CHU Sainte-Justine, Montreal, to rework their website.
If you received this as a ZIP file, unzip it on your Desktop.
Right-click the folder and select 'Open in terminal'.
Then skip to the next step.
If you’re using Git:
On your Desktop, right-click and select 'Open in terminal'.
Once it is open, run these commands:
git clone https://github.com/simy46/ImpactPharma.git
cd ImpactPharma

Place all the PDF files you want to analyze inside the /pdfs folder of the project.
These are the PDFs that will be processed, so choose the ones that YOU want.
Example: ImpactPharma/pdfs/*.pdf
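
To check which files will be picked up before a run, here is a minimal sketch that lists the PDFs in the folder. The helper name `list_pdfs` is hypothetical and is not taken from the project code; it only assumes the PDFs sit directly inside `pdfs/`.

```python
from pathlib import Path


def list_pdfs(folder: str = "pdfs") -> list[Path]:
    """Return the PDF files found directly inside `folder`, sorted by name."""
    root = Path(folder)
    if not root.is_dir():
        # Missing folder: nothing to analyze.
        return []
    # Compare the extension case-insensitively so "Paper.PDF" is not skipped.
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    for pdf in list_pdfs("pdfs"):
        print(pdf.name)
```

Run it from the project folder; if nothing is printed, the /pdfs folder is empty or missing.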
Make sure you have Python 3.10 or newer installed.
Then, in the terminal, run:
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
python main.py

After the script finishes, results will be saved in:
- /outputs/resultats_YYYYMMDD_HHMMSS.xlsx: Excel file with all extracted answers
- /logs/pipeline_YYYYMMDD_HHMMSS.log: detailed processing log for each PDF
- Don't edit the Excel file while the script is running.
- If you have questions or errors, share the log file with the developer.
- Each run writes to a new /outputs/resultats_YYYYMMDD_HHMMSS.xlsx file, with the date and time in the file name.
This is work for a research lab at a Montreal hospital (the most popular one, if you ask me). I am happy to help them achieve their goal.
I am not an LLM developer (yet, as of 10/07/2025), but I'm leaning towards that path more and more. I do love research and might get a paper out of this project.
https://impactpharmacie.org/index.php?p=greeter.php
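import os

for pdf in pdf_files:
    # Extract the full text of the current PDF.
    text = PDFLoader.extract_text(pdf)
    responses = {}
    for category in categories:
        # One prompt per question category; parsed answers are merged into one row.
        prompt = PromptManager.build_prompt(category, text)
        answer_raw = APIManager.ask(prompt)
        parsed = ResponseParser.parse(answer_raw)
        responses.update(parsed)
    # Assumes pdf is a file path; the file name labels the Excel row.
    pdf_name = os.path.basename(pdf)
    ExcelWriter.insert_row(pdf_name, responses)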
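
To make the inner loop above concrete, here is a small runnable sketch of the "one prompt per category, merge the parsed answers" step. Everything in it is an assumption for illustration: the functions `parse_answer`, `answer_pdf` and `fake_ask` are hypothetical stand-ins, the prompt layout is invented, and it assumes the model replies with a JSON object, which may not match what the real `ResponseParser` expects.

```python
import json
from typing import Callable


def parse_answer(answer_raw: str) -> dict[str, str]:
    """Parse a reply assumed to be a JSON object; return {} if it is not."""
    try:
        data = json.loads(answer_raw)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    return {str(k): str(v) for k, v in data.items()}


def answer_pdf(text: str, categories: list[str],
               ask: Callable[[str], str]) -> dict[str, str]:
    """Ask one prompt per category and merge the parsed answers into one row."""
    responses: dict[str, str] = {}
    for category in categories:
        # Hypothetical prompt layout, only here to keep the sketch runnable.
        prompt = f"Category: {category}\n\nArticle text:\n{text}"
        responses.update(parse_answer(ask(prompt)))
    return responses


def fake_ask(prompt: str) -> str:
    # Stand-in for the real API call: echoes the category back as JSON.
    category = prompt.splitlines()[0].removeprefix("Category: ")
    return json.dumps({f"{category}_answer": "n/a"})


row = answer_pdf("some article text", ["design", "population"], fake_ask)
print(row)  # {'design_answer': 'n/a', 'population_answer': 'n/a'}
```

Passing the API call in as a function makes it possible to test the merging logic without spending tokens; an unparseable reply simply contributes no columns to the row.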