Repository for the first AIM Hackathon together with TIMETOACT GROUP Österreich on 19.10.2024
OpenAI API pricing: https://openai.com/api/pricing/
Copy your team's API key from the Slack channel description and paste it into the .env_template file.
Don't forget to rename the file to .env afterwards!
Check out the sample code (sample_code.ipynb) to see how to load the key.
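A minimal sketch of what loading the key can look like, assuming the .env file contains a line such as OPENAI_API_KEY=... (the variable name is an assumption; use whatever name .env_template defines). The helper load_env_file is hypothetical and uses only the standard library; if the python-dotenv package is installed, its load_dotenv() does the same job.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Hypothetical minimal .env reader: KEY=VALUE lines, '#' comments, optional quotes.

    Each pair is also exported to os.environ unless that variable is already set.
    """
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values  # no .env file: nothing to load
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        values[key] = value
        # Do not overwrite variables that are already set in the environment.
        os.environ.setdefault(key, value)
    return values
```

After calling load_env_file(), the key is available as os.environ["OPENAI_API_KEY"] (under the assumed name), and the openai package's OpenAI() client reads that environment variable by default.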
The dataset reports.json contains the following keys:
- company_name: name of the company
- year: year of the report (if a report covers two years, the first year is used)
- dataset: name of the sub-dataset, one of:
  - scraped: scraped from a website; might contain broken links and irrelevant PDFs (90 samples, 2016-2024)
  - austria: reports mainly from Austrian companies (33 samples)
  - handcrafted: hand-selected ESG and sustainability reports (23 samples; 2019, 2021 and 2023 for every company)
- pdf_url: link to the PDF report
Simply fork this repository to start working on your project.
Create a new environment (e.g. with conda) and activate it:
```bash
conda create -n aim_hackathon_oct24 python=3.10
conda activate aim_hackathon_oct24
```
Install the requirements:
```bash
pip install -r requirements.txt
```
sample_code.ipynb contains a very simple RAG (retrieval-augmented generation) pipeline to help you get started.
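The core loop of a very simple RAG pipeline (split the report text into chunks, retrieve the chunks most similar to the question, put them into the prompt) can be sketched as follows. This is not the notebook's code: as a stand-in for embedding-based retrieval it scores chunks with word-count cosine similarity so that it runs without an API key, and chunk_text, retrieve and build_prompt are hypothetical names. Extracting text from the PDFs is not covered here.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of chunk_size words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def _bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = _bag_of_words(question)
    return sorted(chunks, key=lambda c: _cosine(q, _bag_of_words(c)), reverse=True)[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine the retrieved chunks and the question into a single prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The string returned by build_prompt would then be sent to the LLM as the user message. Swapping _bag_of_words and _cosine for real embeddings changes retrieval quality but not the overall flow.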
You can read the OpenAI API token usage from the response with response.usage (response['usage'] in pre-1.0 versions of the openai package, where responses are dict-like).
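To keep an eye on costs, the usage of many calls can be summed and converted to an estimated price. A sketch, assuming the usage object exposes prompt_tokens and completion_tokens (as in the v1 openai package; a plain dict with the same keys also works). UsageTotals is a hypothetical helper, and prices are parameters because they change; take the current per-million-token prices for your model from the OpenAI pricing page.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running token totals across several API calls (hypothetical helper)."""

    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add(self, usage) -> None:
        """Add one response's usage; accepts an object with attributes or a dict."""
        if isinstance(usage, dict):
            self.prompt_tokens += usage["prompt_tokens"]
            self.completion_tokens += usage["completion_tokens"]
        else:
            self.prompt_tokens += usage.prompt_tokens
            self.completion_tokens += usage.completion_tokens

    def cost_usd(self, input_usd_per_1m: float, output_usd_per_1m: float) -> float:
        """Estimated cost, given prices in USD per 1M input and output tokens."""
        return (
            self.prompt_tokens * input_usd_per_1m
            + self.completion_tokens * output_usd_per_1m
        ) / 1_000_000
```

Call totals.add(response.usage) after each request and totals.cost_usd(...) whenever you want a running estimate.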
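You can use tiktoken to count the tokens of a string manually:
```python
import tiktoken

tokenizer = tiktoken.get_encoding("o200k_base")  # encoding used by GPT-4o
n_tokens = len(tokenizer.encode("How many tokens does this sentence have?"))
```
Structured outputs force the LLM to answer in a fixed format, e.g. only integers.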
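As an illustration, a JSON schema that only allows an integer answer, wrapped in the response_format payload used by the Chat Completions API's structured outputs, plus a check of the returned text. This is a sketch under assumptions: the field name reported_year and the names YEAR_SCHEMA, RESPONSE_FORMAT and parse_year are hypothetical, and it requires a recent openai package and a model that supports structured outputs (e.g. gpt-4o-2024-08-06).

```python
import json

# Schema for an answer that must be a single integer, e.g. a reporting year.
# "reported_year" is a hypothetical field name chosen for illustration.
YEAR_SCHEMA = {
    "type": "object",
    "properties": {"reported_year": {"type": "integer"}},
    "required": ["reported_year"],
    "additionalProperties": False,
}

# Payload for response_format; strict mode requires every property to be
# listed in "required" and additionalProperties to be False.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {"name": "year_answer", "strict": True, "schema": YEAR_SCHEMA},
}


def parse_year(content: str) -> int:
    """Parse the model's JSON reply and check that it matches YEAR_SCHEMA."""
    data = json.loads(content)
    if not isinstance(data, dict) or set(data) != {"reported_year"}:
        raise ValueError("expected an object containing only 'reported_year'")
    year = data["reported_year"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(year, int) or isinstance(year, bool):
        raise ValueError(f"reported_year is not an integer: {year!r}")
    return year
```

In practice you would pass response_format=RESPONSE_FORMAT to client.chat.completions.create(...) and then call parse_year(response.choices[0].message.content); the local check is a safety net in case the model refuses or the schema changes.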