CareerBot-RAG is a project that analyzes CVs and provides skill and occupation suggestions based on the ESCO dataset.
Published paper: https://www.theseus.fi/handle/10024/874901?show=full
- Python 3.8+
- Node.js 14+
- npm 6+
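You can check the version requirements above before installing anything. The script below is a hypothetical helper and is not part of the repository. It assumes `node` and `npm` are on the PATH and print their version when called with `--version`.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions from the prerequisites list above.
MIN_PYTHON = (3, 8)
MIN_NODE = (14,)
MIN_NPM = (6,)


def parse_version(text):
    """Extract a numeric version tuple from output such as 'v14.17.0' or '6.14.13'."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum):
    """Compare version tuples element by element, so (14, 17, 0) >= (14,) is True."""
    return tuple(version) >= tuple(minimum)


def tool_version(command):
    """Return the version tuple printed by `<command> --version`, or None if unavailable."""
    executable = shutil.which(command)  # uses PATHEXT on Windows, so "npm" finds npm.cmd
    if executable is None:
        return None
    result = subprocess.run([executable, "--version"], capture_output=True, text=True)
    try:
        return parse_version(result.stdout)
    except ValueError:
        return None


if __name__ == "__main__":
    checks = [
        ("Python", tuple(sys.version_info[:3]), MIN_PYTHON),
        ("Node.js", tool_version("node"), MIN_NODE),
        ("npm", tool_version("npm"), MIN_NPM),
    ]
    for name, found, minimum in checks:
        ok = found is not None and meets_minimum(found, minimum)
        status = "OK" if ok else "MISSING OR TOO OLD"
        print(f"{name}: found {found}, need >= {minimum}: {status}")
```

Tuple comparison keeps the check simple: a requirement like `(14,)` accepts any 14.x.y or later release.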
- Create a virtual environment (venv):
  python -m venv env
- Activate the venv:
  - Windows:
    .\env\Scripts\activate
  - Unix or macOS:
    source env/bin/activate
- Install Python dependencies:
  pip install -r requirements.txt
- Navigate to the UI directory:
  cd UI
- Install npm dependencies:
  npm install
- OpenAI API key:
  - The project uses a .env file in the Backend directory to store the OpenAI API key.
  - Create or edit the .env file:
    OPENAI_API_KEY=your_new_api_key_here
  - Do not commit this file to version control, so that your API key stays secure.
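This README does not say how the backend reads Backend/.env. The sketch below is a hypothetical, standard-library-only reader (`load_env_file` and `require_openai_key` are illustrative names, not repository code). It shows the expected `KEY=value` format and fails with a clear error when the key is missing, without ever printing the key itself.

```python
import os
from pathlib import Path


def load_env_file(path, override=False):
    """Hypothetical helper: read simple KEY=value lines from a .env file into os.environ.

    Blank lines and lines starting with '#' are skipped, and matching single or
    double quotes around a value are stripped. Existing environment variables
    are kept unless override=True.
    """
    loaded = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        loaded[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded


def require_openai_key():
    """Return the key, or fail with a message that does not reveal any secret."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to Backend/.env")
    return key
```

For example, `load_env_file("Backend/.env")` followed by `require_openai_key()` when run from the repository root.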
- Constants: The project uses several important constants, defined in the Backend/app/config.py file:
  - EMBEDDING_MODEL_NAME: The OpenAI model used to generate embeddings (default: "text-embedding-3-small")
  - GENERATION_MODEL_NAME: The OpenAI model used for text generation (default: "gpt-4o-mini")
  - NUMBER_DOC_PER_ITEM: The number of documents retrieved from the vector database for each item. Higher values put more candidate items into the LLM's context for the picking step (default: 1)
  - LLM_MAX_PICKS: The maximum number of items identified from the CV and shown to the user, given as [skills, occupations] (default: [15, 5])
  - NB_SUGGESTED_SKILLS: The number of skills suggested to the user (default: 20)
  - NB_SUGGESTED_OCCUPATIONS: The number of occupations suggested to the user (default: 10)
  You can modify these constants in config.py to adjust the application's behavior. For example:
    EMBEDDING_MODEL_NAME = "text-embedding-3-large"
    NB_SUGGESTED_SKILLS = 15
  Note: Changing some of these constants (especially EMBEDDING_MODEL_NAME) may require regenerating the FAISS index.
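The defaults listed above would look roughly like the module-level constants below. This sketch is reconstructed from the list and is not a copy of the repository's config.py. The `cap_picks` helper and the `SKILLS`/`OCCUPATIONS` index names are hypothetical. They only illustrate how the [skills, occupations] pair in LLM_MAX_PICKS can be read.

```python
# Defaults as listed in this README; the real Backend/app/config.py may differ.
EMBEDDING_MODEL_NAME = "text-embedding-3-small"
GENERATION_MODEL_NAME = "gpt-4o-mini"
NUMBER_DOC_PER_ITEM = 1
LLM_MAX_PICKS = [15, 5]  # [skills, occupations]
NB_SUGGESTED_SKILLS = 20
NB_SUGGESTED_OCCUPATIONS = 10

# Hypothetical names for the two positions in LLM_MAX_PICKS.
SKILLS, OCCUPATIONS = 0, 1


def cap_picks(skills, occupations, max_picks=LLM_MAX_PICKS):
    """Hypothetical helper: trim the items picked from a CV to the configured maxima."""
    return skills[: max_picks[SKILLS]], occupations[: max_picks[OCCUPATIONS]]
```

With the defaults, at most 15 skills and 5 occupations identified from the CV would be kept.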
- Update EMBEDDING_MODEL_NAME in Backend/app/config.py to your desired model (text-embedding-3-small is the default and the recommended choice).
- Navigate to the indexing directory:
  cd POC/indexing
- Run the FAISS index creation script:
  python create_FAISS_index.py
  This creates new FAISS index files in the data/processed_data/FAISS_index directory.
- After creating the new index, you must run the script that updates the options lists:
  cd ../../utils
  python creating_options_list.py
  This script generates updated JSON files of skill and occupation options, which the frontend uses.
Note: Creating a new index can be time-consuming, incurs OpenAI API costs, and may require significant computational resources, especially with larger models.
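The FAISS index holds one embedding per ESCO item. NUMBER_DOC_PER_ITEM sets how many nearest entries are retrieved for each query. The sketch below reproduces that k-nearest-neighbour lookup in plain Python over toy vectors. It is a conceptual stand-in, not the project's create_FAISS_index.py. It assumes cosine similarity, whereas the project's index may use L2 distance. The labels and vectors in the test are made-up illustrations.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          index: Dict[str, Sequence[float]],
          k: int = 1) -> List[Tuple[str, float]]:
    """Return the k labels in `index` most similar to `query`, best first.

    `index` maps an item label (for example an ESCO skill) to its embedding;
    `k` plays the role of NUMBER_DOC_PER_ITEM. This is a brute-force scan,
    which FAISS replaces with a much faster search over the full vocabulary.
    """
    scored = [(label, cosine_similarity(query, vec)) for label, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Raising `k` returns more candidates per item, which matches the README's note that a higher NUMBER_DOC_PER_ITEM gives the LLM more to choose from.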
- Navigate to the Backend directory:
  cd Backend
- Start the FastAPI server:
  uvicorn app.main:app --reload
  The backend will be available at http://localhost:8000.
- Navigate to the UI directory:
  cd UI
- Start the Next.js development server:
  npm run dev
  The frontend will be available at http://localhost:3000.
- Open your browser and go to http://localhost:3000
- Upload a CV file (PDF format)
- Click "Analyze" to process the CV
- View the suggested skills and occupations based on the CV content
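The steps above go through the web UI. To call the backend directly, you could build a request like the one in the sketch below. This README does not document the backend's routes, so the `/upload` path and the `file` form field are hypothetical placeholders. Check Backend/app/main.py, or the interactive docs that FastAPI serves at http://localhost:8000/docs by default, for the real route.

```python
import urllib.request
import uuid
from pathlib import Path

BACKEND_URL = "http://localhost:8000"
UPLOAD_PATH = "/upload"  # hypothetical route: replace with the one in Backend/app/main.py


def is_pdf(data: bytes) -> bool:
    """PDF files start with the magic bytes '%PDF-'."""
    return data.startswith(b"%PDF-")


def build_upload_request(pdf_bytes, filename, field="file", url=BACKEND_URL + UPLOAD_PATH):
    """Build (but do not send) a multipart/form-data POST carrying the CV.

    `field` is a hypothetical form-field name; it must match the backend's parameter.
    """
    if not is_pdf(pdf_bytes):
        raise ValueError(f"{filename} does not look like a PDF")
    boundary = uuid.uuid4().hex
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    footer = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=header + pdf_bytes + footer,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def upload_cv(path):
    """Send the CV and return the raw response body (requires a running backend)."""
    request = build_upload_request(Path(path).read_bytes(), Path(path).name)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

`upload_cv("my_cv.pdf")` only works while the backend is running and after the placeholders have been matched to the real route and field name.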
- Backend/: Contains the FastAPI backend
- UI/: Contains the Next.js frontend
- POC/: Proof-of-concept scripts and experiments
- data/: Data files and processed indexes
- This work was carried out as a Haaga-Helia thesis project.
- This project uses OpenAI's models, and each API call costs money. Ensure you have sufficient OpenAI API credits.
- The application uses the ESCO (European Skills, Competences, Qualifications and Occupations) dataset
- The API key shown has been revoked.