A lightweight local API for running a pre-trained large language model (LLM) on a Windows machine without a GPU, built with Flask, Hugging Face Transformers, and PyTorch.
This project shows how to deploy a pre-trained model like mistralai/Mistral-7B-Instruct-v0.1 in a resource-constrained environment, ideal for prototyping or learning.
- Runs on CPU (no GPU required)
- Flask REST API for easy integration
- Uses Hugging Face `transformers` and its `pipeline` API (see the sketch below)
- Clean project structure
- Good starting point for local GenAI apps
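
As a rough idea of how these pieces fit together, here is a minimal sketch of such a server. It is illustrative only: the file name (`app.py`), endpoint path (`/generate`), and generation settings are assumptions, not the repository's actual code.

```python
# app.py (illustrative name) - a minimal sketch under the assumptions above,
# not the repository's actual code. Loads an instruction-tuned model on CPU
# via the Transformers pipeline and exposes it through one Flask endpoint.
from flask import Flask, jsonify, request
from transformers import pipeline

app = Flask(__name__)

# device=-1 forces CPU inference. Loading a 7B-parameter model this way needs
# a lot of RAM and time; a smaller model is handy for first experiments.
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.1",
    device=-1,
)

@app.route("/generate", methods=["POST"])
def generate():
    data = request.get_json(force=True)
    prompt = data.get("prompt", "")
    if not prompt:
        return jsonify({"error": "missing 'prompt'"}), 400
    outputs = generator(prompt, max_new_tokens=128, do_sample=True)
    return jsonify({"response": outputs[0]["generated_text"]})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```

On CPU, generation with a 7B model is slow, so keep test prompts and `max_new_tokens` small while experimenting.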
Requirements:

- Python 3.8+
- Flask
- Hugging Face Transformers
- PyTorch (CPU)
- Mistral-7B-Instruct or similar open LLM
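
A quick sanity check that the stack above is installed and that inference will run on the CPU (purely illustrative, not part of the project):

```python
# Illustrative environment check: confirms the packages above are installed
# and that PyTorch sees no GPU, i.e. inference will run on the CPU.
from importlib.metadata import version

import torch

for pkg in ("flask", "transformers", "torch"):
    print(pkg, version(pkg))

print("CUDA available:", torch.cuda.is_available())  # expected: False on a CPU-only setup
```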
To get started, clone the repository:

```bash
git clone https://github.com/scouring/LLM.git
cd LLM
```
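
With the dependencies installed and the server running (the exact commands depend on the repository layout), the API can be called from any HTTP client. A hypothetical example using Python's `requests` library (not listed in the requirements above), reusing the endpoint and port assumed in the sketch earlier:

```python
# Hypothetical client call; the endpoint path and port are assumptions carried
# over from the server sketch above, not documented parts of this project.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/generate",
    json={"prompt": "Explain what a REST API is in one sentence."},
    timeout=600,  # CPU inference on a 7B model can take a while
)
resp.raise_for_status()
print(resp.json()["response"])
```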