NoBullFit AI is a custom-trained language model specialized in fitness, health, nutrition, and mathematical conversions. It automatically collects training data from the web and can be trained, stopped, and resumed at any time.
NoBullFit AI is part of the NoBullFit ecosystem, providing domain-specific AI capabilities for fitness and health-related tasks. Unlike general-purpose AI models, this model is trained from scratch specifically on fitness, health, nutrition, and mathematical conversion data to provide accurate, context-aware responses in these domains.
The model uses a GPT-2 architecture and is trained entirely on domain-specific data, ensuring that responses are tailored to fitness and health contexts. Training data is automatically collected from the web, eliminating the need for manual data preparation.
The AI automatically searches the web for fitness, health, nutrition, and math-related content, extracting Q&A pairs and generating synthetic data for training. This ensures the model has access to current, relevant information without manual data curation.
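The exact collection pipeline is internal to the project, but conceptually it resembles the sketch below, which uses the DuckDuckGo Search, Requests, and BeautifulSoup libraries listed later in this document. The function names and the question-detection heuristic are illustrative assumptions, not the project's actual code.

```python
# Illustrative sketch of automated Q&A collection (not the project's actual code).
# Assumes the duckduckgo_search, requests, and beautifulsoup4 packages are installed.
import requests
from bs4 import BeautifulSoup
from duckduckgo_search import DDGS

def collect_pages(query: str, max_results: int = 10) -> list[str]:
    """Search the web and return the raw HTML of each reachable result page."""
    pages = []
    with DDGS() as ddgs:
        for result in ddgs.text(query, max_results=max_results):
            try:
                resp = requests.get(result["href"], timeout=10)
                resp.raise_for_status()
                pages.append(resp.text)
            except requests.RequestException:
                continue  # skip unreachable or failing pages
    return pages

def extract_qa_pairs(html: str) -> list[tuple[str, str]]:
    """Rough heuristic: treat headings ending in '?' as questions
    and the paragraph that follows as the answer."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for heading in soup.find_all(["h2", "h3"]):
        text = heading.get_text(strip=True)
        if text.endswith("?"):
            answer = heading.find_next("p")
            if answer:
                pairs.append((text, answer.get_text(strip=True)))
    return pairs
```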
The model is trained from scratch on fitness, health, nutrition, and mathematical conversion tasks, including:
- Fitness Guidance: Workout recommendations, exercise form advice, training principles
- Health Information: Evidence-based health information and wellness tips
- Nutrition Planning: Meal planning, macro calculations, dietary advice
- Mathematical Conversions: Weight conversions (lbs to kg), volume conversions (ml to L), and other fitness-related calculations
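The conversion tasks in particular reduce to fixed arithmetic that the model learns to reproduce. For illustration, the standard factors look like this:

```python
# Reference arithmetic behind the conversion tasks (standard factors, shown for illustration).
LBS_TO_KG = 0.45359237  # 1 lb is exactly 0.45359237 kg

def lbs_to_kg(lbs: float) -> float:
    return lbs * LBS_TO_KG

def ml_to_liters(ml: float) -> float:
    return ml / 1000.0  # 1 L = 1000 ml

print(round(lbs_to_kg(150), 2))  # 68.04
print(ml_to_liters(500))         # 0.5
```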
Training can be started, stopped, and resumed at any time. Checkpoints are automatically saved, allowing you to pause training and continue later without losing progress.
By default, the model trains continuously with self-learning enabled. It explores and generates its own training data by asking questions, generating answers, evaluating quality, and adding good examples back to its training set, creating a self-improving learning loop.
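Conceptually, one pass of that loop can be pictured as follows. The method names (`generate_question`, `generate_answer`, `score_quality`) and the threshold are hypothetical placeholders, not the project's actual API.

```python
# Conceptual sketch of the self-improving loop (hypothetical API, not actual project code).
QUALITY_THRESHOLD = 0.8  # illustrative cutoff

def self_learning_step(model, training_set, num_samples: int = 32):
    """Generate new Q&A pairs with the model and keep only the good ones."""
    for _ in range(num_samples):
        question = model.generate_question()           # model asks itself a question
        answer = model.generate_answer(question)       # model answers it
        score = model.score_quality(question, answer)  # model or a heuristic rates the pair
        if score >= QUALITY_THRESHOLD:
            training_set.append({"question": question, "answer": answer})
    return training_set
```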
The training process is fully automated:
- Data Collection: If training data doesn't exist, the system automatically searches the web for relevant Q&A pairs covering fitness, health, nutrition, and math topics
- Data Processing: Collected data is formatted and split into training and validation sets
- Model Training: The GPT-2 based model is trained from scratch on the collected data
- Self-Learning (continuous mode): The model generates its own questions and answers, evaluates them, and adds quality examples back to the training set
- Checkpointing: Checkpoints are saved after each epoch, allowing training to be resumed
- Model Saving: The best model (lowest validation loss) is automatically saved
Training can be stopped gracefully at any time using Ctrl+C, and the current checkpoint will be saved automatically.
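Per-epoch checkpointing with a graceful Ctrl+C stop is commonly implemented by catching `KeyboardInterrupt` around the training loop. The sketch below illustrates that pattern under the assumption of a PyTorch model and optimizer; it is not the project's actual training loop, and the paths are placeholders.

```python
# Illustrative pattern: checkpoint every epoch, keep the best model, save on Ctrl+C.
import torch

def train(model, optimizer, train_one_epoch, validate, num_epochs=None,
          checkpoint_path="checkpoints/latest.pt", best_path="models/best_model.pt"):
    best_val_loss = float("inf")
    epoch = 0
    try:
        while num_epochs is None or epoch < num_epochs:
            train_one_epoch(model, optimizer)
            val_loss = validate(model)

            # Checkpoint after every epoch so training can be resumed later.
            torch.save({"epoch": epoch,
                        "model": model.state_dict(),
                        "optimizer": optimizer.state_dict(),
                        "val_loss": val_loss}, checkpoint_path)

            # Keep the best model (lowest validation loss) separately.
            if val_loss < best_val_loss:
                best_val_loss = val_loss
                torch.save(model.state_dict(), best_path)
            epoch += 1
    except KeyboardInterrupt:
        # Ctrl+C: save the current state before exiting.
        torch.save({"epoch": epoch,
                    "model": model.state_dict(),
                    "optimizer": optimizer.state_dict()}, checkpoint_path)
        print("Training interrupted; checkpoint saved.")
```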
- Install dependencies: `pip install -r requirements.txt`
- Start training: `python train.py`. By default, training runs continuously with self-learning enabled; the model will automatically collect data from the web if needed, then train indefinitely while generating its own training data.
- Resume training: `python train.py --resume` resumes from the latest checkpoint and continues training.
- Fixed epochs training: `python train.py --epochs 10` trains for a specific number of epochs instead of continuously (both flags are sketched after this list).
- Stop training: press `Ctrl+C` to stop gracefully. The current checkpoint will be saved automatically.
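How `train.py` actually wires the `--resume` and `--epochs` flags is internal to the project, but a hypothetical, minimal command-line interface (assuming `argparse`) could look like this:

```python
# Hypothetical CLI wiring for the documented flags; the real train.py may differ.
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Train NoBullFit AI")
    parser.add_argument("--resume", action="store_true",
                        help="resume from the latest checkpoint")
    parser.add_argument("--epochs", type=int, default=None,
                        help="train for a fixed number of epochs instead of continuously")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    # epochs=None corresponds to continuous training with self-learning.
    print(f"resume={args.resume}, epochs={args.epochs}")
```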
After training, load and use the model:
```python
from nobullfit_ai.model import NoBullFitModel

model = NoBullFitModel.from_pretrained("./models/best_model")
response = model.generate("Question: Convert 500 grams to kilograms\nAnswer:")
print(response[0])
```

All settings are configured via the `.env` file:
- Model: `VOCAB_SIZE`, `MAX_SEQ_LENGTH`, `EMBED_DIM`, `NUM_HEADS`, `NUM_LAYERS`
- Training: `BATCH_SIZE`, `LEARNING_RATE`, `NUM_EPOCHS`, `WARMUP_STEPS`
- Data: `DATA_DIR`, `MIN_QA_PAIRS` (minimum Q&A pairs to collect)
- Self-Learning: `SELF_LEARNING_INTERVAL` (generate new data every N epochs, default: 5)
- Checkpoints: `KEEP_CHECKPOINTS` (number of checkpoints to keep, default: 5)
- Device: `DEVICE` (use `cuda` for GPU, `cpu` for CPU)
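For reference, a complete `.env` could look like the sketch below. The variable names are the ones listed above; the values are illustrative placeholders, except for `SELF_LEARNING_INTERVAL` and `KEEP_CHECKPOINTS`, whose defaults of 5 are documented above.

```
# Model architecture (values shown are illustrative, not shipped defaults)
VOCAB_SIZE=8000
MAX_SEQ_LENGTH=512
EMBED_DIM=256
NUM_HEADS=8
NUM_LAYERS=6

# Training
BATCH_SIZE=16
LEARNING_RATE=0.0003
NUM_EPOCHS=10
WARMUP_STEPS=500

# Data
DATA_DIR=./data
MIN_QA_PAIRS=1000

# Self-learning and checkpoints (documented defaults)
SELF_LEARNING_INTERVAL=5
KEEP_CHECKPOINTS=5

# Device: cuda for GPU, cpu for CPU
DEVICE=cuda
```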
NoBullFit AI is built with Python and modern machine learning libraries:
- Deep Learning: PyTorch
- Model Architecture: Transformers (GPT-2 based; a configuration sketch follows this list)
- Data Collection: DuckDuckGo Search, BeautifulSoup, Requests
- Training: Custom training loop with checkpointing and resume support
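Because the stack pairs PyTorch with a GPT-2 based architecture trained from scratch, a model of this kind can be assembled roughly as shown below. This is a sketch assuming the Hugging Face `transformers` library and the `.env` dimensions documented above; `NoBullFitModel` itself may wire things differently, and the fallback values are placeholders.

```python
# Illustrative: building a small GPT-2-style model from scratch with the
# Hugging Face transformers library, using dimensions read from the .env file.
import os
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=int(os.getenv("VOCAB_SIZE", 8000)),
    n_positions=int(os.getenv("MAX_SEQ_LENGTH", 512)),
    n_embd=int(os.getenv("EMBED_DIM", 256)),
    n_head=int(os.getenv("NUM_HEADS", 8)),
    n_layer=int(os.getenv("NUM_LAYERS", 6)),
)
model = GPT2LMHeadModel(config)  # randomly initialized, i.e. trained from scratch
print(f"{model.num_parameters():,} parameters")
```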
NoBullFit AI collects training data from publicly available web sources. The model is trained locally on your machine, ensuring that your training process and model remain private. No data is sent to external services during training.
This project is part of the NoBullFit ecosystem. See the LICENSE file for details.
For commercial licensing inquiries, please contact us at https://nobull.fit.