NoBullFit AI

NoBullFit AI is a custom-trained language model specialized in fitness, health, nutrition, and mathematical conversions. It collects its own training data from the web automatically, and training can be started, stopped, and resumed at any time.

About

NoBullFit AI is part of the NoBullFit ecosystem, providing domain-specific AI capabilities for fitness and health-related tasks. Unlike general-purpose AI models, this model is trained from scratch specifically on fitness, health, nutrition, and mathematical conversion data to provide accurate, context-aware responses in these domains.

The model uses a GPT-2 architecture and is trained entirely on domain-specific data, ensuring that responses are tailored to fitness and health contexts. Training data is automatically collected from the web, eliminating the need for manual data preparation.

What It Does

Automatic Data Collection

The AI automatically searches the web for fitness, health, nutrition, and math-related content, extracting Q&A pairs and generating synthetic data for training. This ensures the model has access to current, relevant information without manual data curation.
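One way such Q&A extraction might work, assuming pages with simple heading/paragraph markup (the pattern and the quality filter below are illustrative, not the project's actual scraper):

```python
# Minimal sketch of Q&A extraction from a fetched page. The markup
# structure and the crude quality filter are assumptions for illustration.
import re

def extract_qa_pairs(html: str) -> list:
    """Pull (question, answer) pairs from simple <h3>/<p> page structure."""
    pattern = re.compile(r"<h3>(.*?)</h3>\s*<p>(.*?)</p>", re.DOTALL)
    pairs = []
    for q, a in pattern.findall(html):
        q, a = q.strip(), a.strip()
        if q.endswith("?") and len(a) > 20:  # keep only plausible Q&A pairs
            pairs.append((q, a))
    return pairs

page = ("<h3>How much protein per day?</h3>"
        "<p>A common guideline is 1.6-2.2 g per kg of body weight "
        "for people doing resistance training.</p>")
print(extract_qa_pairs(page))
```

In practice the project lists DuckDuckGo Search and BeautifulSoup in its stack, so the real collector likely parses a DOM rather than applying a regex, but the filtering idea is the same.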

Domain-Specific Training

The model is trained from scratch on fitness, health, nutrition, and mathematical conversion tasks, including:

  • Fitness Guidance: Workout recommendations, exercise form advice, training principles
  • Health Information: Evidence-based health information and wellness tips
  • Nutrition Planning: Meal planning, macro calculations, dietary advice
  • Mathematical Conversions: Weight conversions (lbs to kg), volume conversions (ml to L), and other fitness-related calculations
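The conversion tasks reduce to fixed factors; a quick sketch (the helper names are illustrative, the constants are standard):

```python
# Standard conversion factors; 1 kg is defined as ~2.2046226218 lb.
LBS_PER_KG = 2.2046226218

def lbs_to_kg(lbs: float) -> float:
    """Convert pounds to kilograms."""
    return lbs / LBS_PER_KG

def ml_to_l(ml: float) -> float:
    """Convert milliliters to liters."""
    return ml / 1000.0

print(round(lbs_to_kg(220.462), 2))  # ~100.0 kg
print(ml_to_l(500))                  # 0.5 L
```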

Flexible Training

Training can be started, stopped, and resumed at any time. Checkpoints are automatically saved, allowing you to pause training and continue later without losing progress.
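The checkpoint bookkeeping can be illustrated with a standard-library sketch. The real project presumably serializes model and optimizer state as well (e.g. via torch.save); the path and field names below are assumptions.

```python
# Sketch of save/resume bookkeeping using only the standard library.
import json
import os
import tempfile

CKPT_PATH = os.path.join(tempfile.gettempdir(), "nbf_checkpoint.json")

def save_checkpoint(epoch: int, best_val_loss: float) -> None:
    # The real checkpoint would also carry model/optimizer state.
    with open(CKPT_PATH, "w") as f:
        json.dump({"epoch": epoch, "best_val_loss": best_val_loss}, f)

def load_checkpoint() -> dict:
    # Resume from the last saved state, or start fresh if none exists.
    if os.path.exists(CKPT_PATH):
        with open(CKPT_PATH) as f:
            return json.load(f)
    return {"epoch": 0, "best_val_loss": float("inf")}

save_checkpoint(3, 1.72)
print(load_checkpoint())
```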

Self-Learning Mode (Default)

By default, the model trains continuously with self-learning enabled. It explores and generates its own training data by asking questions, generating answers, evaluating quality, and adding good examples back to its training set, creating a self-improving learning loop.
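The loop described above can be sketched as follows; generate_answer and score_answer are hypothetical placeholders for the model's generation and quality-filtering steps, not the project's actual API.

```python
import random

random.seed(0)  # only for reproducibility of this sketch

SEED_QUESTIONS = [
    "Convert 2 lbs to kg.",
    "How many ml are in 1.5 L?",
]

def generate_answer(question: str) -> str:
    # Placeholder: the real system would call the trained model here.
    return f"(model output for: {question})"

def score_answer(question: str, answer: str) -> float:
    # Placeholder quality heuristic; the real filter is unspecified.
    return random.random()

def self_learning_round(training_set: list, threshold: float = 0.7) -> None:
    """Ask questions, answer them, and keep only high-scoring examples."""
    for q in SEED_QUESTIONS:
        a = generate_answer(q)
        if score_answer(q, a) >= threshold:
            training_set.append((q, a))

data = []
self_learning_round(data)
print(f"{len(data)} examples added this round")
```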

How It Works

The training process is fully automated:

  1. Data Collection: If training data doesn't exist, the system automatically searches the web for relevant Q&A pairs covering fitness, health, nutrition, and math topics
  2. Data Processing: Collected data is formatted and split into training and validation sets
  3. Model Training: The GPT-2 based model is trained from scratch on the collected data
  4. Self-Learning (continuous mode): The model generates its own questions and answers, evaluates them, and adds quality examples back to the training set
  5. Checkpointing: Checkpoints are saved after each epoch, allowing training to be resumed
  6. Model Saving: The best model (lowest validation loss) is automatically saved

Training can be stopped gracefully at any time using Ctrl+C, and the current checkpoint will be saved automatically.
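Steps 2 and 6 above, the train/validation split and best-model tracking, can be sketched as follows; the ratio, seed, and names are illustrative, not the project's actual values.

```python
import random

def train_val_split(pairs, val_ratio: float = 0.1, seed: int = 42):
    """Shuffle collected Q&A pairs and split off a validation set."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_ratio))
    return shuffled[n_val:], shuffled[:n_val]

pairs = [(f"q{i}", f"a{i}") for i in range(20)]
train, val = train_val_split(pairs)

# Best-model tracking: save whenever validation loss improves.
best_val_loss = float("inf")
for val_loss in [2.1, 1.8, 1.9]:   # example per-epoch validation losses
    if val_loss < best_val_loss:
        best_val_loss = val_loss    # here the model would be saved
print(len(train), len(val), best_val_loss)
```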

Quick Start

  1. Install dependencies

    pip install -r requirements.txt
  2. Start training

    python train.py

    By default, training runs continuously with self-learning enabled. The model will automatically collect data from the web if needed, then train indefinitely while generating its own training data.

  3. Resume training

    python train.py --resume

    Resume from the latest checkpoint and continue training.

  4. Fixed epochs training

    python train.py --epochs 10

    Train for a specific number of epochs instead of continuously.

  5. Stop training

    Press Ctrl+C to stop gracefully. The current checkpoint will be saved automatically.
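The graceful-stop behavior can be sketched by wrapping the training loop in a KeyboardInterrupt handler. save_checkpoint is a hypothetical stand-in for the project's checkpoint writer, and the interrupt is simulated so the sketch runs on its own.

```python
def save_checkpoint(epoch: int) -> None:
    # Stand-in for the real checkpoint writer.
    print(f"checkpoint saved at epoch {epoch}")

def train_loop(max_epochs: int = 1000) -> int:
    epoch = 0
    try:
        while epoch < max_epochs:
            epoch += 1                   # one epoch of training goes here
            if epoch == 3:
                raise KeyboardInterrupt  # simulate Ctrl+C for this sketch
    except KeyboardInterrupt:
        save_checkpoint(epoch)           # mirror the documented behavior
    return epoch

print(train_loop())
```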

Usage

After training, load and use the model:

from nobullfit_ai.model import NoBullFitModel

model = NoBullFitModel.from_pretrained("./models/best_model")
response = model.generate("Question: Convert 500 grams to kilograms\nAnswer:")
print(response[0])

Configuration

All settings are configured via the .env file:

  • Model: VOCAB_SIZE, MAX_SEQ_LENGTH, EMBED_DIM, NUM_HEADS, NUM_LAYERS
  • Training: BATCH_SIZE, LEARNING_RATE, NUM_EPOCHS, WARMUP_STEPS
  • Data: DATA_DIR, MIN_QA_PAIRS (minimum Q&A pairs to collect)
  • Self-Learning: SELF_LEARNING_INTERVAL (generate new data every N epochs, default: 5)
  • Checkpoints: KEEP_CHECKPOINTS (number of checkpoints to keep, default: 5)
  • Device: DEVICE (use cuda for GPU, cpu for CPU)
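A hypothetical .env using these keys might look like the following; every value below is an illustrative guess, not the project's shipped defaults.

```ini
# Model
VOCAB_SIZE=8000
MAX_SEQ_LENGTH=512
EMBED_DIM=256
NUM_HEADS=8
NUM_LAYERS=6

# Training
BATCH_SIZE=16
LEARNING_RATE=3e-4
NUM_EPOCHS=10
WARMUP_STEPS=500

# Data
DATA_DIR=./data
MIN_QA_PAIRS=1000

# Self-learning, checkpoints, device
SELF_LEARNING_INTERVAL=5
KEEP_CHECKPOINTS=5
DEVICE=cuda
```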

Technology Stack

NoBullFit AI is built with Python and modern machine learning libraries:

  • Deep Learning: PyTorch
  • Model Architecture: Transformers (GPT-2 based)
  • Data Collection: DuckDuckGo Search, BeautifulSoup, Requests
  • Training: Custom training loop with checkpointing and resume support

Privacy Commitment

NoBullFit AI collects training data from publicly available web sources. The model is trained locally on your machine, ensuring that your training process and model remain private. No data is sent to external services during training.

License

This project is part of the NoBullFit ecosystem. See the LICENSE file for details.

For commercial licensing inquiries, please contact us at https://nobull.fit.
