Cogni-Mamba: Efficient CLI Chatbot with RAG 🤖

A lightweight, resource-efficient Large Language Model (LLM) with a command-line chatbot interface and RAG (Retrieval-Augmented Generation) capabilities. Built with PyTorch and optimized for minimal hardware requirements without sacrificing accuracy.

✨ Key Features

  • Efficient Architecture: 168M parameters with Grouped-Query Attention (GQA)
  • Memory Optimized: Mixed precision training (FP16), gradient accumulation
  • Low Resource Usage: Runs on 4GB+ GPU or CPU
  • RAG Support: Answer questions based on your documents (PDF, TXT, DOCX)
  • Multiple Datasets: OpenAssistant, Dolly, Alpaca, TinyStories, Code datasets
  • Advanced Components (see the RMSNorm sketch after this list):
    • Rotary Positional Embeddings (RoPE)
    • SwiGLU activation
    • RMSNorm layer normalization
    • Flash Attention support
  • Interactive CLI: Simple command-line chatbot interface with document Q&A
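
Of these components, RMSNorm is the simplest to show in isolation. Below is a minimal PyTorch sketch of Llama-style RMSNorm; the class name and epsilon value are illustrative, not copied from LLM_architecture_168.py.

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Normalize by the root-mean-square of the features: no mean-centering,
    no bias, just a learned per-dimension gain."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x / sqrt(mean(x^2) + eps), then scale by the learned gain
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)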

🚀 Quick Start

1. Install Dependencies

pip install -r requirements.txt

Optional (for better RAG):

pip install sentence-transformers PyPDF2 python-docx

2. Train the Model

python train.py

Training defaults to OpenAssistant (high-quality conversations).
Expect roughly 2-3 hours on a GTX 1650 (4GB).

3. Chat with Your Bot

Regular Chat:

python chatbot.py

RAG Chat (Document Q&A):

python chatbot_rag.py

With Document Pre-loaded:

python chatbot_rag.py --document sample_document.txt

📁 Project Structure

Custom_LLM/
├── LLM_architecture_168.py   # Core model architecture
├── train.py                  # Training script (OpenAssistant)
├── chatbot.py                # Simple CLI chatbot
├── chatbot_rag.py            # RAG-enabled chatbot ⭐ NEW
├── rag_pipeline.py           # RAG implementation ⭐ NEW
├── tokenizer_utils.py        # Tokenization utilities
├── data_loader.py            # Multi-dataset loader (updated)
├── config.json               # Model configuration (GTX 1650 optimized)
├── requirements.txt          # Python dependencies
├── sample_document.txt       # Example document for RAG
├── test_rag.py               # RAG test script
├── quick_start.md            # Detailed setup guide
├── RAG_GUIDE.md              # RAG usage guide ⭐ NEW
├── TRAINING_WITH_RAG.md      # Complete implementation guide ⭐ NEW
└── checkpoints/              # Saved model checkpoints

💻 Hardware Requirements

Minimum (Current Configuration)

  • GPU: 4GB VRAM (GTX 1650, GTX 1050 Ti) ⭐ Optimized
  • RAM: 8GB
  • Storage: 5GB

Recommended

  • GPU: 6GB+ VRAM (GTX 1660, RTX 3050)
  • RAM: 16GB
  • Storage: 10GB

CPU-Only Mode

Supported but slower (training may take 6-8 hours).

🎯 Model Configuration

The model is configured in config.json:

{
  "vocab_size": 50304,
  "dim": 1024,
  "n_layers": 12,
  "n_heads": 16,
  "n_kv_heads": 4,
  "hidden_dim": 2816,
  "max_seq_len": 512,
  "batch_size": 4,
  "gradient_accumulation_steps": 8,
  "mixed_precision": true
}

Here dim is the model dimension, n_layers the number of transformer layers, n_heads the attention heads, n_kv_heads the key/value heads for GQA, hidden_dim the FFN hidden size, and max_seq_len the maximum sequence length.
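
Two of these fields multiply into the effective batch size, which is what the optimizer actually sees per update. A quick check, assuming config.json is in the working directory:

import json

with open("config.json") as f:
    cfg = json.load(f)

# Gradients accumulate over 8 micro-batches of 4 sequences before each update.
print(cfg["batch_size"] * cfg["gradient_accumulation_steps"])  # 4 * 8 = 32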

📊 Training Features

Memory Optimizations

  • Mixed Precision (FP16): Reduces memory by 50%
  • Gradient Accumulation: Simulates larger batches
  • Weight Tying: Shares embedding weights
  • Efficient Attention: Grouped-Query Attention

Training Techniques

  • AdamW optimizer with weight decay
  • Cosine learning rate schedule with warmup
  • Gradient clipping for stability
  • Automatic checkpointing
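
These optimizations and techniques combine in a fairly standard PyTorch loop. The sketch below is self-contained, with a stand-in linear model and random data; it shows mixed precision, gradient accumulation, and clipping under a cosine schedule (warmup omitted for brevity) and is not the actual train.py code.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for the real model and data; train.py builds these from config.json.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 50304).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
accum_steps = 8  # gradient_accumulation_steps from config.json

for step in range(100):
    x = torch.randn(4, 1024, device=device)            # batch_size = 4
    y = torch.randint(0, 50304, (4,), device=device)
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = F.cross_entropy(model(x), y)            # FP16 forward on GPU
    # Scale down so accumulated gradients average over the virtual batch (4*8=32).
    scaler.scale(loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)                     # un-scale before clipping
        nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
        scheduler.step()                               # cosine LR decay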

🎮 Using the Chatbot

Regular Chatbot (chatbot.py)

You: Hello!
Bot: Hi! How can I help you today?

Commands:
  - 'quit' or 'exit': Exit the chatbot
  - 'clear': Clear conversation history
  - 'history': Show conversation history

RAG Chatbot (chatbot_rag.py) ⭐ NEW

You: add sample_document.txt
✓ Document added: sample_document.txt

You: What is machine learning?
Bot: Based on the document, machine learning is a subset of AI...

Commands:
  - 'add <filepath>': Add document to knowledge base
  - 'docs': List loaded documents
  - 'save kb': Save knowledge base
  - 'load kb': Load knowledge base
  - 'quit', 'clear', 'history': Same as regular chatbot
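
Under the hood, RAG is mostly retrieval: embed the document chunks and the query, then prepend the most similar chunks to the prompt. The sketch below shows that core step using sentence-transformers (the optional dependency mentioned above); the function and model name are illustrative, not the actual rag_pipeline.py code.

import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)
    query_vec = embedder.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ query_vec          # cosine similarity (unit-norm vectors)
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[int(i)] for i in best]

# Retrieved chunks become context in the model's prompt:
chunks = ["Machine learning is a subset of AI.", "RAG augments generation with retrieval."]
context = "\n".join(retrieve("What is machine learning?", chunks, top_k=1))
prompt = f"Context:\n{context}\n\nQuestion: What is machine learning?\nAnswer:"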

Generation Parameters

Adjust in chatbot.py:

  • temperature: 0.7-1.0 (higher = more creative)
  • top_k: 40-50 (smaller = more focused)
  • top_p: 0.9-0.95 (nucleus sampling)
  • max_new_tokens: 50-200 (response length)
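
For reference, these parameters interact at the point where the model's final logits become one sampled token. A generic sketch, not necessarily the exact code in chatbot.py:

import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.8,
                      top_k: int = 40, top_p: float = 0.9) -> int:
    """Sample one token id from a 1-D vector of vocabulary logits."""
    logits = logits / temperature                    # <1 sharpens, >1 flattens
    topk_vals, topk_idx = torch.topk(logits, top_k)  # k highest, sorted descending
    probs = torch.softmax(topk_vals, dim=-1)
    # Nucleus (top-p): keep the smallest prefix whose cumulative prob reaches top_p.
    keep = torch.cumsum(probs, dim=-1) - probs < top_p
    keep[0] = True                                   # always keep the best token
    probs = probs * keep
    probs = probs / probs.sum()                      # renormalize survivors
    choice = torch.multinomial(probs, 1)
    return int(topk_idx[choice])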

📚 Datasets

Default: OpenAssistant ⭐ NEW

  • High-quality human conversations
  • 161K samples
  • Instruction-following format
  • Best for chatbots

Alternatives (in train.py):

  • Dolly-15k: Instruction following (fast training)
  • Alpaca: Q&A format (good quality)
  • TinyStories: Simple text (basic testing)
  • Code Search Net: Python code (code tasks)

# In train.py, change dataset_name
dataset_name='oasst'          # OpenAssistant (default)
dataset_name='dolly'          # Dolly-15k
dataset_name='alpaca'         # Alpaca
dataset_name='code_search_net' # Code

Custom Data

Add your own training data in data_loader.py.
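
data_loader.py defines the exact format it expects, so treat the following as a hypothetical shape rather than the project's schema: instruction-tuning data is commonly a list of prompt/response pairs flattened into one training string per sample.

# Hypothetical samples and template; match whatever data_loader.py actually expects.
custom_data = [
    {"instruction": "Explain RAG in one sentence.",
     "response": "RAG retrieves relevant text and supplies it to the model as context."},
]

def to_training_text(sample: dict) -> str:
    # A common chat-style template; swap in the tokenizer's real special tokens.
    return f"### Instruction:\n{sample['instruction']}\n\n### Response:\n{sample['response']}"

for sample in custom_data:
    print(to_training_text(sample))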

🔧 Troubleshooting

Out of Memory

  1. Reduce batch size to 2 in config.json
  2. Reduce max_seq_len to 256
  3. Reduce model size (fewer layers/smaller dim)

Poor Responses

  1. Train longer (more epochs)
  2. Use more training data
  3. Adjust generation temperature

Installation Issues

# Install PyTorch first
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Then install other dependencies
pip install transformers datasets tqdm accelerate

📈 Performance

Training Speed

  • GPU (RTX 3060): ~40 min for 3 epochs (10K samples)
  • GPU (GTX 1060): ~60 min for 3 epochs
  • CPU: 4-6 hours

Model Size

  • Parameters: 168M (0.17B)
  • Disk size: ~650MB (FP32), ~325MB (FP16)
  • Memory usage: ~1-2GB during inference
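
These figures are consistent with each other: 168M parameters at 4 bytes each (FP32) is roughly 650 MB on disk, and FP16 stores 2 bytes per parameter, halving it to about 325 MB.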

🛠️ Advanced Usage

Fine-tuning

python train.py --checkpoint checkpoints/best_model.pt

Inference Only

from chatbot import Chatbot

bot = Chatbot(model_path='checkpoints/best_model.pt')
response = bot.generate("Your prompt here")

Export Model

# Save only weights
torch.save(model.state_dict(), 'model_weights.pt')
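
Loading the weights back is the mirror image. A self-contained sketch with a stand-in module; substitute the real architecture class from LLM_architecture_168.py, built with the same config:

import torch
import torch.nn as nn

model = nn.Linear(1024, 50304)   # stand-in for the real model class
torch.save(model.state_dict(), 'model_weights.pt')

state = torch.load('model_weights.pt', map_location='cpu')
model.load_state_dict(state)     # shapes must match the saved checkpoint
model.eval()                     # disable dropout etc. for inference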

📖 Documentation

  • TRAINING_WITH_RAG.md - Complete implementation guide (START HERE) ⭐
  • RAG_GUIDE.md - RAG usage and best practices
  • quick_start.md - Detailed setup instructions
  • IMPLEMENTATION_GUIDE.md - Architecture details
  • QUICK_REFERENCE.txt - Command reference card

🤝 Contributing

Feel free to submit issues and enhancement requests!

📄 License

See LICENSE file for details.

🙏 Acknowledgments

  • Architecture inspired by modern LLMs (Llama, Mistral)
  • Optimizations from Flash Attention and efficient transformers research
  • Training techniques from various open-source projects

Built with ❤️ for efficient, custom LLMs.

In development... stay tuned, homies!
