This project demonstrates how the Qwen3Guard models work. The 'Stream' model in particular is different because llama.cpp cannot convert it to GGUF format.
Qwen has provided basic CLI scripts to show how the models work. I have converted them into Ollama-compatible versions and added a webpage that uses the Stream model for moderation.
These are demonstrations to show how the basic interactions work. The API server examples are based on these.
For the non-Ollama scripts, you need these dependencies:
```shell
pip install transformers
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
pip install accelerate
```

The Ollama versions do not need these.
This project includes an example of fine-tuning the Qwen3-4B model for custom classification tasks. The example demonstrates fine-tuning for Star Trek-related content classification.
The fine-tuning process uses a JSONL (JSON Lines) dataset format where each line contains:
{"input": "Your text input here", "label": "related"}The dataset should be placed in finetuning/star_trek/star_trek_guard_dataset.jsonl. The labels are binary classification: "related" or "not_related".
You can generate a Star Trek dataset using the provided script:
```shell
python generate_star_trek_questions.py
```

This will create a dataset file with 2,500 Star Trek-related questions labeled as "related". You can modify this script to generate datasets for your own domain.
The fine-tuning script (finetuning/star_trek/train_star_trek_guard.py) uses:
- Base Model: Qwen3-4B
- Method: LoRA (Low-Rank Adaptation) for efficient fine-tuning
- Training Parameters:
  - Batch size: 2
  - Gradient accumulation: 16 (effective batch size: 32)
  - Epochs: 3
  - Learning rate: 2e-4
  - Max sequence length: 512
- LoRA Configuration:
  - Rank (r): 16
  - Alpha: 32
  - Dropout: 0.05
  - Target modules: attention and MLP layers
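For reference, the parameters listed above collected in one place. The exact target-module names are an assumption typical for Qwen-style models; check `train_star_trek_guard.py` for the authoritative values:

```python
# Hyperparameters as listed above, gathered for reference.
TRAINING = {
    "per_device_batch_size": 2,
    "gradient_accumulation_steps": 16,
    "epochs": 3,
    "learning_rate": 2e-4,
    "max_seq_length": 512,
}
# target_modules names are an assumption (common for Qwen-style models);
# verify against the training script.
LORA = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
}

# Effective batch size = per-device batch size * gradient accumulation steps
effective = TRAINING["per_device_batch_size"] * TRAINING["gradient_accumulation_steps"]
print(effective)  # 32
```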
- Install additional dependencies:

```shell
pip install datasets peft
```

- Navigate to the fine-tuning directory:

```shell
cd finetuning/star_trek
```

- Ensure your dataset file (`star_trek_guard_dataset.jsonl`) is in the current directory
- Run the training script:

```shell
python train_star_trek_guard.py
```

The fine-tuned model will be saved to the `./star_trek_guard_finetuned/` directory. The script will:
- Load and tokenize the dataset
- Split into train/test sets (90/10)
- Apply LoRA fine-tuning
- Save the model and tokenizer
- Run test predictions on sample inputs
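The 90/10 split step can be illustrated with a rough stdlib sketch (the actual script may use the `datasets` library's own splitting; the function name here is illustrative):

```python
import random

def split_90_10(records, seed=42):
    """Illustrative 90/10 train/test split, matching the ratio above.
    The real script likely delegates this to the datasets library."""
    records = list(records)
    random.Random(seed).shuffle(records)  # shuffle reproducibly
    n_test = max(1, len(records) // 10)   # hold out ~10% for testing
    return records[n_test:], records[:n_test]

train, test = split_90_10(range(100))
print(len(train), len(test))  # 90 10
```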
To fine-tune for your own use case:
- Modify the `MODEL_NAME` variable to use a different base model
- Update the `LABEL2ID` and `ID2LABEL` dictionaries for your classification labels
- Adjust training hyperparameters (batch size, learning rate, epochs) based on your dataset size and hardware
- Create your own dataset in the JSONL format
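The variables mentioned above might look like this; the values are hypothetical, so check `train_star_trek_guard.py` for the actual definitions:

```python
# Hypothetical values for the configuration variables mentioned above.
MODEL_NAME = "Qwen/Qwen3-4B"
LABEL2ID = {"not_related": 0, "related": 1}
ID2LABEL = {v: k for k, v in LABEL2ID.items()}  # reverse mapping
print(ID2LABEL[1])  # related
```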
After fine-tuning, you can upload your model to Hugging Face Hub for easy sharing and deployment.
- Install the Hugging Face Hub library:

```shell
pip install huggingface_hub
```

- Authenticate with Hugging Face:

```shell
huggingface-cli login
```

Enter your Hugging Face token when prompted. You can get a token from https://huggingface.co/settings/tokens

- Navigate to the fine-tuning directory:

```shell
cd finetuning/star_trek
```

- Edit `huggingface_upload.py` to set your repository ID:

```python
repo_id = "your-username/your-model-name"
```

- Ensure the model directory path matches your training output directory (default: `./star_trek_guard_finetuned`)
- Run the upload script:

```shell
python huggingface_upload.py
```

The script will:

- Create a new repository on Hugging Face (or use the existing one if `exist_ok=True`)
- Upload all model files, tokenizer files, and configuration files
- Make the model publicly available (or private if `private=True`)
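A sketch of what `huggingface_upload.py` likely does with the `huggingface_hub` API; the function name is illustrative, and the default folder assumes the training output directory above:

```python
def upload_model(repo_id: str, folder: str = "./star_trek_guard_finetuned",
                 private: bool = False) -> None:
    """Create the repo (reusing it if it already exists) and push the
    whole model folder. Sketch only; see huggingface_upload.py."""
    from huggingface_hub import HfApi  # requires `pip install huggingface_hub`

    api = HfApi()
    api.create_repo(repo_id=repo_id, exist_ok=True, private=private)
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="model")
```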
After upload, your model will be available at https://huggingface.co/your-username/your-model-name and can be loaded using:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("your-username/your-model-name")
tokenizer = AutoTokenizer.from_pretrained("your-username/your-model-name")
```

The `qwen_stream_api_server.py` script provides a complete backend API for the Qwen3Guard-Stream model that can be used with the `qwen_stream_chat.html` interface.
The Star Trek classification examples show how to use the fine-tuned Star Trek model. This is NOT a stream model, so it does not use the same interface as the standard `qwen_stream_api_server.py` example.
You can connect to it via the star_trek_chat.html interface and the star_trek_api_server.py server.
- Install the required dependencies:
```shell
pip install flask
pip install flask_cors
pip install accelerate
```

- Start the API server:

```shell
python qwen_stream_api_server.py
```

Or use the Star Trek version if required.
The server will start on http://localhost:5000 by default.
- Open `qwen_stream_chat.html` in a web browser (you may need to serve it from a local web server to avoid CORS issues):

```shell
python -m http.server 8000
```

Then navigate to http://localhost:8000/qwen_stream_chat.html

- The HTML interface will automatically connect to the API server at http://localhost:5000/api/chat
- `POST /api/moderate` - Main chat endpoint that moderates user messages (and optionally assistant messages)
  - Accepts: `{"messages": [{"role": "user", "content": "..."}], "stream": true}`
  - Returns: Streaming JSON responses with moderation results
- `GET /health` - Health check endpoint
  - Returns: `{"status": "healthy", "model_loaded": true/false}`
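Assuming the server streams newline-delimited JSON (an assumption; check `qwen_stream_api_server.py` for the exact response framing), a minimal stdlib client for the moderation endpoint might look like:

```python
import json
import urllib.request

def moderate(messages, url="http://localhost:5000/api/moderate"):
    """Minimal client sketch: POST messages and yield streamed results.
    Assumes newline-delimited JSON responses (verify against the server)."""
    payload = json.dumps({"messages": messages, "stream": True}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            raw = raw.strip()
            if raw:
                yield json.loads(raw)

# Example call (requires the server to be running):
# for result in moderate([{"role": "user", "content": "Hello!"}]):
#     print(result)
```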