Common questions and answers about SecMask.
- General Questions
- Technical Questions
- Usage Questions
- Deployment Questions
- Privacy & Security
- Performance
- Troubleshooting
## General Questions

### What is SecMask?

SecMask is a Mixture of Experts (MoE) system for detecting and masking secrets (API keys, tokens, credentials) in text. It uses two specialized NER models:
- Fast Expert: DistilBERT-based, handles 92.7% of cases in ~6ms
- Long Expert: Longformer-based, handles complex cases requiring up to 2048 tokens
### Why not just use regex?

Regex limitations:
- Brittle patterns that break with minor variations
- High false positive rates
- Can't handle context-dependent secrets
- Requires constant maintenance
SecMask advantages (see the sketch after this list):
- ML-based detection learns patterns from data
- Low false positive rate (82% precision, production-safe)
- Handles context (distinguishes real secrets from examples)
- Automatically adapts to new secret formats
- Multi-stage pipeline (NER + deterministic filters)
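
To make the contrast concrete, here is a minimal sketch (the pattern and strings are illustrative, not SecMask's actual filter set) showing how a single regex both flags documentation placeholders and misses format variants:

```python
import re

# One of the many patterns a regex-only scanner would need to maintain
OPENAI_KEY = re.compile(r"sk-[A-Za-z0-9]{20,}")

doc_example = 'README says: export OPENAI_API_KEY="sk-EXAMPLEEXAMPLEEXAMPLE0000"'
real_leak = "committed my key sk_live_abcdef0123456789abcdef"  # underscore variant

print(bool(OPENAI_KEY.search(doc_example)))  # True  -- false positive on a placeholder
print(bool(OPENAI_KEY.search(real_leak)))    # False -- misses the variant format
```

A context-aware model can learn that the first string is an example and the second is a live credential; a regex sees only the surface pattern.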
### What types of secrets can SecMask detect?

SecMask detects:
- API keys (OpenAI, Stripe, SendGrid, etc.)
- Cloud credentials (AWS, Azure, GCP)
- GitHub tokens (classic, fine-grained, PATs)
- JWT tokens
- SSH/PEM keys
- Database connection strings
- Kubernetes secrets
- And more...
See BENCHMARKS.md for detailed detection rates.
### Is SecMask free to use?

Yes! SecMask is released under dual licensing:
- SecMask codebase: MIT License (training scripts, inference code, documentation)
- Fine-tuned models: Apache 2.0 (inherited from DistilBERT and Longformer base models)
You can:
- ✅ Use freely in commercial projects
- ✅ Modify and redistribute
- ✅ Contribute improvements
- ✅ Fine-tune for your own use cases
Attribution required for:
- DistilBERT base model (© Hugging Face, Apache 2.0)
- Longformer base model (© Allen Institute for AI, Apache 2.0)
See LICENSE and NOTICE for full details.
### How accurate is SecMask?

On our test set (600 examples) at τ=0.80 threshold:
- F1 Score: 0.52 (NER model only)
- Precision: 82% (low false positives, production-safe)
- Recall: 38% (NER component)
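
These three numbers are mutually consistent, since F1 is the harmonic mean of precision and recall:

```python
# F1 = 2PR / (P + R), the harmonic mean of precision and recall
precision, recall = 0.82, 0.38
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.52
```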
Production Note: These metrics represent the NER models alone. Production deployments combine NER detection with deterministic filters (PEM blocks, K8s secrets, pattern-based matching) for comprehensive secret coverage while maintaining the high precision guarantee.
See BENCHMARKS.md for detailed metrics and evaluation methodology.
## Technical Questions

### What is a Mixture of Experts (MoE)?

MoE is an architecture where multiple specialized models ("experts") handle different types of inputs:
- Router decides which expert to use based on input characteristics
- Fast Expert (DistilBERT, 512 tokens) handles most cases quickly
- Long Expert (Longformer, 2048 tokens) handles complex cases
This gives us both speed (6ms average) and accuracy.
### Why use two experts instead of one?

Each expert trades speed against capacity:
| Aspect | Fast Expert | Long Expert |
|---|---|---|
| Latency | 6ms | 12ms |
| Max tokens | 512 | 2048 |
| Model size | 268MB | 592MB |
| Use case | Single secrets | Long configs |
Using both gives us the best of both worlds:
- 92.7% of texts processed in 6ms (fast expert)
- Complex cases escalated to long expert automatically
### How does the router decide which expert to use?

The router uses simple heuristics:
```python
def should_escalate(text):
    """Decide if we need the long expert."""
    # Short text -> fast expert
    if len(text.split()) < 100:
        return False
    # No complex multi-line patterns -> fast expert
    if not has_multi_line_structure(text):
        return False
    # Otherwise, use the long expert
    return True
```

See router.py for the full implementation.
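
`has_multi_line_structure` lives in router.py and isn't shown above; a minimal stand-in that captures the idea (illustrative only, not the shipped heuristic):

```python
def has_multi_line_structure(text, min_lines=5):
    """Rough proxy: configs, YAML manifests, and PEM blocks span many lines."""
    non_empty = [line for line in text.splitlines() if line.strip()]
    return len(non_empty) >= min_lines
```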
### Can I train my own model?

Yes! See train_ner_masker.py for the training script.
Requirements:
- Labeled dataset (BIO-tagged secrets)
- GPU (NVIDIA T4 or better)
- ~2 hours training time
Steps:
```bash
# Prepare data (see data/README.md)
python data/make_v2_data.py

# Train fast expert
python train_ner_masker.py \
  --model-name distilbert-base-uncased \
  --train-file data/v2_train.jsonl \
  --val-file data/v2_val.jsonl

# Train long expert
python train_longformer_expert.py \
  --model-name allenai/longformer-base-4096 \
  --train-file data/long_context_train.jsonl \
  --val-file data/long_context_val.jsonl
```

### What architecture do the models use?

Fast Expert:
- Base: `distilbert-base-uncased` (66M parameters)
- Task: Token classification (NER)
- Labels: `O` (non-secret), `B-SECRET`, `I-SECRET`
- Context window: 512 tokens
Long Expert:
- Base: `allenai/longformer-base-4096` (149M parameters)
- Task: Token classification (NER)
- Labels: Same as fast expert
- Context window: 2048 tokens (4096 max)
Both use standard HuggingFace transformers architecture.
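
A quick way to confirm the label set on a downloaded checkpoint (the exact index order below is an assumption; check the config of the model you actually pull):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("andrewandrewsen/distilbert-secret-masker")
print(config.id2label)  # e.g. {0: "O", 1: "B-SECRET", 2: "I-SECRET"}
print(config.max_position_embeddings)  # 512 for the fast expert
```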
### How do I add support for a new secret type?

Option 1: Retrain with new data
```bash
# Add labeled examples to data/v2_train.jsonl
echo '{"text": "New secret: xyz-123", "labels": ["O", "O", "B-SECRET"]}' >> data/v2_train.jsonl

# Retrain model
python train_ner_masker.py --train-file data/v2_train.jsonl
```

Option 2: Add a regex filter
Add to filters.json:

```json
{
  "name": "custom_secret",
  "pattern": "xyz-[0-9]{3}",
  "confidence": 0.95
}
```

Then apply the filter:

```python
from filters import apply_filters

masked = apply_filters(text, filters)
```

## Usage Questions

### How do I install SecMask?

Quick install:
```bash
# Install dependencies
pip install transformers torch

# Download code
git clone https://github.com/andrewandrewsen/secmask.git
cd secmask

# Run
python infer_moe.py --in file.txt \
  --fast-model andrewandrewsen/distilbert-secret-masker
```

See README.md for detailed instructions.

### How do I use SecMask from Python?
```python
from infer_moe import mask_text_moe

# Basic usage
masked = mask_text_moe(
    "My API key is sk-1234567890",
    fast_model_dir="andrewandrewsen/distilbert-secret-masker",
)
print(masked)  # "My API key is [SECRET]"
```

See EXAMPLES.md for more examples.
### How do I adjust detection sensitivity?

Use the --tau parameter (threshold):
```bash
# More sensitive (more detections, more false positives)
python infer_moe.py --in file.txt --tau 0.50

# Less sensitive (fewer false positives, may miss some secrets)
python infer_moe.py --in file.txt --tau 0.90

# Default (balanced)
python infer_moe.py --in file.txt --tau 0.80
```

Recommended thresholds:
- Production logs: `tau=0.85` (minimize false positives)
- Pre-commit hooks: `tau=0.75` (catch more secrets)
- Security audits: `tau=0.70` (be extra cautious)
### Can I process multiple files at once?

Yes! Use a simple loop:
```bash
# Bash
for file in *.py; do
  python infer_moe.py --in "$file" --out "${file}.masked"
done
```

```python
# Python
from pathlib import Path
from infer_moe import mask_text_moe

for file in Path('.').glob('*.py'):
    with open(file, 'r') as f:
        content = f.read()
    masked = mask_text_moe(
        content,
        fast_model_dir="andrewandrewsen/distilbert-secret-masker",
    )
    with open(f"{file}.masked", 'w') as f:
        f.write(masked)
```

### How do I use private models from Hugging Face?

Option 1: Login via the CLI
```bash
huggingface-cli login
# Enter your token when prompted
```

Option 2: Environment variable

```bash
export HF_TOKEN="hf_xxxxxxxxxxxxx"
python infer_moe.py --in file.txt --fast-model my-org/private-model
```

Option 3: Pass the token directly

```bash
python infer_moe.py --in file.txt \
  --fast-model my-org/private-model \
  --token hf_xxxxxxxxxxxxx
```

## Deployment Questions

### Can I use SecMask in production?

Yes! SecMask is production-ready. See DEPLOYMENT.md for:
- Docker deployment
- Kubernetes
- AWS Lambda
- Azure Functions
### What are the hardware requirements?

Minimum (CPU-only):
- 2 CPU cores
- 4GB RAM
- ~6-10ms latency per request
Recommended (GPU):
- NVIDIA T4 or better
- 8GB RAM
- ~3-5ms latency per request
See BENCHMARKS.md for details.
### How do I scale SecMask?

Horizontal scaling (multiple instances):
```bash
# Kubernetes
kubectl scale deployment secmask --replicas=10

# Docker Swarm
docker service scale secmask=10
```

Vertical scaling (more resources):

```yaml
resources:
  requests:
    memory: "8Gi"
    cpu: "4000m"
```

See DEPLOYMENT.md for auto-scaling setup.

### Can I deploy SecMask on AWS Lambda?
Yes! See DEPLOYMENT.md for setup.
Key considerations (a handler sketch follows the list):
- Use container image deployment (not zip)
- Set timeout to 30s
- Set memory to 2048MB
- Pre-download models at build time
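
A minimal handler sketch under those constraints. It assumes infer_moe is installed in the container image, that mask_text_moe reuses the loaded model across warm invocations, and that the MODEL_DIR path is hypothetical:

```python
# lambda_handler.py -- illustrative sketch, not shipped code
from infer_moe import mask_text_moe

# Hypothetical path where the image bakes in the model at build time
MODEL_DIR = "/opt/models/distilbert-secret-masker"

def handler(event, context):
    # First invocation pays the model-load cost; warm invocations are fast
    masked = mask_text_moe(event.get("text", ""), fast_model_dir=MODEL_DIR)
    return {"statusCode": 200, "body": masked}
```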
### Does SecMask work offline?

Yes, once the models are downloaded:
```bash
# Download models once while online
python -c "from transformers import AutoModel; \
  AutoModel.from_pretrained('andrewandrewsen/distilbert-secret-masker')"

# Now works offline
python infer_moe.py --in file.txt \
  --fast-model ~/.cache/huggingface/hub/models--andrewandrewsen--distilbert-secret-masker
```

You can also set `TRANSFORMERS_OFFLINE=1` so transformers never attempts a network call.

## Privacy & Security

### Is my data sent anywhere?

No. SecMask runs entirely locally. Your data never leaves your machine unless you:
- Use HuggingFace Inference API (not recommended)
- Deploy SecMask as a remote service
### Can I use SecMask with sensitive data?

Yes! SecMask:
- Processes data in-memory only
- Doesn't log secrets (only metadata)
- Doesn't send telemetry
Best practices:
- Run SecMask on-premises
- Use local model storage (not HF cache)
- Review masked output before sharing
### Does SecMask log the secrets it finds?

No. SecMask is designed to prevent secret leakage:
- Only logs masked text (not original)
- Doesn't log model predictions
- Safe to enable debug logging
Example log output:

```
INFO: Masking text (length: 245 chars)
INFO: Masked in 6.3ms, found 2 secrets
```
### How do I report a security vulnerability?

DO NOT open public issues for security vulnerabilities.
Instead:
- Email: [security@example.com]
- Include: Description, impact, steps to reproduce
- We'll respond within 48 hours
See SECURITY.md for details.
## Performance

### How fast is SecMask?

Latency:
- Fast expert: 6ms (median), 12ms (P99)
- Long expert: 12ms (median), 25ms (P99)
- MoE average: 6.8ms (92.7% use fast expert)
Throughput:
- CPU: ~50 requests/second (single core)
- GPU (T4): ~300 requests/second
- GPU (A100): ~1200 requests/second
See BENCHMARKS.md for detailed metrics.
### Why is the first request slow?

Model loading overhead:
- First request loads model into memory (~2-5s)
- Subsequent requests reuse loaded model (fast)
Solutions:
- Keep service running (don't restart per request)
- Use model caching
- Pre-load models at startup
```python
# Pre-load models at startup
from infer_moe import load_model

model, tokenizer = load_model("andrewandrewsen/distilbert-secret-masker")
# Now fast for all requests
```

### How can I make inference faster?

1. Use fast-only mode (no escalation):
```bash
python infer_moe.py --in file.txt --no-escalate
# 2x faster, slight accuracy loss
```

2. Use a GPU:
```bash
# Used automatically if one is available
python infer_moe.py --in file.txt
```

3. Batch processing:
```python
from transformers import pipeline

pipe = pipeline(
    "token-classification",
    model="andrewandrewsen/distilbert-secret-masker",
    batch_size=16,  # process 16 texts at once
)
results = pipe(texts)  # much faster than one-by-one
```

4. ONNX conversion:
```bash
# Convert to ONNX (2-3x faster) with Hugging Face Optimum
optimum-cli export onnx \
  --model andrewandrewsen/distilbert-secret-masker onnx_out/
```

See DEPLOYMENT.md for more.
### How much memory does SecMask use?

Model sizes:
- Fast expert: 268MB
- Long expert: 592MB
- Runtime: +500MB (tokenizer, inference)
Total:
- Fast-only: ~800MB
- MoE (both): ~1.4GB
GPU adds VRAM overhead (~1GB).
## Troubleshooting

### "Model not found" error

Cause: Model not downloaded or incorrect path.
Solution:
```bash
# Download model
python -c "from transformers import AutoModel; \
  AutoModel.from_pretrained('andrewandrewsen/distilbert-secret-masker')"

# Use the full HuggingFace ID
python infer_moe.py --fast-model andrewandrewsen/distilbert-secret-masker
```

### "CUDA out of memory" error

Cause: GPU VRAM insufficient.
Solution:
```bash
# Use CPU
export CUDA_VISIBLE_DEVICES=""
python infer_moe.py --in file.txt
```

```python
# Or reduce batch size (if using batching)
pipe = pipeline(..., batch_size=1)
```

### Too many false positives

Common causes:
- Hex strings (e.g., `#1a2b3c`, git commit hashes)
- UUIDs (e.g., `123e4567-e89b-12d3-a456-426614174000`)
- Hashes (e.g., MD5, SHA256)
Solutions (a post-filter sketch follows the commands):

```bash
# Increase threshold (fewer false positives)
python infer_moe.py --tau 0.90

# Or whitelist patterns via custom filters (edit filters.json)
```
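
If threshold tuning isn't enough, a post-filter on candidate spans can drop known-benign patterns before masking. This is an illustrative sketch, not a SecMask API; how you obtain candidate spans depends on your integration:

```python
import re

# Hypothetical allowlist of strings that look secret-like but aren't
BENIGN_PATTERNS = [
    re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I),  # UUID
    re.compile(r"^#?[0-9a-f]{6}$", re.I),                # hex color / short hex
    re.compile(r"^[0-9a-f]{32}$|^[0-9a-f]{64}$", re.I),  # MD5 / SHA-256 digests
]

def is_benign(span: str) -> bool:
    return any(p.match(span) for p in BENIGN_PATTERNS)

candidates = ["123e4567-e89b-12d3-a456-426614174000", "sk-abc123def456ghi789jkl0"]
print([s for s in candidates if not is_benign(s)])  # only the API key remains
```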
### SecMask misses some secrets

Common causes:
- New secret format not in training data
- Obfuscated secrets (e.g., base64 encoded)
- Very long secrets (>512 tokens for fast expert)
Solutions:
```bash
# Decrease threshold (more detections)
python infer_moe.py --tau 0.70

# Enable the long expert
python infer_moe.py \
  --fast-model andrewandrewsen/distilbert-secret-masker \
  --long-model andrewandrewsen/longformer-secret-masker

# Retrain with new examples
python train_ner_masker.py --train-file data/custom_train.jsonl
```

### SecMask is too slow

Quick wins:
- Use fast-only mode: `--no-escalate`
- Use a GPU if available
- Increase the threshold: `--tau 0.85` (fewer detections = faster)
Advanced:
- ONNX conversion (2-3x speedup)
- Quantization (smaller model, faster; see the sketch below)
- Batch processing (higher throughput)
See Performance section above.
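
For the quantization route, a minimal dynamic-quantization sketch using PyTorch's built-in API; treat it as a starting point and re-measure precision/recall afterwards:

```python
import torch
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "andrewandrewsen/distilbert-secret-masker"
)
# Quantize linear layers to int8 -- smaller and usually faster on CPU
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```

Dynamic quantization mainly helps CPU deployments; GPU inference is usually better served by batching or ONNX export.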
### How do I debug issues?

Enable debug logging:
```python
import logging
logging.basicConfig(level=logging.DEBUG)

from infer_moe import mask_text_moe
masked = mask_text_moe(...)
```

Check model loading:
```python
from transformers import AutoModel

try:
    model = AutoModel.from_pretrained("andrewandrewsen/distilbert-secret-masker")
    print("✅ Model loaded successfully")
except Exception as e:
    print(f"❌ Error: {e}")
```

Test inference:
```python
from infer_moe import mask_text_moe

text = "Test: sk-1234567890"
masked = mask_text_moe(text, fast_model_dir="andrewandrewsen/distilbert-secret-masker")
print(f"Input: {text}")
print(f"Output: {masked}")
assert '[SECRET]' in masked, "Secret not detected!"
```

### Where can I get more help?

- Documentation: README.md, EXAMPLES.md, DEPLOYMENT.md
- GitHub Issues: Open an issue
- GitHub Discussions: Ask the community
- Email: [your-email@example.com]
Last Updated: 2024-11