AI moderation system for dating apps that doesn't flag "Want to grab coffee?" as harassment.
Most moderation tools treat dating apps like Twitter - zero tolerance, immediate bans. I built something that understands dating conversations are different.
Key features:
- Progressive warnings instead of instant bans
- "You're beautiful" scored as appropriate, not harassment
- Crisis intervention for self-harm (support, not punishment)
- Dual system: gentle handling for normal chat, escalation for real threats
- Interactive Streamlit demo with professional UI
- Robust error handling for AI safety filters and edge cases
- Consistent output parsing with graceful fallbacks
Dating apps lose users when moderation is too aggressive: over-moderation kills engagement.
The business problem:
- False positives frustrate users into leaving
- Support tickets flood in from wrongly banned users
- Appeal processes waste time and money
Dating app context matters:
- "You're hot" between matched users isn't harassment
- Phone number requests are normal once a conversation has developed
- Hook-up language should score 2-3, not 8-9
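To make that contrast concrete, the rubric can be expressed as data rather than scattered rules. A minimal sketch, assuming a 0-10 severity scale; category names and exact numbers are illustrative, not lifted from the engine's prompts:

```python
# Hypothetical severity rubric on a 0-10 scale; values illustrate the intended
# calibration for dating app context, not the engine's exact numbers.
SEVERITY_RUBRIC = {
    "compliment_between_matches": (0, 1),  # "You're beautiful" -> appropriate
    "hookup_language": (2, 3),             # forward, but normal on a dating app
    "contact_info_request": (1, 3),        # normal once a conversation has developed
    "persistent_after_rejection": (5, 7),  # progressive-warning territory
    "fraud_or_scam": (8, 10),              # immediate escalation
    "hate_speech": (9, 10),
}

# Score bands map to progressive enforcement rather than instant bans.
ACTION_BANDS = {
    range(0, 4): "no_action",
    range(4, 7): "warning",
    range(7, 11): "escalate",
}
```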
Built two different prompts that route automatically:
Normal conversations → Gentle scoring with progressive enforcement
Serious issues → Immediate escalation (hate speech, self-harm, fraud)
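A minimal sketch of that routing step, assuming a keyword pre-filter in front of two prompt templates; the prompt text, marker list, and function names are illustrative, not copied from hinge_moderation_v2.py:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Keyword pre-filter for routing; the real engine's marker list is broader.
ESCALATION_MARKERS = ["kill myself", "suicide", "wire me", "crypto", "send money"]

GENTLE_PROMPT = "You moderate a dating app. Score 0-10, tolerant of flirtation..."
ESCALATION_PROMPT = "You handle serious safety issues: hate speech, self-harm, fraud..."

def route(message: str) -> str:
    """Pick the prompt template: gentle scoring vs. immediate escalation."""
    lowered = message.lower()
    if any(marker in lowered for marker in ESCALATION_MARKERS):
        return ESCALATION_PROMPT
    return GENTLE_PROMPT

def moderate(message: str) -> str:
    """Run one message through the routed prompt and return the raw verdict."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": route(message)},
            {"role": "user", "content": message},
        ],
    )
    return response.choices[0].message.content
```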
Evaluation process:
- Tested on 45+ real dating app messages
- Manual scoring to find false positive patterns
- Langfuse tracking for every decision
- Systematic prompt improvements based on failures
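Roughly how each decision gets tracked, assuming the Langfuse v2 Python SDK with keys configured via environment variables (helper names are illustrative):

```python
from langfuse import Langfuse  # assumes the v2 SDK; keys come from env vars

langfuse = Langfuse()

def run_eval(labeled_messages: list[tuple[str, int]]) -> None:
    """Score every test message and log the decision for later failure analysis."""
    for text, expected_severity in labeled_messages:
        trace = langfuse.trace(name="moderation-eval", input=text)
        verdict = moderate(text)  # engine call from the routing sketch above
        trace.update(output=verdict)
        # Manual label recorded alongside the trace to surface false-positive patterns.
        trace.score(name="expected_severity", value=expected_severity)
    langfuse.flush()
```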
Crisis handling: Self-harm detection doesn't remove content - it provides mental health resources and notifies appropriate support teams.
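In code, "support, not punishment" is just a branch that never reaches the removal path. A sketch with placeholder resource text and handler names:

```python
CRISIS_RESOURCES = (
    "You're not alone. If you're struggling, support is available: "
    "call or text 988 (US Suicide & Crisis Lifeline)."
)

def notify_safety_team(message: str) -> None:
    # Placeholder: in production this would page the trust & safety queue.
    print(f"[safety-team] review requested: {message!r}")

def apply_verdict(message: str, category: str, severity: int) -> dict:
    """Self-harm signals get resources and a team notification, never a ban."""
    if category == "self_harm":
        notify_safety_team(message)
        return {"action": "keep", "user_message": CRISIS_RESOURCES}
    if severity >= 7:
        return {"action": "escalate", "user_message": None}
    if severity >= 4:
        return {"action": "warn", "user_message": "Please keep things respectful."}
    return {"action": "keep", "user_message": None}
```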
Professional Streamlit interface with:
- Clean two-column layout with proper spacing
- Quick test buttons for common scenarios (Hate Speech, Self-Harm, Fraud)
- Loading states and success feedback
- Expandable technical analysis view
- Mobile-friendly responsive design
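A condensed sketch of how web_demo.py wires this together (widget layout simplified; example texts elided):

```python
import streamlit as st

st.title("Dating App Moderation Demo")

# Quick test buttons for common scenarios, wired through session state.
examples = {
    "Hate Speech": "...",
    "Self-Harm": "...",
    "Fraud": "...",
    "Normal Chat": "Want to grab coffee?",
}
cols = st.columns(len(examples))
for col, (label, text) in zip(cols, examples.items()):
    if col.button(label):
        st.session_state["message"] = text

left, right = st.columns(2)
with left:
    message = st.text_area("Message to moderate", st.session_state.get("message", ""))
    if st.button("Analyze") and message:
        with st.spinner("Scoring..."):
            st.session_state["verdict"] = moderate(message)  # engine from earlier sketch
        st.success("Done")
with right:
    if "verdict" in st.session_state:
        st.subheader("Verdict")
        st.write(st.session_state["verdict"])
        with st.expander("Technical analysis"):
            st.json({"raw_output": st.session_state["verdict"]})
```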
Quick tests available:
- Hate speech detection and scoring
- Self-harm crisis intervention
- Fraud/scam identification
- Normal dating conversation handling
Visual examples in the demo:
- Boundary-Pushing Content Analysis
- Crisis Intervention for Self-Harm

Requirements:
- Python 3.8+
- OpenAI API key
- Langfuse account (for tracking)
Project files:
- hinge_moderation_v2.py - Main moderation engine
- web_demo.py - Streamlit interface
- hinge-terms-of-use.txt - Reference guidelines
Recent improvements:
- Fixed AI output consistency issues - Resolved parsing errors and format inconsistencies
- Optimized token usage - Reduced specialized prompts from 10,177 to ~744 tokens (93% reduction)
- Enhanced safety filter handling - Graceful responses when OpenAI safety systems trigger (see the sketch after this list)
- Improved hate speech detection - Enhanced keyword detection for more accurate routing
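The graceful path for safety-filter triggers looks roughly like this (exception handling per the openai v1 SDK; the refusal heuristic is illustrative):

```python
import openai

def safe_moderate(message: str) -> dict:
    """Wrap the engine so provider safety filters degrade gracefully."""
    try:
        verdict = moderate(message)  # engine call from the routing sketch
    except openai.BadRequestError:
        # The request itself was refused; escalate rather than crash or auto-ban.
        return {"action": "escalate", "reason": "provider_safety_filter"}
    if not verdict or verdict.lstrip().lower().startswith(("i can't", "i cannot")):
        # Heuristic refusal check; the real engine's detection is more thorough.
        return {"action": "escalate", "reason": "model_refusal"}
    return {"action": "score", "raw": verdict}
```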
Outcomes:
- Reduced false positives in severity scoring for dating app contexts
- Progressive enforcement maintains safety while improving user experience
- Crisis intervention provides support rather than punishment for self-harm content
Technical implementation:
- GPT-4 dual-prompt system for routing and analysis
- Optimized prompt engineering with token limit management
- Robust safety filter detection for OpenAI content policy triggers
- Langfuse integration for observability and improvement tracking
- Streamlit frontend with professional UI/UX and visual examples
- Consistent output parsing with improved format handling (sketched below)
- Session state management for interactive testing
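A sketch of the parsing fallback chain: try strict JSON first, then scrape a score out of prose, and only then hand off to a human (thresholds and field names are illustrative):

```python
import json
import re

def parse_verdict(raw: str) -> dict:
    """Parse model output into {severity, category}, falling back gracefully."""
    try:
        data = json.loads(raw)  # happy path: the model returned clean JSON
        return {"severity": int(data["severity"]),
                "category": data.get("category", "other")}
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        pass
    # Fallback: scrape a number out of free-form prose like "Severity: 7".
    match = re.search(r"severity\D{0,10}(\d{1,2})", raw, re.IGNORECASE)
    if match:
        return {"severity": min(int(match.group(1)), 10), "category": "unparsed"}
    # Last resort: flag for human review instead of guessing.
    return {"severity": -1, "category": "needs_review"}
```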
Planned RAG enhancements:
- Policy Knowledge Base: Vector database of dating app guidelines and precedents
- Context-Aware Decisions: Retrieve relevant policy examples for consistent enforcement
- Appeal Case History: Learn from previous moderation decisions and outcomes
- Dynamic Policy Updates: Automatically incorporate new guidelines without code changes
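Since this is roadmap rather than shipped code, here is only a sketch of the planned retrieval step, using chromadb as one possible backend (collection name and helper are hypothetical):

```python
import chromadb

client = chromadb.Client()
policies = client.get_or_create_collection("dating_app_policies")

def policy_context(message: str, k: int = 3) -> str:
    """Fetch the k most relevant guideline snippets to ground the moderation prompt."""
    results = policies.query(query_texts=[message], n_results=k)
    return "\n".join(results["documents"][0])

# The retrieved snippets would be prepended to the moderation prompt so
# decisions cite actual policy text instead of relying on the model's memory.
```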
Planned structured outputs:
- JSON Schema Validation: Guaranteed consistent API responses for production integration
- Typed Moderation Results: Structured data for downstream systems and analytics
- Audit Trail Format: Standardized logging for compliance and review processes
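A sketch of what the typed result could look like with Pydantic (field names and categories are illustrative):

```python
from typing import Literal

from pydantic import BaseModel, Field

class ModerationResult(BaseModel):
    """Typed verdict for downstream systems, audit logs, and analytics."""
    severity: int = Field(ge=0, le=10)
    category: Literal["normal", "harassment", "hate_speech", "self_harm", "fraud"]
    action: Literal["keep", "warn", "escalate"]
    rationale: str

# Validation guarantees a consistent shape before anything reaches production.
result = ModerationResult(
    severity=2, category="normal", action="keep",
    rationale="Flirtatious but appropriate between matched users.",
)
print(result.model_dump_json())
```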
Built for AI Product Manager interviews - demonstrates a systematic approach to trust & safety, a user-experience focus, and technical implementation skills.
Stack: Python, GPT-4, Streamlit, Langfuse


