Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 0 additions & 8 deletions .env.example

This file was deleted.

5 changes: 3 additions & 2 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -29,12 +29,13 @@ RUN pip install --no-cache /wheels/*
# Copy application code
COPY --chown=user openenv.yaml .
COPY --chown=user my_env.py .
COPY --chown=user Inference.py .
COPY --chown=user inference.py .
COPY --chown=user app.py .
COPY --chown=user graders/ ./graders/
COPY --chown=user data/ ./data/

# Healthcheck to verify the app can start
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD python -c "import my_env; print(1)" || exit 1

CMD ["python", "Inference.py"]
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
10 changes: 7 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ cd Moderix
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python Inference.py
python inference.py
```

---
Expand Down Expand Up @@ -119,7 +119,10 @@ export GEMINI_MODEL_NAME="gemini-2.5-flash"

## 📈 Baseline Evaluation

Our standard `Inference.py` baseline utilizes `tenacity` exponential backoff to handle massive inference loads cleanly without rate-limit crashes. Standard LLMs (like `Qwen2.5` or `gpt-4o-mini`) generally score between **0.45 and 0.75**, proving the environment is solvable but strictly penalizes hallucinations and overconfidence.
Our standard `inference.py` baseline utilizes `tenacity` exponential backoff to handle massive inference loads cleanly without rate-limit crashes. Standard LLMs (like `Qwen2.5` or `gpt-4o-mini`) generally score between **0.45 and 0.75**, proving the environment is solvable but strictly penalizes hallucinations and overconfidence.

**Verified Baseline Score:**
Running `inference.py` with the **`gemini-2.5-flash`** model yields a consistent baseline average reward of **0.79 / 1.0**. The agent reliably demonstrates the ability to detect toxicity (Easy), classify spam (Medium), and categorize complex NSFW context (Hard) across the full episode.

Run the test suite to locally verify the mathematical bounds of our reward engine:
```bash
Expand Down Expand Up @@ -152,7 +155,8 @@ Every push and pull request triggers our `.github/workflows/ci.yml` pipeline:
Moderix/
├── README.md # Environment documentation (this file)
├── my_env.py # Core stateful Environment class
├── Inference.py # Automated inference loop w/ exponential backoff
├── inference.py # Automated inference loop w/ exponential backoff
├── app.py # API Web Server for Hugging Face Spaces ping
├── Dockerfile # Multi-stage, non-root HF Spaces container
├── requirements.txt # Dependencies (incl. sentence-transformers)
├── openenv.yaml # OpenEnv compliance and config file
Expand Down
53 changes: 53 additions & 0 deletions app.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
import asyncio
from fastapi import FastAPI, HTTPException
import my_env
from pydantic import BaseModel

app = FastAPI(title="Content Moderation OpenEnv API")

# Global environment instance
environment: my_env.ContentModerationEnv = None

@app.on_event("startup")
async def startup_event():
global environment
# Initialize the Content Moderation RL environment exactly as Inference.py does
environment = await my_env.ContentModerationEnv.from_env()

@app.get("/")
async def health_check():
"""Hugging Face Spaces automated ping endpoint - must return 200"""
if environment is not None:
return {"status": "healthy", "message": "Environment is ready"}
raise HTTPException(status_code=503, detail="Environment initializing...")

@app.post("/reset")
async def reset_env():
"""Reset the environment state for a new episode"""
try:
obs = await environment.reset()
return obs.model_dump()
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))

@app.post("/step")
async def step_env(action: my_env.Action):
"""Process a single step decision"""
try:
obs, reward, done, info = await environment.step(action)
return {
"observation": obs.model_dump(),
"reward": reward.model_dump(),
"done": done,
"info": info
}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))

@app.get("/state")
async def state_env():
"""Get the current environment state tracking"""
try:
return await environment.state()
except getattr(Exception, "dummy", Exception) as e:
raise HTTPException(status_code=500, detail=str(e))
File renamed without changes.
3 changes: 3 additions & 0 deletions my_env.py
Original file line number Diff line number Diff line change
Expand Up @@ -306,6 +306,9 @@ def _grade_decision(self, action: Action, post: dict) -> float:
if decision == "approve" and is_truly_malicious and confidence > 0.8:
final_reward = 0.0

# Apply trajectory penalty based on ruined reputation (min 0.2 to prevent total lock-out)
final_reward = final_reward * max(0.2, self.user_reputation)

return min(max(final_reward, 0.0), 1.0)

async def state(self) -> dict:
Expand Down
2 changes: 2 additions & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -26,3 +26,5 @@ openai>=1.0.0

tenacity>=8.2.0
sentence-transformers>=3.0.0
fastapi>=0.100.0
uvicorn>=0.20.0
Loading