OVERVIEW
DRV (Defect-Risk Visualization) is a framework that helps developers see and prioritize defect-prone code modules. It combines defect prediction models with interactive visualizations so teams can quickly identify which parts of the codebase are most risky.
Objectives
- Integrate software defect prediction (metrics-based and semantic models) into a unified workflow.
- Provide visual analytics (heatmaps, rankings, dashboards) that make predictions transparent and interpretable.
- Support developers in prioritizing high-risk modules rather than treating all code equally.
- Deliver a practical tool that complements existing defect prediction research with developer-friendly insights.
This is a self-contained Python scaffold for the DRV thesis project. It includes:
- The ApacheJIT_Total dataset
- Baseline models (metrics-only and fused semantic+metrics)
- Ranking metrics (FPA, Recall@K, NDCG@K)
- Visualization (risk heatmap, Top-K bar chart)
- CLI entrypoint
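The ranking metrics listed above can be sketched as follows. This is a minimal pure-Python illustration using the standard definitions of FPA, Recall@K, and NDCG@K; the actual `drv/ranking.py` implementation may differ in detail:

```python
import math

def fpa(scores, faults):
    """Fault Percentile Average: closer to 1 means faultier modules rank higher."""
    n, total = len(scores), sum(faults)
    if total == 0:
        return 0.0
    # rank 1 = lowest predicted risk, rank n = highest predicted risk
    ranked = sorted(zip(scores, faults))
    return sum(rank * f for rank, (_, f) in enumerate(ranked, start=1)) / (n * total)

def recall_at_k(scores, labels, k):
    """Fraction of all defective modules captured in the top-k by score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    positives = sum(labels)
    return sum(labels[i] for i in order[:k]) / positives if positives else 0.0

def ndcg_at_k(scores, labels, k):
    """Normalized discounted cumulative gain over binary defect labels."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dcg = sum(labels[i] / math.log2(rank + 1)
              for rank, i in enumerate(order[:k], start=1))
    ideal = sum(rel / math.log2(rank + 1)
                for rank, rel in enumerate(sorted(labels, reverse=True)[:k], start=1))
    return dcg / ideal if ideal else 0.0
```

For example, `recall_at_k([0.9, 0.8, 0.1, 0.2], [1, 0, 1, 0], 2)` returns `0.5`: only one of the two defective modules appears in the top 2 by score.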
Note: Semantic representation uses TF‑IDF of commit messages as a stand-in for BiCC-BERT embeddings (offline-friendly). Replace with your BiCC module if available.
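A minimal sketch of that TF-IDF stand-in using scikit-learn (the example commit messages and the `max_features` cap are illustrative assumptions, not values taken from the project config):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy commit messages standing in for a real commit history
messages = [
    "fix null pointer in parser",
    "add caching layer for sessions",
    "fix off-by-one in cache eviction",
]

# TF-IDF over commit messages as an offline-friendly stand-in for
# BiCC-BERT embeddings; max_features caps the semantic vector width
vectorizer = TfidfVectorizer(max_features=512, lowercase=True)
semantic = vectorizer.fit_transform(messages).toarray()
# One dense row vector per commit, ready to fuse with CK metrics
```

Swapping in a BiCC module would mean replacing `semantic` with its embedding matrix while keeping the same one-row-per-commit shape.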
# From this folder:
python -m drv.cli --commits data/commits.csv --mode train --output outputs

Artifacts:
- outputs/train_metrics.json: F1/AUC for metrics-only and fused models
- outputs/ranking_eval.json: FPA/Recall@K/NDCG@K
- outputs/scored_commits.csv: risk scores for each commit
- outputs/heatmap.png, outputs/topk.png: example visualizations
Module layout:
- drv/config.py: config dataclass
- drv/data.py: CSV loaders/savers
- drv/features.py: CK feature handling + fusion
- drv/models.py: training and scoring (GradientBoosting as a local stand-in)
- drv/ranking.py: FPA, Recall@K, NDCG@K
- drv/eval.py: F1 and AUC helpers
- drv/visualize.py: heatmap & top-k plot
- drv/cli.py: command-line workflow
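To illustrate the fused model end to end, here is a minimal sketch of concatenating CK-style metrics with semantic features and scoring commits with GradientBoosting. All shapes and the random toy data are assumptions for illustration; the real pipeline reads these arrays from `drv/features.py` and `drv/data.py`:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Toy stand-ins: 100 commits with 4 CK-style metrics and 8 semantic dimensions
metrics = rng.normal(size=(100, 4))
semantic = rng.normal(size=(100, 8))
labels = rng.integers(0, 2, size=100)  # 1 = defect-inducing commit

# Fusion here is simple feature concatenation (assumed; see drv/features.py)
fused = np.hstack([metrics, semantic])

# GradientBoosting as the local stand-in model, as noted in the layout above
model = GradientBoostingClassifier(random_state=0)
model.fit(fused, labels)
risk = model.predict_proba(fused)[:, 1]  # one risk score per commit
```

The `risk` vector is what feeds the ranking metrics and the heatmap/Top-K plots.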