Hybrid 916-feature artwork detector with two-stage anime veto#56
Closed
wfproc wants to merge 2 commits into darkshapes:main from
Conversation
Confirms that CLIP-based detection is biased toward generators that use CLIP internally. Tested on the Defactify MS-COCOAI dataset (96K images, 5 labeled generators, semantically matched captions):

| Generator | Uses CLIP? | Hand-crafted | CLIP | Delta |
| --- | --- | --- | --- | --- |
| SD 2.1 | Yes | 86.5% | 96.1% | +9.6pp |
| SDXL | Yes | 93.5% | 99.0% | +5.5pp |
| SD 3 | Yes | 85.4% | 97.5% | +12.1pp |
| Midjourney v6 | Unknown | 88.5% | 99.5% | +11.0pp |
| DALL-E 3 | No | 98.7% | 98.2% | -0.5pp |

CLIP advantage on generators that use CLIP: +9.1pp average. CLIP advantage on non-CLIP generators: -0.5pp (hand-crafted wins).

Replaces the per-experiment PDFs with a single consolidated research report (negate_research_report.pdf) covering all experiments, scaling analysis, CLIP bias findings, and recommended next steps.
+ 156 handcrafted features (was 49)
+ 768 frozen ConvNeXt-Tiny features
+ fine-tuned ConvNeXt anime veto model
+ 3-model ensemble (LightGBM + SVM + RF) with calibrated 3-class output
+ pause/resume feature extraction cache system
~ feature_artwork.py expanded with Gabor, wavelets, fractal, JPEG ghost, mid-band frequency, patch consistency, and linework analysis
- removed dead-end test scripts and outdated results from PRs darkshapes#51/darkshapes#52
Anime Side-Quest
Moved to #57; the previous pruning merge was conflicting.
Negate Artwork Detection — Research & Experiments
How to Read This Document
What experiments were run: Each section documents a specific test with exact datasets, sample sizes, and results. Links to HuggingFace datasets and test scripts are provided throughout.
What solutions we moved toward: We started with 49 handcrafted features (Li & Stamp 2025) and iterated through feature expansion, learned features (ConvNeXt), ensemble methods, threshold calibration, and finally a two-stage architecture with an anime-specific veto model.
Why these directions: The existing negate codebase uses ViT/VAE feature extraction + PCA + XGBoost, achieving ~63% accuracy. We needed to:
How this addresses issues in existing code:
Where does the data come from? All training and test data comes from publicly available HuggingFace datasets or the public CivitAI API. Every source is linked below with its license. Genuine (real art) sources are WikiArt (public domain classical art via ImagiNet, CC BY 4.0), tellif (curated real images, HF public), Hemg (HF public), and latentcat/animesfw (anime illustration archive, HF public). No private or proprietary data is used. CivitAI images are scraped from the public API and not redistributed — users should scrape their own.
Two-Stage Architecture
The anime veto is a ConvNeXt-Tiny with stage 3 fine-tuned on anime real vs AI data. It only acts on images the main ensemble flags as SYNTHETIC — it can downgrade to UNCERTAIN but never upgrade to SYNTHETIC. This eliminates most false positives on anime/illustration content without affecting detection of AI art.
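The asymmetric veto rule described above can be sketched in a few lines (function and label names here are illustrative, not the PR's actual API):

```python
def apply_anime_veto(label: str, anime_real_prob: float,
                     veto_threshold: float = 0.5) -> str:
    """Stage 2: the anime model may only soften a SYNTHETIC call.

    It can downgrade SYNTHETIC -> UNCERTAIN when the image looks like
    real (human-made) anime, but it never upgrades a label toward
    SYNTHETIC, so recall on AI art is untouched.
    """
    if label == "SYNTHETIC" and anime_real_prob >= veto_threshold:
        return "UNCERTAIN"
    return label
```

Because the veto only fires on SYNTHETIC outputs, GENUINE and UNCERTAIN predictions pass through unchanged regardless of the anime score.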
Code: negate/extract/feature_artwork.py (148 HC), negate/extract/feature_learned.py (768 ConvNeXt), models/convnext_anime_finetuned.pt (anime veto)

Feature Architecture
Handcrafted Features (148 dimensions)
All features operate on a 255x255 resized image, CPU-only. Grouped by source:
Removed (hurt accuracy): Color coherence vectors (-0.8pp), cross-subband wavelet correlation (-1.0pp).
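For concreteness, here is a pure-Python sketch in the spirit of the hsv_entropy feature that appears later in the importance ranking; the real implementation in feature_artwork.py likely differs in binning and preprocessing.

```python
import colorsys
import math

def hsv_entropy(pixels, bins: int = 32) -> float:
    """Shannon entropy (bits) of the hue histogram of an RGB image.

    `pixels` is an iterable of (r, g, b) tuples in 0-255. Low entropy
    indicates a narrow palette; the distribution of this statistic
    differs between painted and generated artwork.
    """
    hist = [0] * bins
    n = 0
    for r, g, b in pixels:
        h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hist[min(int(h * bins), bins - 1)] += 1
        n += 1
    return -sum((c / n) * math.log2(c / n) for c in hist if c)
```

A single-hue image scores 0 bits; two equally frequent hues score exactly 1 bit.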
Learned Features (768 dimensions)
Frozen ConvNeXt-Tiny (ImageNet-22K pretrained, timm library). Penultimate layer = 768-dim embedding.

Feature Importance
On 6,344 training images (5-fold CV): 82.6% learned / 17.4% handcrafted
Top features:
high_to_mid_ratio (mid-band freq), mslbp_s3_var (coarse LBP), hsv_entropy, jpeg_ghost_q90_rmse, wavelet_hh_energy_ratio, blend_saturation_dip (paint mixing).

Ensemble
Thresholds: GENUINE < 0.35 | UNCERTAIN | SYNTHETIC >= 0.65. Selected from probability-distribution analysis: the highest probability assigned to any real artwork was 0.494, leaving a 0.156 safety margin below the SYNTHETIC threshold.
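Putting the ensemble average and the band thresholds together, a sketch of the decision rule (the PR's calibration details may differ):

```python
def classify(probs, lo: float = 0.35, hi: float = 0.65) -> str:
    """Map the mean calibrated P(synthetic) of the ensemble members
    (e.g. LightGBM, SVM, RF) onto the 3-class output."""
    p = sum(probs) / len(probs)
    if p < lo:
        return "GENUINE"
    if p >= hi:
        return "SYNTHETIC"
    return "UNCERTAIN"
```

Note that the real-art maximum of 0.494 lands in the UNCERTAIN band, consistent with the stated safety margin.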
Verified Results (April 6, 2026)
All numbers from a single end-to-end verification run.
Detection — Large n (OpenFake held-out parquets 5-7)
ComplexDataLab/OpenFake

Detection — Dedicated Datasets (n=200)

Rapidata/Recraft-v3-24-7-25_t2i_human_preference
ComplexDataLab/OpenFake

Detection — 2025 Blind Generators (tellif, n=9-20)
Small n — directional only (±33% CI at n=9).
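The ±33% figure is the worst-case 95% normal-approximation half-width for a proportion at n=9 (p=0.5 maximizes the variance):

```python
import math

def ci_half_width(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% normal-approximation confidence half-width for a proportion.
    p=0.5 gives the worst case; z=1.96 is the two-sided 95% quantile."""
    return z * math.sqrt(p * (1 - p) / n)

# n=9 gives roughly 0.33, i.e. +/-33 percentage points, which is why
# these small-n results are treated as directional only.
```

At the n=200 used for the dedicated-dataset tests the half-width drops below 7 points.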
tellif/ai_vs_real_image_semantically_similar

False Positive Rates

tellif/...
latentcat/animesfw

Platform Robustness (simulated)
Large-Scale Validation
Perturbation Ablation (HC vs ConvNeXt)
HC features fragile under noise/resize. ConvNeXt robust. Combined is best.
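The fragility claim can be probed with a simple harness like the following numpy-only sketch; the report's actual perturbations also include resizing and JPEG recompression, which are omitted here.

```python
import numpy as np

def feature_drift(feature_fn, image: np.ndarray,
                  noise_std: float = 5.0, seed: int = 0) -> float:
    """Relative change of a scalar feature under additive Gaussian noise.
    Large drift = fragile feature; small drift = robust."""
    rng = np.random.default_rng(seed)
    noisy = np.clip(image + rng.normal(0, noise_std, image.shape), 0, 255)
    clean_v, noisy_v = feature_fn(image), feature_fn(noisy)
    return abs(noisy_v - clean_v) / (abs(clean_v) + 1e-9)

# Illustration: a high-frequency statistic (stand-in for a fragile HC
# feature) drifts far more under noise than a global mean does.
img = np.tile(np.arange(64, dtype=float), (64, 1))  # smooth gradient
hf = lambda a: float(np.mean(np.abs(np.diff(a, axis=1))))
mean_f = lambda a: float(a.mean())
```

Running both through `feature_drift` on the same image shows the high-frequency statistic moving by several multiples of its clean value while the mean barely changes, mirroring the HC-versus-ConvNeXt ablation.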
UNCERTAIN Calibration
No AI image ever classified as GENUINE.
CLIP Bias (Proven)
Source: Rajarshi-Roy-research/Defactify_Image_Dataset (96K images)

Probability Calibration
Training Data
Total: 6,344 balanced (3,172 real + 3,172 synthetic)
Real (Genuine) Sources
delyanboychev/imaginet (extracted wikiart/)
tellif/ai_vs_real_image_semantically_similar
Hemg/AI-Generated-vs-Real-Images-Datasets
latentcat/animesfw

All genuine sources are established public archives. WikiArt contains public-domain classical art from museum collections. latentcat/animesfw is sourced from anime illustration archives with community moderation. No images were generated or fabricated for the genuine class.
Synthetic (AI-Generated) Sources
delyanboychev/imaginet
ash12321/seedream-4.5-generated-2k
exdysa/nano-banana-pro-generated-1k-clone
LukasT9/Flux-1-Dev-Images-1k
LukasT9/Flux-1-Schnell-Images-1k
bitmind/nano-banana

CivitAI Scraped Data
Scraped via the public CivitAI REST API (/api/v1/images), filtered by tag, sorted newest. Rate-limited. All images were publicly posted by users. Not published as a dataset; users should scrape their own or use the HuggingFace datasets above.

Additional scrapes (used for testing, not primary training): sd3 (500), sd35 (500), recraft (500), gemini (500). Local path: .datasets/civitai/.

Test-Only Data
tellif/ai_vs_real_image_semantically_similar
ComplexDataLab/OpenFake
Rapidata/Recraft-v3-24-7-25_t2i_human_preference
Rajarshi-Roy-research/Defactify_Image_Dataset
Hemg/AI-Generated-vs-Real-Images-Datasets
latentcat/animesfw

Dead Ends
Anime Side-Quest
The generalization trap: training on WikiArt (classical) as real + CivitAI anime AI as fake taught the model "anime = AI."
Scope & Limitations
Deployment Notes
Reproduce
Promising Directions & Blockers
What seems promising
Current blockers
Fuel
During this research (March 26 to April 6, 2026), the following were consumed:
References