Conversation
Member:
> If this is meant to be a replacement for the deduper, I require that this be tested extensively to ensure there are no false negatives. I'd rather have 10 dupe reports, 1 of which is correct, than fewer reports while duplicate images happily live on the site.
Contributor:
> I can see that it can successfully detect mirrored images and variations of the same image. Can it detect slightly cropped images, images with different brightness levels, etc.? How does it perform on very thin and tall (webtoon/manhwa-like comic) images?
Member:
> Superseded by #627


This PR replaces the old "image intensities" reverse image search, and has come about due to the confluence of several key factors within the past year:
Together, these factors are used to implement a reverse image search system that identifies images by their semantic content rather than their overall appearance. To illustrate what this means, here are some examples of an original image and the matches found when running the system against Derpibooru:
That DINOv2 extracts semantic features can be seen in attention maps generated for these images. The code to generate the attention maps can be found in this repository. They have been re-rendered at a larger scale for visibility:
The system works as follows:
Indexing the classification vector using a nested field allows for the possibility of extracting multiple vectors from each image, and the database table has been set up to allow this should it be desired in the future.
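As a rough illustration of the nested-field approach described above, here is a sketch of what such an OpenSearch index mapping could look like. This is not the actual schema from the PR; the index, field names, and dimension are illustrative assumptions, shown as a Python dict as it would be passed to an OpenSearch client.

```python
# Sketch of an OpenSearch mapping that stores one or more feature vectors
# per image in a nested field, leaving room to add extra vectors per image
# later. Index/field names and the dimension are illustrative, not the
# real schema from this PR.
image_index_mapping = {
    "settings": {"index": {"knn": True}},  # enable k-NN search on the index
    "mappings": {
        "properties": {
            "image_id": {"type": "integer"},
            "features": {
                "type": "nested",  # each image holds a list of vector objects
                "properties": {
                    "vector": {
                        "type": "knn_vector",
                        "dimension": 768,  # e.g. a DINOv2 ViT-B/14 embedding
                    }
                },
            },
        }
    },
}
```

Because the vectors live under a `nested` field, each document can carry a list of `{"vector": [...]}` objects, which is what makes multiple vectors per image possible without a schema change.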
I have pre-computed DINOv2-with-registers features for ~3.5M images on Derpibooru, ~400K images on Furbooru, and ~35K images on Tantabus. Batch inference was run on a 3060 Ti using code from this repository, with the entire process heavily bottlenecked by memory copy bandwidth and image decode performance rather than by GPU execution itself. However, the inference code is efficient enough to run on a CPU in under 0.5 seconds per image, and this is what is implemented in the repository (with the expectation that the server will not require a GPU).
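Conceptually, once features are precomputed, a reverse image search is just a nearest-neighbor lookup over those vectors. The toy sketch below uses pure Python and made-up three-dimensional "embeddings" to show the idea; the real system stores full DINOv2 feature vectors and lets OpenSearch perform approximate k-NN instead of this brute-force scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy precomputed "features" standing in for real DINOv2 embeddings.
index = {
    1: [0.90, 0.10, 0.00],
    2: [0.88, 0.12, 0.01],  # near-duplicate of image 1
    3: [0.00, 0.20, 0.95],  # unrelated image
}

def nearest(query, k=2):
    """Return the ids of the k images most similar to the query vector."""
    ranked = sorted(index.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [image_id for image_id, _ in ranked[:k]]

print(nearest([0.90, 0.10, 0.00]))  # image 1 first, then its near-duplicate 2
```

The brute-force scan here is O(n) per query; with millions of images this is exactly why the PR offloads the search to OpenSearch's approximate k-NN index rather than computing similarities directly.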
This PR must not be merged until OpenSearch releases version 2.19, as 2.18 contains a critical bug that prevents the system from working in all cases. Other bugs relating to filtering may or may not also be fixed in the 2.19 release, but they have been worked around for now.
This PR must also not be merged until its dependencies #389 and #400 are merged.
Fixes #331 (method outdated)