This project allows you to search images using natural language queries by leveraging the power of OpenAI's CLIP model. It loads a folder of images, generates embeddings, and then finds the most similar images to your text description.
Built for use in Google Colab or local Python environments.
CLIP (Contrastive Language-Image Pre-training) is a powerful vision-language model that understands both images and text. This project uses CLIP to extract image embeddings and compare them with text queries using cosine similarity.
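For intuition, here is a tiny, self-contained illustration of that similarity measure. The vectors below are random placeholders standing in for CLIP features, not output from the actual model:

```python
import torch

# Cosine similarity is the dot product of L2-normalised vectors:
# identical directions score 1.0, unrelated directions score near 0.
def cosine_similarity(a: torch.Tensor, b: torch.Tensor) -> float:
    a = a / a.norm()
    b = b / b.norm()
    return float(a @ b)

# Placeholder 512-d vectors standing in for a CLIP image/text embedding pair.
print(cosine_similarity(torch.randn(512), torch.randn(512)))
```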
- **Load CLIP Model**: Loads `openai/clip-vit-base-patch32` via 🤗 Hugging Face.
- **Load Images**: Scans the `images/` directory and loads `.jpg` files.
- **Extract Embeddings**: Images are processed and converted into feature vectors.
- **Search**: The user types a natural language query, and the top-3 most similar images are returned with similarity scores (see the sketch after this list).
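The snippet below sketches those four steps with the 🤗 `transformers` API. It is illustrative only: the model name and `images/` folder come from this README, but helper names such as `search()` are assumptions, not the exact contents of `main.py`.

```python
import os
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Step 1: load the CLIP model and processor from Hugging Face.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Step 2: load every .jpg file from the images/ directory.
image_dir = "images"
paths = sorted(
    os.path.join(image_dir, f)
    for f in os.listdir(image_dir)
    if f.lower().endswith(".jpg")
)
images = [Image.open(p).convert("RGB") for p in paths]

# Step 3: extract image embeddings and L2-normalise them once, up front.
with torch.no_grad():
    inputs = processor(images=images, return_tensors="pt").to(device)
    image_embeds = model.get_image_features(**inputs)
    image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)

# Step 4: embed a text query and rank images by cosine similarity.
def search(query: str, top_k: int = 3):
    with torch.no_grad():
        text_inputs = processor(text=[query], return_tensors="pt", padding=True).to(device)
        text_embeds = model.get_text_features(**text_inputs)
        text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
    sims = (image_embeds @ text_embeds.T).squeeze(1)  # cosine similarities
    scores, idx = sims.topk(min(top_k, len(paths)))
    return [(paths[i], float(s)) for i, s in zip(idx.tolist(), scores.tolist())]
```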
Sample output from a typical run:

```
Loading CLIP model and processor...
CLIP model loaded successfully!
Loading images from 'images'...
Loaded: 310715139_7f05468042.jpg
Loaded: 311619377_2ba3b36675.jpg
...
Extracting image features (embeddings)... This might take a moment.
Image features extracted.
Enter your search query (e.g., 'a cat in a garden', 'a fast car', or 'exit' to quit): image with black dog
Top 3 results for 'image with black dog':
* images/333973142_abcd151002.jpg (Similarity: 0.2749)
* images/326456451_effadbbe49.jpg (Similarity: 0.2684)
* images/3101796900_59c15e0edc.jpg (Similarity: 0.2520)
```
Install dependencies with:

```bash
pip install torch torchvision torchaudio
pip install transformers
pip install pillow
```

Project structure:

```
project/
│
├── main.py          # Main search script
├── images/          # Folder containing all images
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
└── README.md
```
- Clone the repo or upload your images to a folder named `images/`.
- Run the script and wait for the embeddings to be generated.
- Type any text prompt (like "a dog on grass") and view the results.
- Type `exit` to quit the loop (a minimal sketch of this loop follows below).
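For orientation, the interactive loop could look roughly like this; it assumes a `search()` helper such as the one sketched earlier and is not necessarily the exact code in `main.py`:

```python
# Hypothetical interactive loop; assumes a search() helper that returns
# (path, score) pairs, as sketched above. The real main.py may differ.
while True:
    query = input("Enter your search query (or 'exit' to quit): ").strip()
    if query.lower() == "exit":
        break
    for path, score in search(query, top_k=3):
        print(f"* {path} (Similarity: {score:.4f})")
```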
- "a man riding a bicycle"
- "a dog laying on the ground"
- "a child playing with a toy"
- "a group of people walking"
This project is for educational and research purposes only. CLIP is provided under its original license by OpenAI and distributed via Hugging Face.