# Multidimensional

## Introduction

This project focuses on the development of a content-based retrieval system that works across three data modalities: text, images, and audio.
### Text
We use the MPST Movie Plot Synopses dataset from Kaggle, which contains thousands of movie synopses annotated with genres and tags. It is well suited to testing content-based retrieval in the textual domain because it lets us evaluate semantic similarity between descriptions.
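A minimal sketch of what semantic similarity search over synopses can look like, using TF-IDF and cosine similarity (the toy corpus and the `search` helper below are illustrative stand-ins, not the project's actual pipeline):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny stand-in corpus; the real system would index the MPST synopses.
synopses = [
    "A detective hunts a serial killer through a rainy city.",
    "Two friends road-trip across the country and discover themselves.",
    "An astronaut stranded on Mars must engineer his own survival.",
]

# Build a TF-IDF matrix over the corpus (one row per synopsis).
vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(synopses)

def search(query, top_k=2):
    """Rank synopses by cosine similarity to the query in TF-IDF space."""
    q_vec = vectorizer.transform([query])
    scores = cosine_similarity(q_vec, doc_matrix).ravel()
    return scores.argsort()[::-1][:top_k]

best = search("murder investigation in the city")  # doc 0 shares "city"
```

A production setup would pair this with an inverted index so that only documents sharing at least one query term are scored.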
### Images
We use the Fashion Product Images Dataset (~25 GB), which contains labeled images of clothing products. Each image is described with SIFT descriptors aggregated into bag-of-visual-words (BoVW) histograms for visual similarity search.
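The BoVW step can be sketched as follows. Random vectors stand in for real SIFT descriptors (which would normally come from OpenCV's `cv2.SIFT_create()`); the vocabulary size `k = 64` and the `bovw_histogram` helper are illustrative choices, not the project's actual parameters:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical stand-in for SIFT output: 128-dim descriptors, one per keypoint.
rng = np.random.default_rng(0)
train_descriptors = rng.random((5000, 128))

# 1. Learn a visual vocabulary by clustering descriptors from the whole corpus.
k = 64
vocab = MiniBatchKMeans(n_clusters=k, random_state=0, n_init=3).fit(train_descriptors)

def bovw_histogram(descriptors, vocab, k):
    """Map each descriptor to its nearest visual word and count occurrences."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()  # L1-normalize so keypoint count doesn't matter

# 2. Represent one image as a normalized histogram of visual words.
image_descriptors = rng.random((300, 128))
hist = bovw_histogram(image_descriptors, vocab, k)
```

Two images can then be compared by any histogram distance (e.g. cosine or chi-squared) over their BoVW vectors.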
### Audio
We use a curated subset of Spotify songs in `.wav` format, spanning various artists and genres. We extract MFCC features to characterize each track and build histograms of acoustic words using k-means clustering.
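For reference, here is a compact numpy-only sketch of the standard MFCC pipeline (frame, window, power spectrum, mel filterbank, log, DCT). The frame sizes and filterbank parameters are common defaults, not necessarily the ones used in the project; a library such as `librosa` would normally handle this:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26, n_ceps=13):
    """Compute MFCCs: frame -> window -> power spectrum -> mel log energies -> DCT."""
    # Split the signal into overlapping Hamming-windowed frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 .. sr/2.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then DCT to decorrelate into cepstral coefficients.
    log_mel = np.log(power @ fbank.T + 1e-10)
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_ceps]
```

The resulting per-frame MFCC vectors are exactly what the k-means "acoustic word" vocabulary is trained on, mirroring the BoVW construction used for images.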
Traditional retrieval systems often rely on metadata or keyword matching. However, content-based retrieval requires understanding the actual content (visual, auditory, or textual). Each data domain has unique properties:
- Text requires natural language processing and inverted indexes.
- Images require local feature extraction and visual vocabularies.
- Audio requires signal processing and acoustic descriptors.
To unify these under a single framework, we need a multimodal database that allows:
- Indexing and retrieving content from different modalities
- Using specialized pipelines for each modality
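One way the unifying framework above could be organized is a registry that dispatches each modality to its own feature-extraction pipeline behind a common index/query interface. The `MultimodalIndex` class and its methods are a hypothetical sketch, not the project's actual design:

```python
from typing import Callable, Dict, List


class MultimodalIndex:
    """Toy multimodal store: one feature pipeline per registered modality."""

    def __init__(self):
        self.pipelines: Dict[str, Callable] = {}
        self.items: Dict[str, List] = {}

    def register(self, modality: str, extract: Callable):
        # Each modality brings its own extractor (TF-IDF, BoVW, MFCC words, ...).
        self.pipelines[modality] = extract
        self.items[modality] = []

    def add(self, modality: str, raw):
        # Index an item by storing its extracted feature representation.
        self.items[modality].append(self.pipelines[modality](raw))

    def query(self, modality: str, raw, sim: Callable) -> int:
        # Retrieve the index of the most similar stored item under `sim`.
        q = self.pipelines[modality](raw)
        scored = [(sim(q, v), i) for i, v in enumerate(self.items[modality])]
        return max(scored)[1]


# Usage with a trivial text pipeline (bag of words + Jaccard similarity).
index = MultimodalIndex()
index.register("text", lambda s: set(s.lower().split()))
index.add("text", "a detective hunts a killer")
index.add("text", "a road trip with friends")
jaccard = lambda a, b: len(a & b) / len(a | b)
best = index.query("text", "detective killer", jaccard)  # matches item 0
```

Image and audio pipelines would plug in the same way, registering their BoVW and acoustic-word extractors with suitable histogram distances.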