
Introduction Multidimensional

Jorge Quenta edited this page Jul 17, 2025 · 3 revisions

🧠 Introduction

This project focuses on the development of a content-based retrieval system that works across three data modalities: text, images, and audio.

🗂️ Data Domains and Datasets

  • Text
    We use the MPST (Movie Plot Synopses with Tags) dataset from Kaggle, which contains thousands of movie synopses annotated with genres and tags. It is well suited to testing content-based retrieval in the textual domain because it lets us evaluate semantic similarity between plot descriptions.

  • Images
    We use the Fashion Product Images Dataset (~25 GB), which contains labeled images of clothing products. Each image is described with SIFT local descriptors, which are quantized into bag-of-visual-words (BoVW) histograms to perform visual similarity search.

  • Audio
    We use a curated subset of Spotify songs in .wav format, covering various artists and genres. We extract MFCC features to characterize each track and build histograms of acoustic words using k-means clustering.
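The image and audio pipelines above share the same bag-of-words idea: local descriptors (SIFT keypoints for images, MFCC frames for audio) are quantized against a learned codebook, and each item is represented as a normalized word histogram. Below is a minimal NumPy sketch of that quantization step, using a toy k-means in place of a full clustering library; the function names are illustrative, not the project's actual code:

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Tiny k-means: learn a codebook of 'visual/acoustic words' from local features."""
    rng = np.random.default_rng(seed)
    # initialize centroids with k distinct feature vectors
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # assign every feature to its nearest centroid
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned features
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return centroids

def bow_histogram(features, codebook):
    """Quantize an item's local descriptors and build its normalized word histogram."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The resulting histograms can be compared with any vector distance (e.g. cosine or Euclidean), which is what makes SIFT-based image search and MFCC-based audio search structurally identical at query time.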

🌐 Why a Multimodal Database?

Traditional retrieval systems often rely on metadata or keyword matching. However, content-based retrieval requires understanding the actual content (visual, auditory, or textual). Each data domain has unique properties:

  • Text requires natural language processing and inverted indexes.
  • Images require local feature extraction and visual vocabularies.
  • Audio requires signal processing and acoustic descriptors.
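The text-side requirement above, an inverted index, can be sketched in a few lines of standard-library Python. This is a hedged illustration of the data structure, not the project's implementation; `build_inverted_index` and `search` are placeholder names:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()
```

A real text pipeline would add tokenization, stop-word removal, and TF-IDF weighting on top of this skeleton, but the core lookup stays the same: intersect the posting lists of the query terms.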

To unify these under a single framework, we need a multimodal database that allows:

  • Indexing and retrieving content from different modalities
  • Using a specialized pipeline for each modality
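One simple way to put the per-modality pipelines behind a single interface is a dispatch table keyed by modality. The sketch below is an architectural illustration under stated assumptions, not the project's actual design; the commented image and audio pipeline names are hypothetical placeholders:

```python
from collections import Counter

def text_features(text):
    """Toy text pipeline: term-frequency features (stand-in for a real NLP pipeline)."""
    return Counter(text.lower().split())

# Each modality registers its specialized pipeline under a common interface.
PIPELINES = {
    "text": text_features,
    # "image": sift_bovw_features,  # hypothetical: SIFT -> BoVW histogram
    # "audio": mfcc_bow_features,   # hypothetical: MFCC -> acoustic-word histogram
}

def extract(modality, item):
    """Dispatch an item to the feature pipeline registered for its modality."""
    return PIPELINES[modality](item)
```

With this shape, the indexing and retrieval layers only ever see feature vectors or histograms, so the same storage and similarity machinery serves all three modalities.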
