Okja88/Visual-GenAI-Applications

Visual Generative AI Application Portfolio

This repository contains a three-part series of projects developed for the Specialist Diploma in Applied Generative AI (SDGAI). The collection covers Generative Adversarial Networks (GANs) for image synthesis and computer vision techniques for object counting.

📁 Project Structure

Part 1: Unconditional WGAN-GP

Generates images across 10 distinct classes using a Wasserstein GAN with Gradient Penalty.

Focuses on learning the overall distribution of a dataset without explicit labels during training.

Part 2: Conditional WGAN-GP

Implements a conditional GAN to synthesize specific hand-sign letters.

Features targeted fine-tuning to improve the visual fidelity of "blurry" or low-quality classes.

Part 3: Adaptive Object Counting

A smart, queryable system that counts objects using natural language prompts (e.g., "count vehicles").

Utilizes pre-trained models like YOLOv8 and Grounding DINO + SAM for zero-shot detection and counting.

    .
    ├── ASG_Part1_NathanOngKeeWee.ipynb
    ├── ASG_Part2_NathanOngKeeWee.ipynb
    ├── ASG_Part3_NathanOngKeeWee.ipynb
    ├── requirements.txt
    ├── README.md
    ├── Dataset/
    │   ├── A/        # Example class folder
    │   ├── B/
    │   ├── ...       # Continues for all 24 static ASL letters (A-Y, excluding J and Z)
    │   └── Y/
    └── outputs/

📂 Data Setup

To run the GAN notebooks (Parts 1 and 2), organize your data as follows:

  1. Create a folder named Dataset/ in the root directory.
  2. Inside Dataset/, create subfolders named A, B, C, etc. (Total 24 classes, excluding J and Z).
  3. Place the respective ASL alphabet images into these folders.
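The layout described above can be checked before training starts. The following is a minimal sketch, assuming the class folders are named with the uppercase letters themselves and that every regular file inside a class folder is an image. `check_dataset` and `EXPECTED_CLASSES` are illustrative names and are not taken from the notebooks.

```python
from pathlib import Path
import string

# The 24 static ASL letters: the alphabet without J and Z.
EXPECTED_CLASSES = [c for c in string.ascii_uppercase if c not in ("J", "Z")]


def check_dataset(root="Dataset"):
    """Return (missing, empty, counts) for the expected class folders under root.

    missing: class names with no folder
    empty:   class names whose folder exists but holds no files
    counts:  number of files found per existing class folder
    """
    root = Path(root)
    missing, empty, counts = [], [], {}
    for name in EXPECTED_CLASSES:
        folder = root / name
        if not folder.is_dir():
            missing.append(name)
            continue
        n_files = sum(1 for p in folder.iterdir() if p.is_file())
        counts[name] = n_files
        if n_files == 0:
            empty.append(name)
    return missing, empty, counts
```

Calling `check_dataset()` from the repository root returns the folders that still need to be created or filled, which is cheaper to find out here than partway through a training run.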

🛠️ Setup Instructions

To run these notebooks, you will need a Python environment (3.10+ recommended) and a GPU (CUDA) for efficient GAN training.

  1. Clone the Repository

         git clone https://github.com/YourUsername/YourRepoName.git
         cd YourRepoName

  2. Install Dependencies

     The projects rely on PyTorch, Torchvision, and Ultralytics (for YOLOv8). You can install the required libraries using:

         pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
         pip install tqdm numpy scipy matplotlib pillow ultralytics

  3. Dataset Preparation

     Parts 1 & 2: Place your image data in a folder named ./Dataset. The folder should contain subdirectories for each class (e.g., ./Dataset/A/, ./Dataset/B/, etc.).

     Part 3: Requires the ultralytics package for YOLOv8 weights and optionally the Grounding DINO weights if using advanced adaptive counting.

🚀 Module Highlights

Part 1: Unconditional Image Generation

The goal is to generate a grid of 10 representative images, one for each class, from a GAN trained unconditionally.

Challenge: Because the GAN is unconditional, there is no guarantee that every class appears among the generated samples. A separate classifier sorts the generated images into predicted-class folders, from which the representative image for each class is selected.
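One way to turn the classifier's output into a per-class selection is sketched below. It assumes the classifier yields a predicted class and a confidence for each generated image, and it keeps the most confident image per class. The notebook's actual selection step may differ (for example, manual picking from the sorted folders). The function name, class names, and sample predictions are hypothetical.

```python
def pick_best_per_class(predictions, class_names):
    """Keep the highest-confidence generated image for each class.

    predictions: iterable of (image_id, predicted_class, confidence)
    class_names: the classes the final grid should cover
    Returns (best, missing), where best maps class -> (image_id, confidence)
    and missing lists classes that no generated image was assigned to.
    """
    wanted = set(class_names)
    best = {}
    for image_id, label, conf in predictions:
        if label not in wanted:
            continue
        if label not in best or conf > best[label][1]:
            best[label] = (image_id, conf)
    missing = [c for c in class_names if c not in best]
    return best, missing


# Illustrative data only: three generated images, three target classes.
sample = [
    ("gen_001.png", "class_0", 0.62),
    ("gen_002.png", "class_0", 0.91),
    ("gen_003.png", "class_1", 0.55),
]
best, missing = pick_best_per_class(sample, ["class_0", "class_1", "class_2"])
```

A non-empty `missing` list is the signal that more samples need to be generated before the grid can be completed.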

Part 2: Controlled Letter Synthesis

This module uses labels during training to allow specific image generation.

Technique: Enforces the Lipschitz constraint on the critic with a gradient penalty (GP) for stable training.
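For reference, the standard WGAN-GP critic loss is written out below in its unconditional form. This is the formulation from the original WGAN-GP paper, not a transcription of the notebook code; in the conditional setting the critic would also receive the class label. The symbols are assumptions introduced here: $D$ is the critic, $P_r$ and $P_g$ are the real and generated distributions, and $\lambda$ is the penalty weight (10 in the original paper; the value used in the notebooks is not stated here).

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
```

The penalty term pushes the critic's gradient norm toward 1 at points interpolated between real and generated samples, which is how the Lipschitz constraint is enforced without weight clipping.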

Evaluation: Includes Fréchet Inception Distance (FID) scores and manual checkpoint selection based on visual clarity.
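As a reminder of what the score measures, the usual FID definition is given below. The notation is an assumption introduced here: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of Inception features computed on real and generated images respectively.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\Big(\Sigma_r + \Sigma_g - 2\,\big(\Sigma_r \Sigma_g\big)^{1/2}\Big)
```

Lower values mean the generated feature distribution is closer to the real one. Because FID is a dataset-level statistic, it does not flag individual blurry classes, which is one reason to pair it with visual checkpoint inspection.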

Part 3: Natural Language Object Counting

This module creates a Gradio-based interface where users can upload an image and type what they want to count.

Logic: The system filters YOLOv8 detections based on the user's text prompt and provides an annotated image with the final count.
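The filtering step can be illustrated with a small, detector-independent sketch. It assumes detections have already been reduced to `(label, confidence)` pairs and that prompt words are matched against a hand-written keyword table; the notebook's real prompt handling is not shown in this README, so `PROMPT_GROUPS`, `target_labels`, `count_matches`, and the `min_conf` threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical prompt keywords mapped to COCO class names (an assumption for
# illustration; the notebook's own mapping is not shown in this README).
PROMPT_GROUPS = {
    "vehicle": {"car", "truck", "bus", "motorcycle", "bicycle"},
    "animal": {"dog", "cat", "horse", "sheep", "cow"},
}


def target_labels(prompt, known_labels):
    """Resolve a free-text prompt to the set of detector labels to count."""
    targets = set()
    for word in prompt.lower().replace(",", " ").split():
        singular = word[:-1] if word.endswith("s") else word  # crude plural handling
        for candidate in (word, singular):
            if candidate in PROMPT_GROUPS:
                targets |= PROMPT_GROUPS[candidate]
            elif candidate in known_labels:
                targets.add(candidate)
    return targets


def count_matches(detections, prompt, min_conf=0.25):
    """Count detections whose label matches the prompt.

    detections: list of (label, confidence) pairs from any detector.
    Returns (total, per_label_counts).
    """
    known = {label for label, _ in detections}
    targets = target_labels(prompt, known)
    kept = [label for label, conf in detections
            if label in targets and conf >= min_conf]
    return len(kept), Counter(kept)


# Illustrative detections only.
detections = [("car", 0.91), ("truck", 0.78), ("person", 0.95), ("car", 0.10)]
total, per_label = count_matches(detections, "count vehicles")
print(total, dict(per_label))
```

Word-level matching like this cannot handle multi-word labels such as "traffic light" or categories missing from the table, which is the kind of gap an open-vocabulary detector such as Grounding DINO is meant to cover.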

👤 Author

Nathan Ong Kee Wee

Developed as part of the Specialist Diploma in Applied Generative AI (SDGAI).

