# Visual Generative AI Application Portfolio

This repository contains a three-part series of projects developed for the Specialist Diploma in Applied Generative AI (SDGAI). The collection explores advanced Generative Adversarial Networks (GANs) and computer vision techniques for object counting.
## 📁 Project Structure

### Part 1: Unconditional WGAN-GP

- Generates images across 10 distinct classes using a Wasserstein GAN with Gradient Penalty.
- Focuses on learning the overall distribution of a dataset without explicit labels during training.

### Part 2: Conditional WGAN-GP

- Implements a conditional GAN to synthesize specific hand-sign letters.
- Features targeted fine-tuning to improve the visual fidelity of "blurry" or low-quality classes.

### Part 3: Adaptive Object Counting

- A smart, queryable system that counts objects using natural language prompts (e.g., "count vehicles").
- Utilizes pre-trained models such as YOLOv8 and Grounding DINO + SAM for zero-shot detection and counting.
```
.
├── ASG_Part1_NathanOngKeeWee.ipynb
├── ASG_Part2_NathanOngKeeWee.ipynb
├── ASG_Part3_NathanOngKeeWee.ipynb
├── requirements.txt
├── README.md
├── Dataset/
│   ├── A/   # Example class folder
│   ├── B/
│   ├── ...  # Continues for all 24 static ASL letters (A-Y, excluding J and Z)
│   └── Y/
└── outputs/
```
## 📂 Data Setup

To run the GAN notebooks (Parts 1 & 2), organize your data as follows:

1. Create a folder named `Dataset/` in the root directory.
2. Inside `Dataset/`, create subfolders named `A`, `B`, `C`, etc. (24 classes in total, excluding J and Z).
3. Place the respective ASL alphabet images into these folders.
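As a sanity check before training, the expected class layout can be verified programmatically. The sketch below is not part of the notebooks; it simply encodes the 24 static ASL letters (A-Y, excluding J and Z, which require motion) and reports any missing class folders:

```python
import string
from pathlib import Path

# The 24 static ASL letters: A-Y, excluding J and Z (both require motion).
EXPECTED_CLASSES = [c for c in string.ascii_uppercase if c not in ("J", "Z")]

def missing_class_folders(root="Dataset"):
    """Return the expected class subfolders that are absent under `root`."""
    root = Path(root)
    return [c for c in EXPECTED_CLASSES if not (root / c).is_dir()]
```

Running `missing_class_folders()` from the repository root should return an empty list once the dataset is in place.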
## 🛠️ Setup Instructions

To run these notebooks, you will need a Python environment (3.10+ recommended) and a CUDA-capable GPU for efficient GAN training.

1. **Clone the repository**

   ```bash
   git clone https://github.com/YourUsername/YourRepoName.git
   cd YourRepoName
   ```

2. **Install dependencies**

   The projects rely on PyTorch, Torchvision, and Ultralytics (for YOLOv8). Install the required libraries with:

   ```bash
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
   pip install tqdm numpy scipy matplotlib pillow ultralytics
   ```

3. **Prepare the dataset**

   - Parts 1 & 2: place your image data in a folder named `./Dataset`, with one subdirectory per class (e.g., `./Dataset/A/`, `./Dataset/B/`, etc.).
   - Part 3: requires the `ultralytics` package for YOLOv8 weights, and optionally the Grounding DINO weights if using advanced adaptive counting.
## 🚀 Module Highlights

### Part 1: Unconditional Image Generation

The goal is to generate a grid of 10 representative images, one for each class, from a GAN trained unconditionally.

**Challenge:** Because the GAN is unconditional, it does not guarantee that every class will be generated. A separate classifier is therefore used to sort generated images into predicted class folders for selection.
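The selection step can be sketched as follows. Here `classify` is a hypothetical stand-in for the separate classifier, assumed to return a predicted class index and a confidence score per sample; the notebook's actual implementation may differ:

```python
def select_per_class(samples, classify):
    """For each predicted class, keep the sample the classifier scores highest.

    `classify(sample)` is assumed to return (class_index, confidence).
    Classes the unconditional GAN never produced are simply absent from the
    result, which is exactly the failure mode described above.
    """
    best = {}
    for sample in samples:
        cls, conf = classify(sample)
        if cls not in best or conf > best[cls][1]:
            best[cls] = (sample, conf)
    return {cls: sample for cls, (sample, _) in best.items()}
```

Any class missing from the returned dict signals that more samples need to be generated before the 10-image grid can be assembled.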
### Part 2: Controlled Letter Synthesis

This module uses labels during training to allow generation of specific letters.

- **Technique:** Enforces a Lipschitz constraint via a Gradient Penalty (GP) for stable training.
- **Evaluation:** Includes FID (Fréchet Inception Distance) scores and manual checkpoint selection based on visual clarity.
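The gradient penalty term can be sketched in PyTorch as below. This is a minimal illustration of the standard WGAN-GP formulation (penalizing the critic's gradient norm at points interpolated between real and fake batches), not the notebook's exact code; `D` denotes the critic:

```python
import torch

def gradient_penalty(D, real, fake, device="cpu"):
    """Standard WGAN-GP penalty: ((||grad D(x_hat)||_2 - 1) ** 2).mean()."""
    batch = real.size(0)
    # Random interpolation point between each real/fake pair
    eps = torch.rand(batch, 1, 1, 1, device=device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = D(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    grads = grads.view(batch, -1)
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()
```

During training, this term is scaled by a coefficient (commonly 10) and added to the critic loss.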
### Part 3: Natural Language Object Counting

This module provides a Gradio-based interface where users can upload an image and type what they want to count.

- **Logic:** The system filters YOLOv8 detections based on the user's text prompt and returns an annotated image with the final count.
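The filtering logic can be sketched as below. The alias table and detection format are illustrative assumptions (YOLOv8 actually returns `Results` objects; here each detection is simplified to a dict carrying its predicted COCO class name):

```python
# Hypothetical alias table mapping free-text prompts to COCO class names;
# the notebook's actual mapping may differ.
PROMPT_ALIASES = {
    "vehicles": {"car", "truck", "bus", "motorcycle", "bicycle"},
    "people": {"person"},
}

def count_matching(detections, prompt):
    """Filter detections by a text prompt and return (matches, count).

    Each detection is assumed to be a dict whose "name" key holds the
    predicted COCO class name.
    """
    key = prompt.lower().strip()
    # Fall back to treating the prompt itself as a class name
    wanted = PROMPT_ALIASES.get(key, {key})
    matches = [d for d in detections if d["name"] in wanted]
    return matches, len(matches)
```

The returned matches are what the interface would annotate on the image, with the count displayed alongside.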
## 👤 Author

Nathan Ong Kee Wee

Developed as part of the Specialist Diploma in Applied Generative AI (SDGAI).