RelDDPM

Introduction

This is the source code for the paper "Controllable Tabular Data Synthesis Using Diffusion Models".

Quick Start

Environment Setup

Before running the code, make sure your Python version is above 3.7. We recommend running the code in a virtual environment:

conda create -n relddpm_env python=3.8
conda activate relddpm_env

Then install the required packages:

pip install -r requirements.txt
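
To confirm that the active interpreter meets the version requirement, a check along the following lines may help. This is only a sketch: the helper `python_version_ok` is hypothetical and not part of this repository, and it assumes that "above 3.7" can be read as "3.7 or newer".

```python
import sys

MIN_VERSION = (3, 7)  # assumption: "above 3.7" is read as 3.7 or newer


def python_version_ok(version=None):
    """Return True if a (major, minor, ...) version tuple meets MIN_VERSION."""
    if version is None:
        version = sys.version_info
    # Compare only the major and minor components.
    return tuple(version[:2]) >= MIN_VERSION


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
```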

Code Structure

|-- datasets
    |-- minority_class_oversampling # datasets used in the minority class oversampling task
    |-- missing_tuple_completion # datasets used in the missing tuple completion task
|-- ddpm # the denoising diffusion probabilistic model package
|-- lib_completion # the library used in the missing tuple completion task
|-- lib_oversampling # the library used in the minority class oversampling task
|-- data_utils.py # the class for preprocessing the dataset
|-- eval_utils.py # the class for evaluation
|-- eval.py # evaluation code
|-- main.py # main code

Run

Minority Class Oversampling.

Run the code to generate synthetic data for minority class oversampling with the following command:

python main.py --task-name=oversampling --dataset-name=[dataset] --device=[GPU id] --save-name=[output file]

The value of --dataset-name must be one of "default", "shoppers", or "weatherAUS".

For example:

python main.py --task-name=oversampling --dataset-name=default --device=0 --save-name=default_output

Missing Tuple Completion.

Run the code to generate synthetic data for missing tuple completion with the following command:

python main.py --task-name=completion --dataset-name=[dataset] --device=[GPU id] --save-name=[output file]

The value of --dataset-name must be one of "heart", "airbnb", or "imdb".
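
Because each task accepts its own set of dataset names, it can be useful to validate the combination before launching a run. The following is a minimal sketch under stated assumptions: `TASK_DATASETS` and `build_command` are hypothetical helpers that do not exist in this repository, the save name `heart_output` is illustrative only, and the sketch uses only the flags shown in the commands above.

```python
import sys

# Dataset names per task, as listed in this README.
TASK_DATASETS = {
    "oversampling": ("default", "shoppers", "weatherAUS"),
    "completion": ("heart", "airbnb", "imdb"),
}


def build_command(task_name, dataset_name, device, save_name, script="main.py"):
    """Validate the task/dataset pair and return the argument list to run."""
    if task_name not in TASK_DATASETS:
        raise ValueError(f"unknown task: {task_name!r}")
    if dataset_name not in TASK_DATASETS[task_name]:
        raise ValueError(
            f"dataset {dataset_name!r} is not listed for task {task_name!r}"
        )
    return [
        sys.executable, script,
        f"--task-name={task_name}",
        f"--dataset-name={dataset_name}",
        f"--device={device}",
        f"--save-name={save_name}",
    ]


if __name__ == "__main__":
    # Print the completion command for the heart dataset without running it.
    print(" ".join(build_command("completion", "heart", 0, "heart_output")))
```

The returned list can be passed to `subprocess.run` from the repository root. A mismatched pair such as task "completion" with dataset "default" raises a `ValueError` before any process is started.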

Evaluation

Evaluate the results of minority class oversampling or missing tuple completion with the following command:

python eval.py --task-name=[task name] --dataset-name=[dataset] --device=[GPU id] --save-name=[output file]

For example, to evaluate minority class oversampling on the default dataset, assuming the synthetic results were saved as default_output, run:

python eval.py --task-name=oversampling --dataset-name=default --device=0 --save-name=default_output
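
Generation and evaluation take the same four flags, so the two steps can be chained with a shared save name. The sketch below illustrates one way to do that. The functions `pipeline_commands` and `run_pipeline` are hypothetical and not part of this repository, and the sketch assumes it is run from the repository root, where main.py and eval.py live.

```python
import subprocess
import sys


def pipeline_commands(task_name, dataset_name, device, save_name):
    """Return the generation command followed by the evaluation command."""
    shared = [
        f"--task-name={task_name}",
        f"--dataset-name={dataset_name}",
        f"--device={device}",
        f"--save-name={save_name}",
    ]
    # Both scripts receive identical flags, so eval.py reads what main.py saved.
    return [[sys.executable, script] + shared for script in ("main.py", "eval.py")]


def run_pipeline(task_name, dataset_name, device, save_name):
    """Run generation, then evaluation, stopping at the first failing step."""
    for cmd in pipeline_commands(task_name, dataset_name, device, save_name):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # Print both commands for inspection instead of executing them.
    for cmd in pipeline_commands("oversampling", "default", 0, "default_output"):
        print(" ".join(cmd))
```

Calling `run_pipeline("oversampling", "default", 0, "default_output")` would execute both steps in order. Using `check=True` means evaluation is skipped if generation exits with a non-zero status.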

ruc-datalab/RelDDPM
