Abstract: Large vision-language models (VLMs), incorporating the prompt learning mechanism, have achieved promising results on cross-domain tasks. However, leveraging VLMs to transfer knowledge from the source domain to the target domain remains challenging for unsupervised domain adaptation (UDA). To this end, we propose **C**ross-domain **D**istillation for **U**DA with VLMs (termed CDU). First, CDU trains a source model by embedding the knowledge of the source domain (including both each sample and its corresponding class category) into VLMs in a lightweight manner. Second, CDU makes full use of the image and text semantics from the source model to guide the target model's learning, thereby achieving domain alignment and yielding semantically consistent representations across domains. We conduct extensive experiments on 3 popular UDA datasets: Office-31, Office-Home, and Mini-DomainNet. Experimental results verify that our method consistently surpasses state-of-the-art (SOTA) UDA methods by a large margin, with higher accuracy and lower model complexity on various UDA benchmarks. Taking Office-Home as an example, the average accuracy of CDU exceeds existing methods by at least 3%, while its learnable parameters amount to only 17.9% and its inference time to only 4.3% of those of competing methods. The code of this paper is available at GitHub: https://github.com/1d1x1w/CDU.
- **New perspective**: To the best of our knowledge, this is the first attempt to leverage both the visual and textual semantic information of VLMs to transfer knowledge from the source domain to the target domain for UDA.
- **Novel method**: We introduce a novel UDA approach, CDU, which implements lightweight cross-domain distillation: it makes full use of both the image and text semantics of the source domain, generated by VLMs, to simultaneously guide image feature generation and text label prediction for the target domain (see the illustrative sketch after this list).
- **High performance**: We conduct extensive experiments on 3 popular UDA datasets: Office-31, Office-Home, and Mini-DomainNet. The experimental results validate the superiority of CDU, which achieves higher accuracy with lower model complexity than state-of-the-art (SOTA) UDA methods across various cross-domain tasks.
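The released training code defines the actual objectives; purely as an illustration of the idea, the sketch below shows one plausible form of cross-domain distillation, in which a frozen source model's image features and text-side class probabilities supervise the target model. All names, loss forms, and weights here are assumptions for exposition, not the repository's implementation.

```python
# Minimal sketch of cross-domain distillation (illustrative only; the real
# losses and architecture are defined in this repository, not here).
import torch
import torch.nn.functional as F

def distillation_loss(src_img_feat, tgt_img_feat,
                      src_text_logits, tgt_text_logits,
                      temperature=2.0, alpha=0.5):
    """Guide the target model with the source model's semantics.

    src_img_feat / tgt_img_feat:       (B, D) image features from the VLM encoders
    src_text_logits / tgt_text_logits: (B, C) image-text similarity logits
    """
    # Image side: align target image features with the (frozen) source ones.
    feat_loss = 1.0 - F.cosine_similarity(
        tgt_img_feat, src_img_feat.detach(), dim=-1).mean()

    # Text side: distill the source model's soft class predictions.
    src_prob = F.softmax(src_text_logits.detach() / temperature, dim=-1)
    tgt_log_prob = F.log_softmax(tgt_text_logits / temperature, dim=-1)
    kd_loss = F.kl_div(tgt_log_prob, src_prob,
                       reduction="batchmean") * temperature ** 2

    return alpha * feat_loss + (1 - alpha) * kd_loss

if __name__ == "__main__":
    B, D, C = 8, 512, 65  # batch, feature dim, classes (Office-Home has 65)
    loss = distillation_loss(torch.randn(B, D), torch.randn(B, D),
                             torch.randn(B, C), torch.randn(B, C))
    print("distillation loss:", loss.item())
```

The temperature-scaled KL term is the standard knowledge-distillation recipe; the feature-alignment term stands in for whatever image-side guidance the paper prescribes.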
The results below report accuracy on the 3 UDA datasets with a ViT-B/16 backbone. Our CDU method adopts the paradigm of multi-modal prompt tuning.
| Method | Office-Home Acc. (%) | Office-31 Acc. (%) |
|---|---|---|
| CLIP | 82.1 | 77.5 |
| CoOp | 83.9 | 89.4 |
| CoCoOp | 84.1 | 88.9 |
| VPT-deep | 83.9 | 89.4 |
| MaPLe | 84.2 | 89.6 |
| DAPL | 84.4 | 81.2 |
| PDA | 85.7 | 91.2 |
| CDU(Ours) | 90.1 | 94.0 |
| Method | Mini-DomainNet Acc. (%) |
|---|---|
| DeiT | 55.1 |
| ViT | 57.5 |
| CLIP | 69.3 |
| SSRT | 65.4 |
| CDTrans | 63.2 |
| DAPL | 73.6 |
| PMTrans | 69.6 |
| PADCLIP | 74.7 |
| UniMoS | 76.0 |
| CDU(Ours) | 78.0 |
This codebase is tested on Ubuntu 22.04 LTS with Python 3.7. Follow the steps below to create the environment and install the dependencies.
- Set up the conda environment.
```bash
# Create a conda environment
conda create -y -n cdu python=3.7

# Activate the environment
conda activate cdu

# Install torch (requires version >= 1.8.1) and torchvision
# Refer to https://pytorch.org/get-started/previous-versions/ if your CUDA version is different
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
```
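Optionally, you can verify the install with a quick Python check (ours, not part of the repository):

```python
# Verify that PyTorch/torchvision installed correctly and CUDA is visible.
import torch
import torchvision

print("torch:", torch.__version__)              # expect 1.12.0
print("torchvision:", torchvision.__version__)  # expect 0.13.0
print("CUDA available:", torch.cuda.is_available())
```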
- Install the `dassl` library.
```bash
# Instructions borrowed from https://github.com/KaiyangZhou/Dassl.pytorch#installation

# Clone this repo
git clone https://github.com/KaiyangZhou/Dassl.pytorch.git
cd Dassl.pytorch

# Install dependencies
pip install -r requirements.txt

# Install this library (no need to rebuild if the source code is modified)
python setup.py develop
cd ..
```
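A one-line sanity check (ours, not from the Dassl instructions) confirms the editable install is importable:

```python
# Confirm the editable Dassl install is on the Python path.
import dassl
print("Dassl imported from:", dassl.__file__)
```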
- Clone the CDU code repository and install requirements.
```bash
# Clone the CDU code base
git clone https://github.com/246dxw/CDU.git
cd CDU

# Install requirements
pip install -r requirements.txt
```
Please follow the instructions below to prepare the datasets:
- Office-Home
- Office-31
- DomainNet
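Each dataset should live under a common data root, laid out as the corresponding dataset instructions prescribe. As a hypothetical sanity check (the root path and domain folder names below are illustrative assumptions, not the required layout), you can verify your data root before training:

```python
# Hypothetical check that the configured data root contains the expected
# Office-Home domain folders; adjust the path and names to your actual layout.
from pathlib import Path

data_root = Path("~/data/office_home").expanduser()  # adjust to your setup
for domain in ["Art", "Clipart", "Product", "Real_World"]:
    status = "found" if (data_root / domain).is_dir() else "missing"
    print(f"{domain}: {status}")
```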
Please follow the instructions below for training, evaluation, and reproducing the results. First, update the dataset directory paths to match your setup.
```bash
# Example: trains the source model on Office-Home; the source domain is Art and the target domain is Clipart (a-c)
bash scripts/cdu/main_cdusource.sh officehome b32_ep20_officehome CDUSOURCE ViT-L/14 4 a-c 0

# Example: trains the target model on Office-Home; the source domain is Art and the target domain is Clipart (a-c)
bash scripts/cdu/main_cdutarget.sh officehome b32_ep20_officehome CDUTARGET ViT-B/16 4 a-c 0

# Example: evaluates on Office-Home; the source domain is Art and the target domain is Clipart (a-c)
bash scripts/cdu/eval_cdutarget.sh officehome b32_ep20_officehome CDUTARGET ViT-B/16 4 a-c 0
```
Details for each method are in the corresponding folder under the [scripts folder](https://github.com/246dxw/CDU/tree/main/scripts).
The style of this readme follows PDA, and our code is based on the CoOp, CoCoOp, DAPL, MaPLe, and PDA repositories. We thank the authors for releasing their code. If you use their models and code, please consider citing these works as well. Supported methods:
| Method | Paper | Code |
|---|---|---|
| CoOp | IJCV 2022 | link |
| CoCoOp | CVPR 2022 | link |
| VPT | ECCV 2022 | link |
| IVLP & MaPLe | CVPR 2023 | link |
| DAPL | TNNLS 2023 | link |
| PDA | AAAI 2024 | link |
