Prevision is a project that aims to enhance perception for autonomous driving. It fine-tunes LLaVA with LoRA and integrates YOLO and the Depth Anything model to improve object detection and overall image-QA accuracy.
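For intuition, the sketch below shows one way detections and depth estimates can be serialized into extra textual context for a vision-language model. It is illustrative only, not this repository's actual pipeline: the `ultralytics` and `transformers` APIs are real, but the checkpoint names (`yolov8n.pt`, `LiheYoung/depth-anything-small-hf`) and the prompt format are assumptions.

```python
# Conceptual sketch only -- not the repository's actual pipeline code.
from ultralytics import YOLO
from transformers import pipeline
from PIL import Image

image = Image.open("frame.jpg")

detector = YOLO("yolov8n.pt")  # any YOLO checkpoint (assumed name)
depth_estimator = pipeline("depth-estimation",
                           model="LiheYoung/depth-anything-small-hf")

boxes = detector(image)[0].boxes          # detected objects
depth_map = depth_estimator(image)["depth"]  # PIL image of per-pixel depth

# Serialize (class, box, rough depth) into a text prefix that can be
# prepended to the LLaVA question as structured context.
lines = []
for box in boxes:
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    d = depth_map.getpixel((cx, cy))      # depth at the box center
    cls_name = detector.names[int(box.cls[0])]
    lines.append(f"{cls_name} at ({x1},{y1},{x2},{y2}), depth~{d:.1f}")
prompt_prefix = "Detected objects:\n" + "\n".join(lines)
print(prompt_prefix)
```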
- Clone the repository
```bash
git clone https://github.com/DLCV-Fall-2024/DLCV-Fall-2024-Final-1-cvpr2025.git
cd DLCV-Fall-2024-Final-1-cvpr2025
```
- Create a new conda environment
```bash
conda create -n final python=3.10.16 -y
conda activate final
# Install the required packages
cd LLaVA
pip install -e .
pip install -e ".[train]"
pip install flash-attn
cd ..
```
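After installation, a quick import check (a minimal sketch; it assumes only the packages installed above) confirms the `final` environment is usable:

```python
# Quick sanity check for the `final` conda environment.
import torch
import llava  # installed by `pip install -e .` inside LLaVA/

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn not importable; check the `pip install flash-attn` step")
```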
- Download the YOLO weights. Please refer to the README.md in the weights folder.
- Create another environment for data generation using Python 3.11
```bash
conda create -n gen_data python=3.11 -y
conda activate gen_data
pip install -r generate_pretrain_data/requirements.txt
```
- Generate the annotation file for training and testing
```bash
bash ./gen_annotation.sh <Path to processed data folder> <split>
# Example
# bash ./gen_annotation.sh ./training train
# bash ./gen_annotation.sh ./testing test
```
- first argument - path to the processed data folder
- second argument - the data split, either `train`, `val`, or `test`
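Before training, it can help to eyeball the generated annotations. The snippet below is a sketch: it assumes the output is a JSON list, while the exact schema is defined by `gen_annotation.sh`.

```python
# Sketch: peek at the generated annotation file.
import json

with open("./training/train.json") as f:
    data = json.load(f)

print(f"{len(data)} annotation entries")
print(json.dumps(data[0], indent=2))  # inspect one entry to verify the format
```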
```bash
# At the root of the repository (DLCV-Fall-2024-Final-1-cvpr2025)
bash ./train.sh <training_annotation_path> <validation_annotation_path> <model_checkpoint_path> <pretrain_bbox_encoder_path>
# Example
# bash ./train.sh ./training/train.json ./validation/val.json ./model_checkpoint ./pretrain_bbox_encoder
```
- first argument - path to the training annotation file
- second argument - path to the validation annotation file (currently deprecated for LLaVA)
- third argument - path to the model checkpoint folder (created if it does not exist)
- fourth argument - path to the pretrained bbox encoder folder (used if a pretrained bbox encoder checkpoint exists)

This will save the model checkpoints to the model_checkpoint folder; training currently runs for 6000 steps.
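To confirm the run produced output, you can list the checkpoint folder. This sketch assumes HuggingFace-Trainer-style `checkpoint-<step>` subfolders, which LLaVA's training scripts typically emit:

```python
# Sketch: list saved checkpoints (naming convention is an assumption).
from pathlib import Path

for ckpt in sorted(Path("./model_checkpoint").glob("checkpoint-*")):
    print(ckpt.name)
```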
```bash
# At the root of the repository (DLCV-Fall-2024-Final-1-cvpr2025)
bash ./inference.sh <model_checkpoint_path> <testing_annotation> <test_images> <output_json>
# Example
# bash ./inference.sh ./model_checkpoint ./testing/test.json ./testing/test_images ./submission.json
```
- first argument - path to the model checkpoint folder
- second argument - path to the testing annotation file
- third argument - path to the testing images folder
- fourth argument - path to the output JSON file

This will write the output JSON file to the path given as the fourth argument.
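A quick sanity check on the output file before submitting (a sketch; the exact schema of `submission.json` is an assumption):

```python
# Sketch: sanity-check the predictions file.
import json

with open("./submission.json") as f:
    preds = json.load(f)

print(f"{len(preds)} predictions")
sample = next(iter(preds.items())) if isinstance(preds, dict) else preds[0]
print(sample)  # inspect one prediction to verify the format
```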
- Download the dataset from https://www.nuscenes.org/nuimages (our main dataset used for pretraining)
- Move all the images to a folder named `nuImages`
```bash
mv <downloaded_folder> ./nuImages
# If there are multiple folders, move all the images into the same folder
```
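To verify the flattening worked, count the images (a sketch; the `.jpg` extension matches nuImages camera frames):

```python
# Sketch: confirm the images were flattened into ./nuImages.
from pathlib import Path

images = list(Path("./nuImages").glob("*.jpg"))
print(f"{len(images)} images directly under ./nuImages")
```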
- Generate the pretrain annotation data
```bash
# At the root of the repository (DLCV-Fall-2024-Final-1-cvpr2025)
bash ./gen_pretrain_annotation.sh <path_to_nuImages> <output_folder>
# Example
# bash ./gen_pretrain_annotation.sh ./nuImages ./pretrain_data
```
- first argument - path to the nuImages folder
- second argument - path to the output folder

This will create a file named `pretrain_data.json` in the output folder.
- Pretrain the model
```bash
# At the root of the repository (DLCV-Fall-2024-Final-1-cvpr2025)
bash ./pretrain.sh <training_annotation_path> <output_checkpoint_path>
# Example
# bash ./pretrain.sh ./pretrain_data/pretrain_data.json ./pretrain_checkpoint
```
- first argument - path to the pretrain annotation file
- second argument - path to the output checkpoint folder

This will save the pretrained model checkpoints to the pretrain_checkpoint folder; pretraining currently runs for 2000 steps.