About Training #30

@Forainest789

Thanks for your work.
I have some questions about training.
When I try to train the model on a small portion of the data, whether by streaming a shard online, for example
https://huggingface.co/datasets/imageomics/TreeOfLife-10M/blob/main/dataset/EOL/image_set_01.tar.gz
or by downloading the dataset locally, with the following command

python -m src.training.main \
  --train-data 'https://huggingface.co/datasets/imageomics/TreeOfLife-10M/resolve/main/dataset/EOL/image_set_01.tar.gz' \
  --val-data 'https://huggingface.co/datasets/imageomics/TreeOfLife-10M/resolve/main/dataset/EOL/image_set_01.tar.gz' \
  --dataset-type 'webdataset' \
  --pretrained 'openai' \
  --text_type 'random' \
  --warmup 100 \
  --batch-size 1 \
  --accum-freq 1 \
  --epochs 10 \
  --workers 1 \
  --model ViT-B-16 \
  --lr 1e-4 \
  --log-every-n-steps 1 \
  --dataset-resampled \
  --local-loss \
  --gather-with-grad \
  --grad-checkpointing \
  --logs '../storage/log/' \
  --train-num-samples 98000

it always gets stuck at the following point:

2024-12-11,23:16:02 | INFO | wandb_notes:
2024-12-11,23:16:02 | INFO | wandb_project_name: open-clip
2024-12-11,23:16:02 | INFO | warmup: 100
2024-12-11,23:16:02 | INFO | wd: 0.2
2024-12-11,23:16:02 | INFO | workers: 1 
2024-12-11,23:16:02 | INFO | world_size: 1 
2024-12-11,23:16:02 | INFO | zeroshot_frequency: 2 
2024-12-11,23:16:02 | INFO | Finish counting shard total size: 98000. 
2024-12-11,23:16:02 | INFO | Finish counting shard total size: 0. 
2024-12-11,23:16:02 | INFO | Start epoch 0 
<webdataset.compat.WebLoader object at 0x719706e3a170>
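
One way to rule out the archive itself before debugging the training loop is to inspect a locally downloaded copy of the shard. The following is a minimal sketch, not part of the repository: `summarize_shard` and the local path `image_set_01.tar.gz` are hypothetical names, and it assumes the usual WebDataset convention that files sharing a name up to the first dot form one sample. Whether the EOL archive actually follows that layout is not confirmed here; the check only shows what the loader would have to work with.

```python
import os
import posixpath
import tarfile
from collections import defaultdict


def summarize_shard(tar_path):
    """Group the files in a tar shard by WebDataset-style sample key.

    Files that share a path up to the first dot of the filename
    (e.g. "abc.jpg" and "abc.txt") are counted as one sample.
    """
    samples = defaultdict(set)
    # "r:*" lets tarfile detect the compression (.tar or .tar.gz) itself.
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            directory, filename = posixpath.split(member.name)
            key, dot, extension = filename.partition(".")
            if not dot:
                # No extension, so this file cannot belong to a sample.
                continue
            samples[posixpath.join(directory, key)].add(extension)
    return dict(samples)


if __name__ == "__main__":
    # Hypothetical local path; point this at the downloaded shard.
    shard = "image_set_01.tar.gz"
    if os.path.exists(shard):
        samples = summarize_shard(shard)
        print(f"{len(samples)} samples in {shard}")
        for key, extensions in list(samples.items())[:3]:
            print(key, sorted(extensions))
```

If the sample count is zero, or no key has both an image extension and a text extension, the hang is more likely related to the data layout than to the training arguments.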

In addition, I found that the "data/resolved.jsonl" file is missing when creating the data:

python scripts/evobio10m/make_metadata.py --db /fs/ess/PAS2136/open_clip/data/evobio10m-v3.3/mapping.sqlite

and the ToL-EDA HF repo mentioned in the README has disappeared.

Could you help me solve these problems, or point me to where I can find details about training?

Thank you very much
