[Bug]: ImportError: Dependency conflict between modelscope and datasets (v3.0+) during model.safetensors #646

@rururush

Prerequisites

  • I have searched the existing issues and confirmed this is not a duplicate.
  • I am using the latest version of the MLLM framework.

Bug Description

A series of ImportErrors occurs because of a breaking change in the datasets library (version 3.0 and above). The modelscope library (v1.34.0) attempts to import internal names that have been removed or relocated in newer versions of datasets.
The errors appear when running the training command:
python train.py --model_path ~/path/to/Qwen1.5-0.5B --max_length 1024 --num_samples 128 --output_dir ~/output/
The first error message looks like this:
ImportError: cannot import name 'HubDatasetModuleFactoryWithoutScript' from 'datasets.load' (xxx/python3.10/site-packages/datasets/load.py)

Changing the datasets version only changes which name is missing:

  1. With datasets 4.6.1, ALL_ALLOWED_EXTENSIONS is missing:

ImportError: cannot import name 'ALL_ALLOWED_EXTENSIONS' from 'datasets.load'

  2. With datasets 2.18.0, LargeList is missing:

ImportError: cannot import name 'LargeList' from 'datasets'

  3. _FEATURE_TYPES is missing:

ImportError: cannot import name '_FEATURE_TYPES' from 'datasets.features.features'
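
To see which of these names the installed datasets version is missing without running the full training script, a small probe can help. This is only a sketch: the (module, name) pairs are copied from the error messages above, and the helper name find_missing_names is made up for illustration. It is not part of modelscope or datasets.

```python
import importlib

# (module, attribute) pairs copied from the ImportError messages in this report.
REQUIRED_NAMES = [
    ("datasets.load", "HubDatasetModuleFactoryWithoutScript"),
    ("datasets.load", "ALL_ALLOWED_EXTENSIONS"),
    ("datasets", "LargeList"),
    ("datasets.features.features", "_FEATURE_TYPES"),
]


def find_missing_names(required=REQUIRED_NAMES):
    """Return the (module, name) pairs that cannot be resolved in this environment.

    Hypothetical helper for diagnosis only.
    """
    missing = []
    for module_name, attr in required:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The module itself is absent (or failed to import).
            missing.append((module_name, attr))
            continue
        if not hasattr(module, attr):
            missing.append((module_name, attr))
    return missing


if __name__ == "__main__":
    for module_name, attr in find_missing_names():
        print(f"missing: {attr} (expected in {module_name})")
```

Running the probe after each change of datasets version shows whether any of the four names are still unresolved in that environment.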

Steps to Reproduce

Run the training command:
python train.py --model_path ~/path/to/Qwen1.5-0.5B --max_length 1024 --num_samples 128 --output_dir ~/output/
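
Since the failure depends on which modelscope and datasets versions are installed together, it may help to record both before reproducing. The sketch below uses only the standard library. The function name installed_versions is an illustrative assumption, and the package names are the distribution names as they appear in this report.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("modelscope", "datasets")):
    """Map each package name to its installed version, or None if not installed.

    Hypothetical helper for collecting environment details.
    """
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```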

Expected Behavior

The training command completes and exports my model as a .safetensors file.

Operating System

Linux

Device

Computer

MLLM Framework Version

V2.0.0

Model Information

No response

Additional Context

No response
