Avoid hardcoded folder paths in launch scripts by fzyzcjy · Pull Request #600 · radixark/miles

fzyzcjy · 2026-02-14T10:44:08Z

splitted from a co-worked branch

Add data_dir, model_dir, model_local_dir, save_dir, megatron_path as configurable dataclass fields with backward-compatible defaults, replacing all hardcoded /root/models, /root/datasets, /root/local_data, /root/shared_data, /root/Megatron-LM references across 5 scripts and command_utils.py.

Co-authored-by: Yueming Yuan <yym022502@gmail.com>

gemini-code-assist · 2026-02-14T10:44:31Z

Summary of Changes

Hello @fzyzcjy, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the configurability and maintainability of several core Python scripts by externalizing previously hardcoded file system paths. By introducing dedicated fields within the ScriptArgs dataclass and updating relevant utility functions, users can now easily customize critical directory locations for models, datasets, and other resources without modifying the script code directly. This change promotes a more flexible and robust environment for development and deployment, while ensuring that existing workflows remain unaffected through careful use of default values.

Highlights

Path Externalization: Hardcoded /root/ paths for models, datasets, local data, shared data, and Megatron-LM in five Python scripts (run_deepseek.py, run_qwen3_4b.py, run_qwen3_30b_a3b.py, run_glm45_355b_a32b.py, run_mcore_fsdp.py) have been extracted into ScriptArgs dataclass fields.
Utility Function Updates: The command_utils.py module's hf_download_dataset, convert_checkpoint, and execute_train functions were updated to accept these new path parameters, enhancing their flexibility.
Backward Compatibility: All newly introduced ScriptArgs fields retain their original hardcoded values as default, ensuring full backward compatibility for existing configurations.
PYTHONPATH Normalization: The handling of PYTHONPATH has been normalized to consistently use paths without trailing slashes, resolving an inconsistency between execute_train and convert_checkpoint.

Changelog

miles/utils/external_utils/command_utils.py
- Added megatron_path parameter to convert_checkpoint and execute_train functions, and updated PYTHONPATH usage.
- Added data_dir parameter to hf_download_dataset function and updated the local download directory.
scripts/run_deepseek.py
- Introduced data_dir, model_dir, model_local_dir, save_dir, and megatron_path fields to ScriptArgs.
- Updated prepare_single, fp8_cast_bf16, _prepare_cp, and train functions to utilize the new configurable path arguments.
scripts/run_glm45_355b_a32b.py
- Added data_dir, model_dir, model_local_dir, save_dir, and megatron_path fields to ScriptArgs.
- Modified prepare_single, _convert_hf_to_fp8, _prepare_cp, and train functions to use the new configurable path arguments.
scripts/run_mcore_fsdp.py
- Added data_dir, model_dir, save_dir, and megatron_path fields to ScriptArgs.
- Updated prepare and execute functions to dynamically use the new configurable path arguments for various operations.
scripts/run_qwen3_30b_a3b.py
- Introduced data_dir, model_dir, save_dir, and megatron_path fields to ScriptArgs.
- Modified prepare and execute functions to utilize the new configurable path arguments for file and directory interactions.
scripts/run_qwen3_4b.py
- Added data_dir, model_dir, save_dir, and megatron_path fields to ScriptArgs.
- Updated prepare and execute functions to use the new configurable path arguments for managing model, dataset, and checkpoint paths.

Activity

The pull request author, fzyzcjy, initiated this change to improve script configurability.
Hardcoded /root/ paths were identified and extracted into ScriptArgs fields.
Utility functions in command_utils.py were adapted to accept these new path parameters.
All affected scripts were updated to use the new configurable path variables, maintaining default behavior.
An inconsistency in PYTHONPATH usage regarding trailing slashes was resolved.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This PR aims to extract hardcoded /root/ paths into ScriptArgs fields for improved configurability, achieving this while maintaining backward compatibility. However, this introduces a critical command injection vulnerability. User-controllable inputs from ScriptArgs are used in shell commands executed via bash -c without proper sanitization (e.g., using shlex.quote()), making the application vulnerable. Additionally, consider using pathlib.Path for path concatenation instead of f-strings (e.g., Path(data_dir) / partial_name). This improves robustness, cross-platform compatibility, and maintainability, and a specific suggestion is provided in miles/utils/external_utils/command_utils.py.

miles/utils/external_utils/command_utils.py

scripts/run_deepseek.py

scripts/run_mcore_fsdp.py

scripts/run_qwen3_30b_a3b.py

scripts/run_qwen3_4b.py

- Rename save_dir -> output_dir for broader semantics (checkpoints, dump details, core dumps) - Move output_dir field from individual ScriptArgs to ExecuteTrainConfig so all scripts inherit it - Replace hardcoded /root/shared_data in CUDA_COREDUMP_FILE with config.output_dir

fzyzcjy and others added 2 commits February 14, 2026 18:37

Add co-author for path extraction refactor

213ab0d

Co-authored-by: Yueming Yuan <yym022502@gmail.com>

fzyzcjy requested review from guapisolo, maocheng23 and yueming-yuan as code owners February 14, 2026 10:44

fzyzcjy changed the title ~~Extract hardcoded /root/ paths into ScriptArgs fields~~ Extract hardcoded folder paths into ScriptArgs fields Feb 14, 2026

gemini-code-assist bot reviewed Feb 14, 2026

View reviewed changes

miles/utils/external_utils/command_utils.py Show resolved Hide resolved

scripts/run_deepseek.py Show resolved Hide resolved

scripts/run_mcore_fsdp.py Show resolved Hide resolved

scripts/run_qwen3_30b_a3b.py Show resolved Hide resolved

scripts/run_qwen3_4b.py Show resolved Hide resolved

fzyzcjy added 2 commits February 14, 2026 18:47

Fix black formatting in run_glm45_355b_a32b.py

37b51d7

fzyzcjy changed the title ~~Extract hardcoded folder paths into ScriptArgs fields~~ Avoid hardcoded folder paths in launch scripts Feb 14, 2026

fzyzcjy force-pushed the feat/generalize_path branch from a56ec11 to 6a97efe Compare February 15, 2026 14:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid hardcoded folder paths in launch scripts#600

Avoid hardcoded folder paths in launch scripts#600
fzyzcjy wants to merge 4 commits intomainfrom
feat/generalize_path

fzyzcjy commented Feb 14, 2026 •

edited

Loading

Uh oh!

gemini-code-assist bot commented Feb 14, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

fzyzcjy commented Feb 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist bot commented Feb 14, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fzyzcjy commented Feb 14, 2026 •

edited

Loading