docs: add tutorial series by yoursanonymous · Pull Request #487 · volcano-sh/website

yoursanonymous · 2026-02-15T13:03:01Z

Please check if the PR fulfills these requirements

The commit message follows our guidelines

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

/kind documentation

What this PR does / why we need it:

This pull request introduces a new, comprehensive tutorial series focused on real-world production scenarios. It addresses the gap between basic examples and the complex end-to-end configurations required for production environments.

Which issue(s) this PR fixes:

volcano-sh-bot · 2026-02-15T13:03:08Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign thor-wl for approval. For more information see the Kubernetes Code Review Process.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gemini-code-assist · 2026-02-15T13:03:18Z

Summary of Changes

Hello @yoursanonymous, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the documentation by introducing a comprehensive 'Advanced Concepts Tutorial Series.' This new series provides in-depth, end-to-end guides for running production-grade batch workloads on Kubernetes using Volcano, covering complex scenarios like distributed machine learning, big data processing, GPU resource management, multi-tenancy, and integration with workflow engines like Argo Workflows. The aim is to bridge the gap between basic examples and real-world production configurations, offering practical solutions to common engineering challenges.

Highlights

New Advanced Tutorial Series: A new 'Advanced Concepts Tutorial Series' overview page has been added, serving as the entry point for comprehensive guides on production-grade batch workloads.
Argo Workflows Integration Tutorial: A new tutorial demonstrates how to integrate Volcano as the scheduler for Argo Workflows, enabling advanced batch scheduling features for CI/CD and data pipelines.
GPU Scheduling and Resource Management Tutorial: A dedicated tutorial has been added for efficient GPU scheduling and resource management using Volcano, covering fractional GPU sharing (vGPU) and hardware-based isolation (MIG).
Multi-tenancy Queue Configuration Tutorial: A new tutorial explains how to set up multi-tenant environments using Volcano's hierarchical queues for fair resource sharing and dynamic allocation.
Apache Spark Batch Processing Tutorial: A tutorial has been introduced to guide users on orchestrating Apache Spark applications with Volcano for optimized resource utilization and gang scheduling to prevent starvation.
Distributed TensorFlow Training Tutorial: A new tutorial provides instructions for running distributed TensorFlow training jobs on Volcano, leveraging gang scheduling, service discovery, and resilient lifecycle management.
Documentation Navigation Update: The existing basic tutorials page has been updated to include a prominent link to the newly introduced 'Advanced Concepts Tutorial Series'.
Git Ignore Update: The .gitignore file was updated to include a new entry for ignoring VS Code AI rules.

Changelog

.gitignore
- Added a new entry to ignore a VS Code AI rules file.
content/en/docs/advanced-tutorials.md
- Created a new documentation page serving as the entry point for the advanced tutorial series.
content/en/docs/tutorial-argo-workflows.md
- Added a new tutorial explaining how to integrate Volcano as the scheduler for Argo Workflows.
content/en/docs/tutorial-gpu-scheduling.md
- Added a new tutorial on managing GPU resources with Volcano, covering vGPU and MIG.
content/en/docs/tutorial-multi-tenancy.md
- Added a new tutorial demonstrating multi-tenant queue configuration with Volcano.
content/en/docs/tutorial-spark.md
- Added a new tutorial for optimizing Apache Spark batch processing using Volcano.
content/en/docs/tutorial-tensorflow.md
- Added a new tutorial on running distributed TensorFlow training jobs with Volcano.
content/en/docs/tutorials.md
- Updated the existing basic tutorials page to include a link to the new advanced tutorial series.

Activity

No human activity (comments, reviews, etc.) has been recorded for this pull request yet.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request adds a comprehensive tutorial series for advanced Volcano concepts, which is a great addition to the documentation. The tutorials cover important real-world scenarios like distributed training, big data processing, and multi-tenancy. My review focuses on ensuring the correctness of the examples and maintaining documentation quality. I've found a few issues in the example code snippets that would prevent them from running correctly, such as an invalid TF_CONFIG generation in the TensorFlow tutorial and an incorrect successCondition in the Argo Workflows tutorial. I've also pointed out some minor inconsistencies, like incorrect pod names in commands and the use of backslashes in .gitignore. Additionally, several new markdown files are missing a final newline character. Addressing these points will improve the quality and usability of the new tutorials.

content/en/docs/tutorial-argo-workflows.md (80)

The successCondition for the Volcano Job appears to be incorrect. A Volcano Job reports its phase under status.state.phase, so the expression should check that field to determine whether the job has completed.

        successCondition: status.state.phase == 'Completed' # Wait for the Job to finish
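
To illustrate why the field path matters, here is a minimal Python sketch that resolves a dotted path against a job object. It is a simplified model for illustration only, not Argo's actual condition evaluator; the helper names `resolve_field` and `condition_met`, the sample status dictionary, and the `status.phase` counter-example are all hypothetical.

```python
def resolve_field(resource: dict, dotted_path: str):
    """Walk a dotted path such as "status.state.phase"; return None if a key is missing."""
    current = resource
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def condition_met(resource: dict, dotted_path: str, expected: str) -> bool:
    """Return True only when the field exists and equals the expected value."""
    return resolve_field(resource, dotted_path) == expected


# Hypothetical, trimmed-down status of a finished Volcano Job.
finished_job = {"status": {"state": {"phase": "Completed"}}}

print(condition_met(finished_job, "status.state.phase", "Completed"))  # True
print(condition_met(finished_job, "status.phase", "Completed"))        # False: no such field
```

A condition that points at a field the resource never populates can never become true, so the workflow step would wait until it times out.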

content/en/docs/tutorial-gpu-scheduling.md (95)

The pod name in the example kubectl logs command seems incorrect. Based on the job manifest, the task name is gpu-task, so the pod name should be gpu-sharing-tutorial-gpu-task-0, not gpu-sharing-tutorial-completion-task-0.

kubectl logs gpu-sharing-tutorial-gpu-task-0
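
The pod name in this comment follows the pattern job name, task name, replica index, joined by hyphens. A small Python sketch of that pattern, assuming it holds as the comment implies (the helper name `volcano_pod_name` is hypothetical):

```python
def volcano_pod_name(job_name: str, task_name: str, index: int) -> str:
    """Compose a pod name as <job>-<task>-<index>, the pattern the review comment relies on."""
    return f"{job_name}-{task_name}-{index}"


# Job and task names taken from the review comment above.
print(volcano_pod_name("gpu-sharing-tutorial", "gpu-task", 0))
```

Deriving the name from the manifest's job and task names, rather than typing it by hand, avoids the mismatch flagged here.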

content/en/docs/tutorial-tensorflow.md (71-74)

The TF_CONFIG JSON generated for the ps task is invalid because the host entries in the ps and worker arrays are not quoted. JSON string values must be wrapped in double quotes, so the TensorFlow application fails when it parses the configuration. The shell script should be updated to wrap each host:port combination in double quotes.

                  PS_HOSTS=$(cat /etc/volcano/ps.host | sed 's/.*/"&:2222"/' | tr '\n' ',' | sed 's/,$//');
                  WORKER_HOSTS=$(cat /etc/volcano/worker.host | sed 's/.*/"&:2222"/' | tr '\n' ',' | sed 's/,$//');
                  export TF_CONFIG="{\"cluster\":{\"ps\":[${PS_HOSTS}],\"worker\":[${WORKER_HOSTS}]},\"task\":{\"type\":\"ps\",\"index\":${VK_TASK_INDEX}},\"environment\":\"cloud\"}";
                  python /var/tf_dist_mnist/dist_mnist.py
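
The failure mode can be reproduced outside the cluster. The following Python sketch mimics the string concatenation done by the shell script and checks the result with `json.loads`. The helper names and the sample host names are hypothetical; the port 2222 and the JSON layout come from the snippet above.

```python
import json


def build_tf_config(ps_hosts, worker_hosts, task_type, task_index,
                    quote_hosts=True, port=2222):
    """Concatenate a TF_CONFIG string the way the shell script does."""
    def join(hosts):
        entries = [f"{host}:{port}" for host in hosts]
        if quote_hosts:
            # Equivalent of the sed expression that wraps each entry in double quotes.
            entries = [f'"{entry}"' for entry in entries]
        return ",".join(entries)

    return (
        '{"cluster":{"ps":[' + join(ps_hosts) + '],"worker":[' + join(worker_hosts) + ']},'
        '"task":{"type":"' + task_type + '","index":' + str(task_index) + '},'
        '"environment":"cloud"}'
    )


def is_valid_json(text: str) -> bool:
    """Return True when the text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical host names, for illustration only.
ps = ["job-ps-0.job"]
workers = ["job-worker-0.job", "job-worker-1.job"]

print(is_valid_json(build_tf_config(ps, workers, "ps", 0, quote_hosts=False)))  # False
print(is_valid_json(build_tf_config(ps, workers, "ps", 0)))                     # True
```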

content/en/docs/tutorial-tensorflow.md (93-96)

The same problem affects the worker task: the generated TF_CONFIG JSON is invalid because the host entries in the ps and worker arrays are not quoted, so the TensorFlow application fails when it parses the configuration. The shell script should be updated to wrap each host:port combination in double quotes.

                  PS_HOSTS=$(cat /etc/volcano/ps.host | sed 's/.*/"&:2222"/' | tr '\n' ',' | sed 's/,$//');
                  WORKER_HOSTS=$(cat /etc/volcano/worker.host | sed 's/.*/"&:2222"/' | tr '\n' ',' | sed 's/,$//');
                  export TF_CONFIG="{\"cluster\":{\"ps\":[${PS_HOSTS}],\"worker\":[${WORKER_HOSTS}]},\"task\":{\"type\":\"worker\",\"index\":${VK_TASK_INDEX}},\"environment\":\"cloud\"}";
                  python /var/tf_dist_mnist/dist_mnist.py
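
As an alternative to the shell fix suggested above, and not something this PR does, the configuration could be assembled in Python and serialized with `json.dumps`, which handles the quoting. This is a minimal sketch; the function names `hosts_from_text` and `make_tf_config` are hypothetical, and the input is assumed to be the text of a host file with one host per line.

```python
import json


def hosts_from_text(text: str, port: int = 2222):
    """Turn host-file text (one host per line) into a list of "host:port" strings."""
    return [f"{line.strip()}:{port}" for line in text.splitlines() if line.strip()]


def make_tf_config(ps_text: str, worker_text: str, task_type: str, task_index) -> str:
    """Build TF_CONFIG as a dict and let json.dumps produce valid JSON."""
    config = {
        "cluster": {
            "ps": hosts_from_text(ps_text),
            "worker": hosts_from_text(worker_text),
        },
        "task": {"type": task_type, "index": int(task_index)},
        "environment": "cloud",
    }
    return json.dumps(config)


# Hypothetical host-file contents, for illustration only.
print(make_tf_config("job-ps-0.job\n", "job-worker-0.job\njob-worker-1.job\n", "worker", "1"))
```

Inside the container, the two text arguments would come from reading /etc/volcano/ps.host and /etc/volcano/worker.host, and the index from the VK_TASK_INDEX environment variable.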

.gitignore (9)

Use forward slashes (/) as path separators in .gitignore patterns. Git treats a backslash (\) in a pattern as an escape character, not a separator, so an entry written with backslashes will not match the intended path on any platform.

.github/instructions/codacy.instructions.md
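
A check like this could be automated. The Python sketch below flags .gitignore lines where a backslash is followed by a letter or digit and proposes the forward-slash form. It is a heuristic with a hypothetical helper name, and the backslash version of the entry in the example is assumed from the suggestion above rather than copied from the PR.

```python
import re

# A backslash followed by a letter or digit is most likely a Windows-style
# separator rather than an intentional escape such as "\#" or "\!".
_SUSPECT_BACKSLASH = re.compile(r"\\(?=[A-Za-z0-9])")


def suggest_gitignore_fixes(lines):
    """Return (line_number, suggested_pattern) for entries that use backslash separators."""
    fixes = []
    for number, line in enumerate(lines, start=1):
        pattern = line.strip()
        if not pattern or pattern.startswith("#"):
            continue  # skip blank lines and comments
        if _SUSPECT_BACKSLASH.search(pattern):
            fixes.append((number, _SUSPECT_BACKSLASH.sub("/", pattern)))
    return fixes


sample = [
    "# editor files",
    r".github\instructions\codacy.instructions.md",  # assumed original entry
    "node_modules/",
]
print(suggest_gitignore_fixes(sample))
```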