docs: add tutorial series #487

Open
yoursanonymous wants to merge 1 commit into volcano-sh:master from yoursanonymous:working_type

@yoursanonymous

  • Please check if the PR fulfills these requirements
  • The commit message follows our guidelines
  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

/kind documentation

  • What this PR does / why we need it:

This pull request introduces a new, comprehensive tutorial series focused on real-world production scenarios. It addresses the gap between basic examples and the complex end-to-end configurations required for production environments.

  • Which issue(s) this PR fixes:

@volcano-sh-bot added the kind/documentation label (Categorizes issue or PR as related to documentation.) on Feb 15, 2026
@volcano-sh-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign thor-wl for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist

Summary of Changes

Hello @yoursanonymous, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the documentation by introducing a comprehensive 'Advanced Concepts Tutorial Series.' This new series provides in-depth, end-to-end guides for running production-grade batch workloads on Kubernetes using Volcano, covering complex scenarios like distributed machine learning, big data processing, GPU resource management, multi-tenancy, and integration with workflow engines like Argo Workflows. The aim is to bridge the gap between basic examples and real-world production configurations, offering practical solutions to common engineering challenges.

Highlights

  • New Advanced Tutorial Series: A new 'Advanced Concepts Tutorial Series' overview page has been added, serving as the entry point for comprehensive guides on production-grade batch workloads.
  • Argo Workflows Integration Tutorial: A new tutorial demonstrates how to integrate Volcano as the scheduler for Argo Workflows, enabling advanced batch scheduling features for CI/CD and data pipelines.
  • GPU Scheduling and Resource Management Tutorial: A dedicated tutorial has been added for efficient GPU scheduling and resource management using Volcano, covering fractional GPU sharing (vGPU) and hardware-based isolation (MIG).
  • Multi-tenancy Queue Configuration Tutorial: A new tutorial explains how to set up multi-tenant environments using Volcano's hierarchical queues for fair resource sharing and dynamic allocation.
  • Apache Spark Batch Processing Tutorial: A tutorial has been introduced to guide users on orchestrating Apache Spark applications with Volcano for optimized resource utilization and gang scheduling to prevent starvation.
  • Distributed TensorFlow Training Tutorial: A new tutorial provides instructions for running distributed TensorFlow training jobs on Volcano, leveraging gang scheduling, service discovery, and resilient lifecycle management.
  • Documentation Navigation Update: The existing basic tutorials page has been updated to include a prominent link to the newly introduced 'Advanced Concepts Tutorial Series'.
  • Git Ignore Update: The .gitignore file was updated to include a new entry for ignoring VS Code AI rules.

Changelog
  • .gitignore
    • Added a new entry to ignore a VS Code AI rules file.
  • content/en/docs/advanced-tutorials.md
    • Created a new documentation page serving as the entry point for the advanced tutorial series.
  • content/en/docs/tutorial-argo-workflows.md
    • Added a new tutorial explaining how to integrate Volcano as the scheduler for Argo Workflows.
  • content/en/docs/tutorial-gpu-scheduling.md
    • Added a new tutorial on managing GPU resources with Volcano, covering vGPU and MIG.
  • content/en/docs/tutorial-multi-tenancy.md
    • Added a new tutorial demonstrating multi-tenant queue configuration with Volcano.
  • content/en/docs/tutorial-spark.md
    • Added a new tutorial for optimizing Apache Spark batch processing using Volcano.
  • content/en/docs/tutorial-tensorflow.md
    • Added a new tutorial on running distributed TensorFlow training jobs with Volcano.
  • content/en/docs/tutorials.md
    • Updated the existing basic tutorials page to include a link to the new advanced tutorial series.
Activity
  • No human activity (comments, reviews, etc.) has been recorded for this pull request yet.

@volcano-sh-bot added the size/XL label (Denotes a PR that changes 500-999 lines, ignoring generated files.) on Feb 15, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a comprehensive tutorial series for advanced Volcano concepts, which is a great addition to the documentation. The tutorials cover important real-world scenarios like distributed training, big data processing, and multi-tenancy. My review focuses on ensuring the correctness of the examples and maintaining documentation quality. I've found a few issues in the example code snippets that would prevent them from running correctly, such as an invalid TF_CONFIG generation in the TensorFlow tutorial and an incorrect successCondition in the Argo Workflows tutorial. I've also pointed out some minor inconsistencies, like incorrect pod names in commands and the use of backslashes in .gitignore. Additionally, several new markdown files are missing a final newline character. Addressing these points will improve the quality and usability of the new tutorials.

I am having trouble creating individual review comments. Click here to see my feedback.

content/en/docs/tutorial-argo-workflows.md (80)

high

The successCondition for a Volcano Job appears to be incorrect. The job status is typically found in status.state.phase. The expression should check this field to correctly determine if the job has completed.

        successCondition: status.state.phase == 'Completed' # Wait for the Job to finish
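
To make the intent of the suggested condition concrete, here is a minimal Python sketch of the check it describes, assuming the Job object is available as a plain dictionary. It does not reproduce how Argo evaluates successCondition; the helper names `get_path` and `job_completed` and the sample objects are hypothetical.

```python
from collections.abc import Mapping


def get_path(obj, dotted_path):
    """Walk a dotted path such as 'status.state.phase'.

    Returns None as soon as a key is missing, so a Job that has no
    status yet is simply treated as "not completed".
    """
    current = obj
    for key in dotted_path.split("."):
        if not isinstance(current, Mapping) or key not in current:
            return None
        current = current[key]
    return current


def job_completed(job):
    """True only when status.state.phase equals 'Completed'."""
    return get_path(job, "status.state.phase") == "Completed"


# Hypothetical, trimmed-down Job objects used only for illustration.
running_job = {"status": {"state": {"phase": "Running"}}}
finished_job = {"status": {"state": {"phase": "Completed"}}}
pending_job = {"metadata": {"name": "demo"}}  # no status block yet
```

A condition that reads any other field never matches a finished Job, so the workflow step would wait until it times out.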

content/en/docs/tutorial-gpu-scheduling.md (95)

high

The pod name in the example kubectl logs command seems incorrect. Based on the job manifest, the task name is gpu-task, so the pod name should be gpu-sharing-tutorial-gpu-task-0, not gpu-sharing-tutorial-completion-task-0.

kubectl logs gpu-sharing-tutorial-gpu-task-0
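
The corrected pod name follows a job-task-index pattern. The sketch below assumes that pattern, as implied by the comment above; the function `volcano_pod_name` is a hypothetical helper, and the job name `gpu-sharing-tutorial` is inferred from the pod name in the suggested command.

```python
def volcano_pod_name(job_name: str, task_name: str, index: int) -> str:
    """Compose a pod name as <job>-<task>-<index> (assumed naming pattern)."""
    return f"{job_name}-{task_name}-{index}"


# The task in the tutorial manifest is called gpu-task, so the first
# replica's pod name comes out as the one used in the suggested command.
pod = volcano_pod_name("gpu-sharing-tutorial", "gpu-task", 0)
print(f"kubectl logs {pod}")
```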

content/en/docs/tutorial-tensorflow.md (71-74)

high

The generated TF_CONFIG JSON is invalid because the host strings in the ps and worker arrays are not quoted. This will cause the TensorFlow application to fail when parsing the configuration. The shell script should be updated to wrap each host:port combination in double quotes.

                  PS_HOSTS=$(cat /etc/volcano/ps.host | sed 's/.*/"&:2222"/' | tr '\n' ',' | sed 's/,$//');
                  WORKER_HOSTS=$(cat /etc/volcano/worker.host | sed 's/.*/"&:2222"/' | tr '\n' ',' | sed 's/,$//');
                  export TF_CONFIG="{\"cluster\":{\"ps\":[${PS_HOSTS}],\"worker\":[${WORKER_HOSTS}]},\"task\":{\"type\":\"ps\",\"index\":${VK_TASK_INDEX}},\"environment\":\"cloud\"}";
                  python /var/tf_dist_mnist/dist_mnist.py

content/en/docs/tutorial-tensorflow.md (93-96)

high

The generated TF_CONFIG JSON is invalid because the host strings in the ps and worker arrays are not quoted. This will cause the TensorFlow application to fail when parsing the configuration. The shell script should be updated to wrap each host:port combination in double quotes.

                  PS_HOSTS=$(cat /etc/volcano/ps.host | sed 's/.*/"&:2222"/' | tr '\n' ',' | sed 's/,$//');
                  WORKER_HOSTS=$(cat /etc/volcano/worker.host | sed 's/.*/"&:2222"/' | tr '\n' ',' | sed 's/,$//');
                  export TF_CONFIG="{\"cluster\":{\"ps\":[${PS_HOSTS}],\"worker\":[${WORKER_HOSTS}]},\"task\":{\"type\":\"worker\",\"index\":${VK_TASK_INDEX}},\"environment\":\"cloud\"}";
                  python /var/tf_dist_mnist/dist_mnist.py
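
The quoting problem described in the two comments above can be reproduced outside the cluster. The following Python sketch mimics the string concatenation done by the shell script and shows that the result only parses as JSON when each host:port entry is quoted. The function names and the host names (`ps-0.example`, `worker-0.example`, `worker-1.example`) are hypothetical and stand in for the contents of the host files.

```python
import json


def build_tf_config(ps_hosts, worker_hosts, task_type, task_index,
                    quote_hosts=True, port=2222):
    """Assemble TF_CONFIG by string concatenation, like the shell script."""

    def render(hosts):
        entries = [f"{host}:{port}" for host in hosts]
        if quote_hosts:
            # Equivalent of the sed 's/.*/"&:2222"/' step in the suggested fix.
            entries = [f'"{entry}"' for entry in entries]
        return ",".join(entries)

    parts = [
        '{"cluster":{"ps":[', render(ps_hosts), '],',
        '"worker":[', render(worker_hosts), ']},',
        '"task":{"type":"', task_type, '","index":', str(task_index), '},',
        '"environment":"cloud"}',
    ]
    return "".join(parts)


def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


ps = ["ps-0.example"]                             # hypothetical host names
workers = ["worker-0.example", "worker-1.example"]

unquoted = build_tf_config(ps, workers, "worker", 0, quote_hosts=False)
quoted = build_tf_config(ps, workers, "worker", 0, quote_hosts=True)

print(is_valid_json(unquoted))  # unquoted hosts are not valid JSON tokens
print(is_valid_json(quoted))
```

Where the configuration can be built in Python rather than in the shell, passing a dictionary to `json.dumps` avoids hand-written quoting altogether.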

.gitignore (9)

medium

For cross-platform compatibility in .gitignore files, it's best practice to use forward slashes (/) as path separators instead of backslashes (\).

.github/instructions/codacy.instructions.md
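
In gitignore patterns the backslash is an escape character, not a path separator, so an entry written with backslashes does not match the intended path. A minimal sketch of the conversion follows, assuming every backslash in the entry was meant as a separator; the helper name and the backslash form of the entry are assumptions reconstructed from the comment.

```python
def normalize_gitignore_pattern(pattern: str) -> str:
    """Replace backslash separators with forward slashes.

    Assumes the backslashes are path separators. A pattern that uses a
    backslash as a genuine escape (for example a leading \\#) should not
    be passed through this helper.
    """
    return pattern.replace("\\", "/")


# Assumed original entry, written with Windows-style separators.
original = ".github\\instructions\\codacy.instructions.md"
print(normalize_gitignore_pattern(original))
```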

content/en/docs/advanced-tutorials.md (36)

medium

The file is missing a final newline. It's a good practice to end files with a newline character.

content/en/docs/tutorial-argo-workflows.md (134)

medium

The file is missing a final newline. It's a good practice to end files with a newline character.

content/en/docs/tutorial-gpu-scheduling.md (113)

medium

The file is missing a final newline. It's a good practice to end files with a newline character.

content/en/docs/tutorial-multi-tenancy.md (145)

medium

The file is missing a final newline. It's a good practice to end files with a newline character.

content/en/docs/tutorial-spark.md (122)

medium

The file is missing a final newline. It's a good practice to end files with a newline character.

content/en/docs/tutorial-tensorflow.md (152)

medium

The file is missing a final newline. It's a good practice to end files with a newline character.

content/en/docs/tutorials.md (369)

medium

The file is missing a final newline. It's a good practice to end files with a newline character.
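
The missing-newline findings above could be checked and fixed in one pass before pushing. The sketch below is one possible approach, not part of the PR; the function names are hypothetical, and the commented-out file list simply repeats paths named in the review.

```python
from pathlib import Path


def needs_final_newline(data: bytes) -> bool:
    """True for non-empty content that does not end with a newline."""
    return data != b"" and not data.endswith(b"\n")


def with_final_newline(data: bytes) -> bytes:
    """Return data unchanged, or with a single newline appended if missing."""
    return data + b"\n" if needs_final_newline(data) else data


def fix_file(path: Path) -> bool:
    """Rewrite the file only if a final newline is missing; report whether it changed."""
    data = path.read_bytes()
    if not needs_final_newline(data):
        return False
    path.write_bytes(with_final_newline(data))
    return True


# Example usage against paths from the review (run from the repository root):
# for name in ["content/en/docs/advanced-tutorials.md",
#              "content/en/docs/tutorial-spark.md"]:
#     print(name, fix_file(Path(name)))
```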

Signed-off-by: vinayak sharma <vinayaks0111@gmail.com>
@yoursanonymous
Author

@JesseStutler,
Thanks for the feedback.

I have fixed the CI issues and validated the manifests. I also executed the Quickstart tutorial on a live cluster, and it worked as expected end-to-end.

Please let me know if any additional checks are needed.

@yoursanonymous yoursanonymous mentioned this pull request Feb 16, 2026
1 task