5 changes: 4 additions & 1 deletion .gitignore
Expand Up @@ -3,4 +3,7 @@ resources*
public
.vscode/*
.DS_Store
.hugo_build.lock
.hugo_build.lock

# Ignore VS Code AI rules
.github/instructions/codacy.instructions.md
36 changes: 36 additions & 0 deletions content/en/docs/advanced-tutorials.md
@@ -0,0 +1,36 @@
+++
title = "Advanced Concepts Tutorial Series"
linktitle = "Advanced Concepts"
date = 2026-02-14
publishdate = 2026-02-14
lastmod = 2026-02-14
draft = false
toc = true
type = "docs"

# Add menu entry to sidebar.
[menu.docs]
parent = "tutorial-series"
weight = 1
+++

This section provides end-to-end guides for running production-grade batch workloads on Kubernetes using Volcano.

## Why These Tutorials?

While basic guides cover the syntax, these tutorials demonstrate how Volcano solves real-world engineering challenges:

- **Background**: Understand the specific challenges (e.g., gang scheduling, resource starvation) addressed by the tutorial.
- **Scenario**: A practical use case you might encounter in a production cluster.
- **Step-by-Step Deployment**: Clear commands and complete, ready-to-use YAML manifests.
- **Verification**: How to confirm your job is running and being scheduled correctly.

## Tutorial Series

- **[Distributed TensorFlow](/en/docs/tutorial-tensorflow/)**: Orchestrate high-performance ML training jobs with parameter servers and workers.
- **[Apache Spark](/en/docs/tutorial-spark/)**: Prevent resource starvation in big data processing pipelines.
- **[GPU Resource Management](/en/docs/tutorial-gpu-scheduling/)**: Maximize hardware efficiency through fractional sharing (vGPU) and isolation.
- **[Multi-tenancy](/en/docs/tutorial-multi-tenancy/)**: Configure fair share scheduling and hierarchical queues for different teams.
- **[Argo Workflows](/en/docs/tutorial-argo-workflows/)**: Integrate Volcano's advanced scheduling into your CI/CD and data pipelines.

Back to basics? Check out our **[Quick Start](/en/docs/tutorials/)**.
134 changes: 134 additions & 0 deletions content/en/docs/tutorial-argo-workflows.md
@@ -0,0 +1,134 @@
+++
title = "Integrating with Argo Workflows"
linktitle = "Argo Workflows"
date = 2026-02-11
publishdate = 2026-02-11
lastmod = 2026-02-11
draft = false
toc = true
type = "docs"

[menu.docs]
parent = "tutorial-series"
weight = 50
+++

This tutorial shows how to use Volcano as the scheduler for Argo Workflows to gain advanced batch scheduling features for your CI/CD and data processing pipelines.

## Background

Argo Workflows is a popular cloud-native workflow engine for orchestrating parallel jobs on Kubernetes. While Argo excels at managing dependencies and execution flow, it often relies on the default Kubernetes scheduler for individual steps.

By integrating Volcano as the scheduler for Argo Workflows, you unlock advanced batch scheduling capabilities:

- **Bin-packing**: Optimize resource utilization by packing tasks onto the fewest number of nodes.
- **Fair Sharing**: Ensure that workflow steps across different tenants or namespaces are scheduled fairly according to configured weights.
- **Gang Scheduling**: For workflows involving multiple parallel pods that must start together, Volcano ensures they are managed as a single unit (PodGroup).
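For illustration, here is a minimal `PodGroup` manifest of the kind Volcano uses as its gang-scheduling unit. The name, `minMember` value, and queue below are placeholders for this sketch; in practice, Volcano typically creates the PodGroup for you when it schedules the pods:

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: example-workflow-group   # hypothetical name, for illustration only
spec:
  minMember: 3      # all 3 member pods must be schedulable before any starts
  queue: default    # the Volcano queue the group is admitted to
```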

## Scenario

A common workflow scenario involves a "Main" entrypoint that triggers one or more "Task" steps. In this tutorial, you will configure a simple Argo Workflow to use Volcano for its underlying pod scheduling.

## Prerequisites

Before you begin, ensure you have:
- A Kubernetes cluster with Volcano installed.
- [Argo Workflows](https://argoproj.github.io/argo-workflows/installation/) installed in your cluster.

## Deployment Step-by-Step

### 1. Create the Workflow Manifest

You can configure Argo to use Volcano at the individual template level using the `schedulerName` field. Create a file named `volcano-workflow.yaml`:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: volcano-workflow-
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: step1
        template: whalesay
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["Hello from Argo + Volcano!"]
    schedulerName: volcano # Explicitly tell Argo to use Volcano
```

### 2. Apply the Workflow

Run the following command to submit your workflow:

```bash
argo submit volcano-workflow.yaml
```

## Advanced: Deploying VolcanoJobs from Argo

For tasks that require native Volcano features like `minAvailable` or specific `plugins`, you can submit a `VolcanoJob` directly as a resource template:

```yaml
- name: volcano-job-step
  resource:
    action: create
    successCondition: status.state.phase == Completed # Wait for the Job to finish
    manifest: |
      apiVersion: batch.volcano.sh/v1alpha1
      kind: Job
      metadata:
        generateName: argo-step-
      spec:
        minAvailable: 1
        schedulerName: volcano
        tasks:
        - name: task-1
          replicas: 1
          template:
            spec:
              containers:
              - name: main
                image: alpine
                command: ["echo", "running inside volcano job"]
              restartPolicy: Never
```

## Verification

### Check Workflow Status

Monitor the progress of your workflow using the Argo CLI:

```bash
argo get @latest
```

### Verify the Scheduler

Check the details of any pod created by the workflow to ensure it was handled by Volcano:

```bash
kubectl get pod <pod-name> -o jsonpath='{.spec.schedulerName}'
```

The output should be `volcano`.

## Notes

- **Global Configuration**: You can make Volcano the default scheduler for *all* Argo Workflows by setting `schedulerName: volcano` under the `workflowDefaults` spec in the `workflow-controller-configmap`, so individual templates no longer need to set it.
- **ServiceAccount Permissions**: If using the `resource` template to create `VolcanoJobs`, ensure the ServiceAccount your workflow runs as has RBAC permissions to `create`, `get`, and `watch` resources in the `batch.volcano.sh` group.
- **PodGroups**: When a pod is scheduled by Volcano, a `PodGroup` is automatically created. You can inspect it with `kubectl get podgroups`.
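
As a sketch of the RBAC mentioned in the notes above (the Role name and namespace are illustrative; bind the Role to whichever ServiceAccount executes your workflow's resource templates):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: volcano-job-creator   # hypothetical name
  namespace: argo             # adjust to your workflow namespace
rules:
- apiGroups: ["batch.volcano.sh"]
  resources: ["jobs"]
  verbs: ["create", "get", "watch"]
```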


## Tutorial Series

- **[Distributed TensorFlow](/en/docs/tutorial-tensorflow/)**: Orchestrate high-performance ML training jobs with parameter servers and workers.
- **[Apache Spark](/en/docs/tutorial-spark/)**: Prevent resource starvation in big data processing pipelines.
- **[GPU Resource Management](/en/docs/tutorial-gpu-scheduling/)**: Maximize hardware efficiency through fractional sharing (vGPU) and isolation.
- **[Multi-tenancy](/en/docs/tutorial-multi-tenancy/)**: Configure fair share scheduling and hierarchical queues for different teams.

Back to basics? Check out our **[Quick Start](/en/docs/tutorials/)**.
113 changes: 113 additions & 0 deletions content/en/docs/tutorial-gpu-scheduling.md
@@ -0,0 +1,113 @@
+++
title = "GPU Scheduling and Resource Management"
linktitle = "GPU Scheduling"
date = 2026-02-11
publishdate = 2026-02-11
lastmod = 2026-02-11
draft = false
toc = true
type = "docs"

[menu.docs]
parent = "tutorial-series"
weight = 30
+++

This tutorial covers how to efficiently manage GPU resources using Volcano, including fractional GPU sharing (vGPU) and hardware-based isolation (MIG).

## Background

GPUs are high-performance but expensive resources. In standard Kubernetes, a physical GPU is typically treated as an indivisible unit—one GPU can only be assigned to one container. This often leads to significant underutilization, especially for smaller workloads like model inference or development tasks that don't require the full compute power or memory of a modern GPU.

Volcano addresses this by providing robust **vGPU (virtual GPU) scheduling**. This allows you to:

- **Fractional Sharing**: Slice a single physical GPU into multiple virtual GPUs (vGPUs).
- **Resource Isolation**: Enforce specific compute (cores) and memory limits for each container sharing the physical hardware.
- **Multiple Modes**: Support both software-based slicing (via VCUDA) and hardware-based isolation (via NVIDIA MIG).

## Scenario

Suppose you have a cluster where multiple users need to run lightweight inference tasks. Instead of dedicating one physical GPU to each user, you can partition each GPU to support multiple users simultaneously.

In this tutorial, you will deploy a Volcano Job that requests a fractional share of a GPU: **20% of the compute power** and **2000MiB of memory**.

## Prerequisites

Before you begin, ensure you have:
- A Kubernetes cluster with nodes equipped with NVIDIA GPUs.
- The [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) installed on your nodes.
- Volcano installed and the `volcano-vgpu-device-plugin` deployed.
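
One way to confirm the device plugin has registered the vGPU resources is to inspect a GPU node, e.g. with `kubectl describe node <node-name>`. You should see allocatable entries along these lines (the figures below are illustrative only; actual values depend on your hardware and plugin configuration):

```yaml
Allocatable:
  volcano.sh/vgpu-number:  10
  volcano.sh/vgpu-memory:  16000
  volcano.sh/vgpu-cores:   100
```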

## Deployment Step-by-Step

### 1. Create the GPU Sharing Manifest

Create a file named `gpu-sharing-job.yaml` with the following content:

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: gpu-sharing-tutorial
spec:
  minAvailable: 1
  schedulerName: volcano
  tasks:
  - name: gpu-task
    replicas: 1
    template:
      spec:
        containers:
        - name: gpu-container
          image: nvidia/cuda:11.0-base
          command: ["sh", "-c", "nvidia-smi && sleep 3600"]
          resources:
            limits:
              volcano.sh/vgpu-number: 1    # Request 1 virtual GPU
              volcano.sh/vgpu-memory: 2000 # Limit to 2000MiB of GPU memory
              volcano.sh/vgpu-cores: 20    # Limit to 20% of GPU compute
        restartPolicy: Never
```

### 2. Apply the Manifest

Run the following command to deploy the job:

```bash
kubectl apply -f gpu-sharing-job.yaml
```

## Verification

### Check Resource Allocation

Verify that your pod has been scheduled to a node with available vGPU resources:

```bash
kubectl get pods -l volcano.sh/job-name=gpu-sharing-tutorial
```

### Inspect the Container

Check the logs to verify that the container correctly detects the GPU environment via `nvidia-smi`:

```bash
kubectl logs gpu-sharing-tutorial-gpu-task-0
```

Even though the physical GPU is shared, the `volcano-vgpu-device-plugin` ensures the container only uses its allocated memory and compute slice.

## Notes

- **Insufficient Resources**: If pods remain `Pending` with "insufficient volcano.sh/vgpu-number", check if your nodes are correctly labeled and the `volcano-vgpu-device-plugin` is healthy.
- **Memory Limits**: If your application fails with Out of Memory (OOM) on the GPU, ensure the `vgpu-memory` limit is large enough for your specific model requirements.
- **Hardware Isolation**: For mission-critical workloads requiring strict hardware-level isolation, consider using **Dynamic MIG** mode if your hardware supports it (e.g., A100/H100).

## Tutorial Series

- **[Distributed TensorFlow](/en/docs/tutorial-tensorflow/)**: Orchestrate high-performance ML training jobs with parameter servers and workers.
- **[Apache Spark](/en/docs/tutorial-spark/)**: Prevent resource starvation in big data processing pipelines.
- **[Multi-tenancy](/en/docs/tutorial-multi-tenancy/)**: Configure fair share scheduling and hierarchical queues for different teams.
- **[Argo Workflows](/en/docs/tutorial-argo-workflows/)**: Integrate Volcano's advanced scheduling into your CI/CD and data pipelines.

Back to basics? Check out our **[Quick Start](/en/docs/tutorials/)**.