3 changes: 2 additions & 1 deletion modules/ch2-mig/nav.adoc
@@ -3,5 +3,6 @@
*** xref:s1-mig-overview-2.adoc[]
*** xref:s1-mig-overview-3.adoc[]
*** xref:s1-mig-overview-4.adoc[]
*** xref:s1-mig-overview.adoc[]
*** xref:s1-mig-overview-5.adoc[]
*** xref:s2-mig-slicing-lab.adoc[]
*** xref:s2-mig-slicing-lab-2.adoc[]
239 changes: 1 addition & 238 deletions modules/ch2-mig/pages/s1-mig-overview-3.adoc
@@ -355,241 +355,4 @@ nodeSelector:
```

This architecture provides workload placement flexibility while optimizing each pool for its use case.
====

== MIG Benefits for MaaS

For Models-as-a-Service architectures, MIG provides measurable advantages that directly impact platform economics and operational reliability.

=== Cost Efficiency and ROI

* **Deploy 7 concurrent small models** on single A100-40GB (vs. 1 with full GPU allocation)
* **Reduce per-model GPU cost from $15,000 to $2,143** (7x cost reduction for `1g.5gb` profiles)
* **Increase cluster-wide GPU utilization from 33% to 78%** (2.4x improvement over exclusive allocation)
* **Achieve ROI break-even 2.4x faster** than full GPU deployments (10 months vs. 24 months baseline)
* **Right-size GPU resources** to actual model requirements, eliminating overprovisioning waste

**Concrete example**: A platform serving 21 small models previously required 21x A100 GPUs ($315,000 in capital). With MIG `1g.5gb` profiles, the same workload runs on 3x A100 GPUs ($45,000 in capital), saving $270,000.

=== Performance Isolation and Predictability

* **Guarantee P99 latency variance <1%** (vs. 15-40% with time-slicing)
* **Hardware-enforced memory isolation** prevents out-of-memory (OOM) crosstalk between workloads
* **Dedicated streaming multiprocessors** eliminate compute contention and throttling
* **Enable SLA-backed inference** with measurable, enforceable performance guarantees
* **Prevent noisy neighbor problems** through physical resource partitioning

Workloads that share a GPU through time-slicing experience latency spikes when concurrent requests arrive. MIG instances maintain consistent latency regardless of neighboring workload activity.

=== Operational Flexibility and Multi-Tenancy

* **Scale model instances independently** without affecting neighboring services (e.g., scale LLaMA-7B from 1 to 3 replicas without reprovisioning hardware)
* **Mix small and large models** on same physical GPU (e.g., `1g.5gb` microservices alongside `3g.20gb` large language models)
* **Support true multi-tenant deployments** with hardware isolation between teams
* **Assign dedicated MIG instances** to specific tenants or namespaces for guaranteed capacity (a quota sketch follows this list)
* **Enable chargeback and cost allocation** with accurate per-MIG-instance metrics from DCGM
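
As a sketch of the per-tenant capacity and chargeback points above (not a lab step), a namespace-scoped `ResourceQuota` can cap how many MIG instances a team may request. The namespace name and quota value are illustrative assumptions:

[source,yaml]
----
# Sketch only: caps the team-a-models namespace at two 2g.10gb MIG instances.
# Namespace name and quota value are illustrative assumptions.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mig-quota
  namespace: team-a-models
spec:
  hard:
    requests.nvidia.com/mig-2g.10gb: "2" # <1>
----
<1> For extended resources such as MIG profiles, quota is expressed on the `requests.` form of the resource name

Paired with per-MIG-instance DCGM metrics, this gives each tenant a hard capacity ceiling and a clean basis for chargeback.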

=== Resource Predictability and Capacity Planning

* **Guaranteed GPU resources per deployment**: Each InferenceService gets dedicated SMs and VRAM
* **Reduce capacity planning variance from ±40% to ±5%**: MIG instances have predictable, fixed resource allocations
* **Simplify quota management**: Assign 2x `2g.10gb` instances per team, enforceable via Kueue (Chapter 4; a preview sketch follows this list)
* **Enable accurate per-model billing**: DCGM provides per-MIG-instance utilization metrics for cost tracking
* **Predictable failure domains**: OOM or crash in one MIG instance doesn't affect others on same GPU
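
Chapter 4 configures Kueue properly; purely as a preview of the quota-management bullet above, here is a hedged sketch of a ClusterQueue that enforces "2x `2g.10gb` per team". The queue name and the `default-flavor` ResourceFlavor are assumptions:

[source,yaml]
----
# Preview sketch only -- Kueue setup is covered in Chapter 4.
# Queue name and ResourceFlavor are illustrative assumptions.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-queue
spec:
  namespaceSelector: {} # <1>
  resourceGroups:
  - coveredResources: ["nvidia.com/mig-2g.10gb"]
    flavors:
    - name: default-flavor # <2>
      resources:
      - name: "nvidia.com/mig-2g.10gb"
        nominalQuota: 2 # <3>
----
<1> Admits workloads from any namespace that points a LocalQueue at this ClusterQueue
<2> Assumes a ResourceFlavor named `default-flavor` already exists
<3> At most two `2g.10gb` instances admitted for this team at any time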

== Production Considerations and Gotchas

While MIG provides significant benefits, production deployments require careful planning around reconfiguration downtime, workload placement, and ongoing monitoring.

=== Reconfiguration Downtime Planning

Unlike time-slicing (which requires only a device plugin pod restart), MIG reconfiguration requires a GPU hardware reset and workload migration.

**Downtime Components for Profile Changes**:

1. **Node cordon**: Immediate (prevents new scheduling)
2. **Node drain**: 5-10 minutes (waiting for workload migration to other nodes)
3. **MIG mode enablement**: 10-15 seconds (GPU hardware reset)
4. **Profile application**: 30-60 seconds (instance creation)
5. **GFD label update**: 30-60 seconds (capability rescan)
6. **Device plugin restart**: 30-60 seconds (resource rediscovery)
7. **Node uncordon + scheduling**: 30-60 seconds

**Total**: 10-20 minutes per node for profile changes (per OpenShift documentation)
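
The workflow above is triggered declaratively: the GPU Operator's MIG Manager watches the `nvidia.com/mig.config` node label and reconciles the hardware to match, reporting progress through `nvidia.com/mig.config.state`. A minimal sketch of the labels involved (the node name is an illustrative assumption):

[source,yaml]
----
# Sketch: node labels that drive and report the MIG Manager workflow.
apiVersion: v1
kind: Node
metadata:
  name: worker-gpu-0 # <1>
  labels:
    nvidia.com/mig.config: all-2g.10gb # <2>
    nvidia.com/mig.config.state: success # <3>
----
<1> Illustrative node name
<2> Desired profile; changing this label starts the reconfiguration sequence
<3> Written by MIG Manager (`pending`, `rebooting`, `success`, or `failed`); wait for `success` before uncordoning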

[WARNING]
====
**Plan MIG Profile Changes During Maintenance Windows**

Changing MIG profiles on production nodes serving live traffic causes:

* Immediate termination of all GPU workloads on that node
* Pod eviction and rescheduling to other nodes
* Temporary capacity reduction during reconfiguration
* Potential cascading failures if spare capacity is insufficient

**For rolling MIG updates across N nodes**:

* Plan for **N × 20 minutes** total reconfiguration time
* Ensure **spare GPU capacity** exists for pod rescheduling (recommend N+2 redundancy)
* Use node cordoning to control blast radius: `oc adm cordon worker-gpu-0`
* Test profile changes in dev environment first
* Schedule during low-traffic windows (e.g., weekend maintenance)
* Document rollback procedure (revert to `all-disabled`, then previous profile)
====

=== Workload Placement Strategies

MIG instances require **explicit resource requests**. Workloads must request specific MIG profiles when using the **mixed** advertisement strategy.

**Correct Resource Request (Mixed Strategy)**:

[source,yaml]
----
resources:
  limits:
    nvidia.com/mig-2g.10gb: 1 # <1>
----
<1> Request specific MIG profile, matches advertised resource name

**Common Mistake**:

[source,yaml]
----
resources:
  limits:
    nvidia.com/gpu: 1 # <1>
----
<1> ❌ Will NOT schedule on MIG-partitioned nodes using mixed strategy

**Consequences of incorrect requests**:

* Pod stuck in `Pending` state with `FailedScheduling` event
* Error: "Insufficient nvidia.com/gpu" (even though MIG instances available)
* Requires pod specification update and redeployment

[TIP]
====
**Use Admission Controllers for Default MIG Profiles**

For platforms deploying many inference services, create a `MutatingWebhookConfiguration` that automatically injects appropriate MIG resource requests based on pod annotations or namespace labels. This prevents scheduling failures from incorrect resource requests.

**Example implementation**:

* Pods in namespace `small-models` automatically get `nvidia.com/mig-1g.5gb: 1` injected
* Pods with annotation `mig-profile: medium` get `nvidia.com/mig-2g.10gb: 1`
* Pods without annotations remain unchanged (for flexibility)

This pattern reduces operational toil and prevents common scheduling errors; it is especially useful for teams running 50+ inference services.
====
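
A hedged sketch of how such a webhook might be registered is shown below; the service name, namespace, and label key are assumptions, and the webhook server that performs the actual patch is not shown:

[source,yaml]
----
# Sketch only: mutates pod creation in namespaces labeled mig-default-profile=small.
# Service name, namespace, and label key are illustrative assumptions; the webhook
# server that injects the nvidia.com/mig-1g.5gb request must be deployed separately.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: mig-profile-defaulter
webhooks:
- name: default.mig.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Ignore # <1>
  clientConfig:
    service:
      name: mig-profile-defaulter
      namespace: gpu-platform
      path: /mutate
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  namespaceSelector:
    matchLabels:
      mig-default-profile: small # <2>
----
<1> Never blocks pod creation if the webhook service is unavailable
<2> Only namespaces that opt in via this label are mutated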

=== Monitoring MIG Utilization

DCGM (Data Center GPU Manager, deployed in Chapter 1) provides per-MIG-instance metrics. Monitor these to validate your profile choices and identify optimization opportunities.

**Key Metrics to Track**:

* `DCGM_FI_PROF_GR_ENGINE_ACTIVE`: Compute utilization percentage per MIG instance
* `DCGM_FI_DEV_FB_USED`: Framebuffer memory used (in MiB) per MIG instance
* `DCGM_FI_DEV_GPU_TEMP`: GPU temperature, reported per physical GPU (thermal throttling indicator)
* `DCGM_FI_DEV_POWER_USAGE`: Power consumption, reported per physical GPU

**Red Flags Indicating Misconfigurations**:

* **MIG instance averaging >85% memory usage**: Profile too small, workload may OOM, resize to larger profile
* **MIG instance averaging <20% compute utilization**: Profile too large, wasted resources, resize to smaller profile
* **Frequent pod evictions (OOMKilled)**: Memory oversubscription, increase profile memory allocation
* **High scheduling failure rate**: Insufficient MIG capacity for demand, add nodes or adjust profiles

**Example Prometheus query for MIG memory usage**:

[source,promql]
----
DCGM_FI_DEV_FB_USED{GPU_I_ID=~".+"} /
DCGM_FI_DEV_FB_TOTAL{GPU_I_ID=~".+"} * 100
----

You'll build comprehensive MIG monitoring dashboards in Chapter 3.
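
Until those dashboards exist, the ">85% memory usage" red flag above can be encoded as an alert. A minimal sketch, assuming your Prometheus instance watches `PrometheusRule` objects in this namespace; the rule name, namespace, and 15-minute window are illustrative:

[source,yaml]
----
# Sketch only: fires when a MIG instance stays above 85% framebuffer usage.
# Rule name, namespace, threshold window, and labels are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mig-memory-pressure
  namespace: nvidia-gpu-operator
spec:
  groups:
  - name: mig.rules
    rules:
    - alert: MIGInstanceMemoryPressure
      expr: |
        DCGM_FI_DEV_FB_USED{GPU_I_ID=~".+"} /
        DCGM_FI_DEV_FB_TOTAL{GPU_I_ID=~".+"} * 100 > 85
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "MIG instance {{ $labels.GPU_I_ID }} is above 85% framebuffer usage; consider a larger profile"
----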

[IMPORTANT]
====
**MIG Instance Naming in Metrics**

DCGM metrics use `GPU_I_ID` label to distinguish MIG instances:

* `GPU=0, GPU_I_ID=0`: First MIG instance on GPU 0
* `GPU=0, GPU_I_ID=1`: Second MIG instance on GPU 0

This differs from Kubernetes resource names (`nvidia.com/mig-2g.10gb`). Correlation requires mapping GFD labels (showing profile types) to DCGM metrics (showing instance IDs).

**Dashboard design tip**: Group metrics by profile type using GFD label joins to show "all 2g.10gb instances" rather than raw instance IDs.
====

== Hardware Requirements and GPU Model Support

MIG is available only on NVIDIA data center GPUs built on the Ampere and Hopper architectures:

**Supported GPU Models**:

* **NVIDIA A30 (24GB)**: Ampere architecture, up to 4 MIG instances (profiles: 1g.6gb, 2g.12gb, 4g.24gb)
* **NVIDIA A100-40GB**: Ampere architecture, up to 7 MIG instances (profiles: 1g.5gb through 7g.40gb)
* **NVIDIA A100-80GB**: Ampere architecture, up to 7 MIG instances (profiles: 1g.10gb through 7g.80gb)
* **NVIDIA H100 (80GB/94GB)**: Hopper architecture, enhanced MIG with up to 7 instances plus confidential computing per-instance support

**Unsupported GPU Models** (must use time-slicing instead):

* NVIDIA V100 (Volta architecture)
* NVIDIA T4 (Turing architecture)
* NVIDIA A10, A16, A40 (Ampere architecture, but graphics- and virtualization-focused GPUs that lack the MIG feature)
* NVIDIA L4, L40 (Ada Lovelace architecture)
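
In clusters that mix both groups, GPU Feature Discovery's capability labels keep MIG-dependent workloads off the unsupported GPUs. A small pod spec fragment as a sketch (the label is published by GFD; the profile shown is illustrative):

[source,yaml]
----
# Sketch: schedule a MIG-dependent workload only onto MIG-capable nodes.
nodeSelector:
  nvidia.com/mig.capable: "true" # <1>
resources:
  limits:
    nvidia.com/mig-1g.5gb: 1
----
<1> Label published by GPU Feature Discovery on nodes whose GPUs support MIG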

[NOTE]
====
**H100 Enhancements Over A100**

H100 GPUs offer additional MIG capabilities beyond A100:

* **Confidential computing support per MIG instance**: Each instance can run in secure encrypted mode
* **Improved memory bandwidth allocation**: Better isolation between instances
* **Faster reconfiguration**: H100 MIG mode changes complete ~15% faster than A100

For most Models-as-a-Service inference workloads, **A100 provides optimal cost/performance balance**. H100 advantages primarily benefit specialized secure computation or extremely memory-bandwidth-intensive workloads. Evaluate H100 premium pricing (~2.5x A100 cost) against actual workload requirements before procurement.
====

== What's Next

In the next lab section, you will apply the concepts from this overview hands-on, transforming your GPU infrastructure from exclusive allocation into a multi-tenant, MIG-partitioned platform.

**Lab Activities**:

* **Verify MIG capability** on your A100 GPU nodes using `nvidia-smi`
* **Set MIG advertisement strategy** (mixed) via ClusterPolicy patch
* **Label GPU nodes** with built-in MIG profiles (`all-2g.10gb`, `all-balanced`)
* **Monitor MIG Manager logs** during reconfiguration to observe the 10-20 minute workflow
* **Verify MIG instance allocation** via `oc describe node` showing `nvidia.com/mig-*` resources
* **Create custom mig-parted ConfigMap** for heterogeneous profile combinations (e.g., 1x `3g.20gb` + 2x `2g.10gb`)
* **Deploy test CUDA workloads** requesting specific MIG profiles (a sample pod sketch follows this list)
* **Validate hardware isolation** by running `nvidia-smi` inside pods to confirm dedicated resource allocation
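
As a rough idea of what these test workloads can look like (the lab provides the exact manifests; the image tag and names here are assumptions), a pod that requests one `2g.10gb` instance and lists the device it receives:

[source,yaml]
----
# Sketch only: the lab supplies the real manifests; image tag and names are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: mig-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubi8 # <1>
    command: ["nvidia-smi", "-L"] # <2>
    resources:
      limits:
        nvidia.com/mig-2g.10gb: 1 # <3>
----
<1> Assumed CUDA base image tag
<2> Lists only the single MIG device granted to the pod, confirming isolation
<3> One hardware-isolated `2g.10gb` instance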

**Expected Outcomes**:

* A single A100-40GB will transform from `nvidia.com/gpu: 1` to `nvidia.com/mig-2g.10gb: 3` allocatable resources
* You'll deploy 3 concurrent CUDA test pods on one physical GPU, each in isolated MIG instances
* You'll observe <1% latency variance between pods running simultaneously (demonstrating isolation)
* You'll create the mixed-inference profile from the multi-tenant scenario (1x `3g.20gb` + 2x `2g.10gb`)
* You'll experience the full MIG reconfiguration workflow including node drain, profile application, and resource verification

**Skills You'll Develop**:

* Label-driven MIG configuration workflow
* Troubleshooting MIG Manager using logs and node labels
* Validating MIG instance creation with `nvidia-smi mig -lgi`
* Correlating Kubernetes resource advertisements with physical MIG partitions
* Planning MIG profile changes with minimal production disruption

This lab transforms your GPU infrastructure from **1 workload per GPU** (full allocation) to **3-7 workloads per GPU** (MIG partitioning) while maintaining production-grade isolation and predictable performance.

////
**Maximizing GPU ROI:**
Understand how the Multi-Instance GPU (MIG) feature splits hardware resources into multiple GPU instances, operating completely isolated from each other. Evaluate MIG advertisement strategies: Single (homogeneous) vs. Mixed (heterogeneous) slicing.
////
====