Users can currently monopolize node resources by submitting C2D jobs with unrestricted resource allocations, which leads to unfair usage and leaves node operators without control over their nodes.
Critical Scenarios:
- 0 GPU + All CPU: User takes all CPU cores without using GPU
- 1 CPU + All RAM: User takes all RAM with minimal CPU usage
- Excessive Resources: No limits on disk usage or on combinations of resources
Impact:
- Single user can block all other users from the node
- Node operators cannot control fair resource distribution
- Inefficient resource utilization
- Poor user experience for queued jobs
- No visibility of limits in Network dashboard
Implement configurable maximum resource limits per job, with validation and enforcement, and take ON dashboard integration into account.
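
A minimal sketch of what the validation step could look like, assuming per-job maximums are set by the node operator in configuration. All names here (`ResourceRequest`, `maxPerJob`, `validateJobResources`) and the limit values are hypothetical and not taken from the existing codebase; the real resource model may differ.

```typescript
// Hypothetical shape of a job's resource request (not the actual C2D types).
interface ResourceRequest {
  cpu: number;    // CPU cores
  ramGb: number;  // RAM in GB
  diskGb: number; // disk in GB
  gpu: number;    // GPU count
}

// Limits share the same shape: one maximum per resource.
type ResourceLimits = ResourceRequest;

// Illustrative per-job maximums an operator might configure (assumed values).
const maxPerJob: ResourceLimits = { cpu: 4, ramGb: 16, diskGb: 100, gpu: 1 };

// Returns a list of violations; an empty list means the request is acceptable.
function validateJobResources(
  req: ResourceRequest,
  limits: ResourceLimits
): string[] {
  const errors: string[] = [];
  for (const key of Object.keys(limits) as (keyof ResourceRequest)[]) {
    const requested = req[key];
    if (!Number.isFinite(requested) || requested < 0) {
      errors.push(`${key}: invalid value ${requested}`);
    } else if (requested > limits[key]) {
      errors.push(
        `${key}: requested ${requested} exceeds per-job maximum ${limits[key]}`
      );
    }
  }
  return errors;
}
```

With limits like these, an "All CPU" request such as `{ cpu: 64, ramGb: 8, diskGb: 10, gpu: 0 }` would be rejected at submission time with a single `cpu` violation. The same limits object could also be exposed to the dashboard so users see the maximums before submitting.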