Run the benchmark below on 2 GPUs, then compare the output of ./processInfo -pid 203639 with nvidia-smi.
./processInfo reports 'GPU ID' 0 for both GPUs (GPU-0, GPU-0), but nvidia-smi shows the same process on GPU-0 and GPU-1.
python3 ./benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
--forward_only \
--batch_size=16 \
--model=resnet50 \
--num_gpus=2 \
--num_batches=500000 \
--num_warmup_batches=10 \
--data_name=imagenet \
--allow_growth=True
root@k8s-node1:~/go-dcgm/samples/processInfo# ./processInfo -pid 203639
2024/04/07 11:51:51 Enabling DCGM watches to start collecting process stats. This may take a few seconds....
----------------------------------------------------------------------
GPU ID : 0
----------Execution Stats---------------------------------------------
PID : 203639
Name : tf_cnn_benchmar
Start Time : 2024-04-03 20:29:37 +0800 CST
End Time : Running
----------Performance Stats-------------------------------------------
Energy Consumed (Joules) : 0
Max GPU Memory Used (bytes) : 5453643776
Avg SM Clock (MHz) : 1590
Avg Memory Clock (MHz) : 5000
Avg SM Utilization (%) : 21
Avg Memory Utilization (%) : 16
Avg PCIe Rx Bandwidth (MB) : 9223372036854775792
Avg PCIe Tx Bandwidth (MB) : 9223372036854775792
----------Event Stats-------------------------------------------------
Single Bit ECC Errors : N/A
Double Bit ECC Errors : N/A
Critical XID Errors : 0
----------Slowdown Stats----------------------------------------------
Due to - Power (%) : 0
- Thermal (%) : 0
- Reliability (%) : 9223372036854775792
- Board Limit (%) : 9223372036854775792
- Low Utilization (%) : 9223372036854775792
- Sync Boost (%) : 0
----------Process Utilization-----------------------------------------
Avg SM Utilization (%) : 48
Avg Memory Utilization (%) : 38
----------------------------------------------------------------------
----------------------------------------------------------------------
GPU ID : 0
----------Execution Stats---------------------------------------------
PID : 203639
Name : tf_cnn_benchmar
Start Time : 2024-04-03 20:29:37 +0800 CST
End Time : Running
----------Performance Stats-------------------------------------------
Energy Consumed (Joules) : 0
Max GPU Memory Used (bytes) : 227540992
Avg SM Clock (MHz) : 585
Avg Memory Clock (MHz) : 5000
Avg SM Utilization (%) : N/A
Avg Memory Utilization (%) : N/A
Avg PCIe Rx Bandwidth (MB) : 9223372036854775792
Avg PCIe Tx Bandwidth (MB) : 9223372036854775792
----------Event Stats-------------------------------------------------
Single Bit ECC Errors : N/A
Double Bit ECC Errors : N/A
Critical XID Errors : 0
----------Slowdown Stats----------------------------------------------
Due to - Power (%) : 0
- Thermal (%) : 0
- Reliability (%) : 9223372036854775792
- Board Limit (%) : 9223372036854775792
- Low Utilization (%) : 9223372036854775792
- Sync Boost (%) : 0
----------Process Utilization-----------------------------------------
Avg SM Utilization (%) : 0
Avg Memory Utilization (%) : 0
----------------------------------------------------------------------
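The duplicated 'GPU ID' and the repeated 9223372036854775792 values in the output above can be picked out mechanically. The following is a minimal sketch, not part of go-dcgm: the helper names `parse_process_info` and `duplicated_gpu_ids` are hypothetical, and it assumes that 9223372036854775792 (0x7FFFFFFFFFFFFFF0) is DCGM's "blank" int64 sentinel, i.e. a value that was never populated rather than a real measurement.

```python
import re
from collections import Counter

# Assumption: 0x7FFFFFFFFFFFFFF0 is DCGM's "blank" int64 sentinel (no data).
DCGM_INT64_BLANK = 0x7FFFFFFFFFFFFFF0


def parse_process_info(text):
    """Split processInfo output into one dict per 'GPU ID' block.

    Each dict holds the reported GPU ID and the names of fields whose
    value equals the blank sentinel.
    """
    blocks = []
    for raw in text.splitlines():
        line = raw.strip()
        if " : " not in line:
            continue  # separator lines and section headers
        key, _, value = line.partition(" : ")
        # Drop the "Due to -" / "-" prefixes used in the slowdown section.
        key = re.sub(r"^(Due to\s*)?-\s*", "", key.strip())
        value = value.strip()
        if key == "GPU ID":
            blocks.append({"gpu_id": int(value), "blank_fields": []})
        elif blocks and value.isdigit() and int(value) == DCGM_INT64_BLANK:
            blocks[-1]["blank_fields"].append(key)
    return blocks


def duplicated_gpu_ids(blocks):
    """Return GPU IDs that appear in more than one block."""
    counts = Counter(b["gpu_id"] for b in blocks)
    return sorted(gpu for gpu, n in counts.items() if n > 1)


# A shortened excerpt of the output above.
sample = """\
GPU ID : 0
Avg PCIe Rx Bandwidth (MB) : 9223372036854775792
Due to - Power (%) : 0
- Reliability (%) : 9223372036854775792
GPU ID : 0
Max GPU Memory Used (bytes) : 227540992
"""

blocks = parse_process_info(sample)
print(duplicated_gpu_ids(blocks))   # GPU IDs reported more than once
print(blocks[0]["blank_fields"])    # fields holding the blank sentinel
```

On the excerpt, GPU ID 0 is flagged as reported twice, which is the symptom described in this issue.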
root@k8s-node1:~/go-dcgm/samples/processInfo# nvidia-smi
Sun Apr 7 11:52:05 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01 Driver Version: 470.82.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:07.0 Off | 0 |
| N/A 64C P0 71W / 70W | 5204MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 Off | 00000000:00:08.0 Off | 0 |
| N/A 43C P0 27W / 70W | 220MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 203639 C python3 5201MiB |
| 1 N/A N/A 203639 C python3 217MiB |
+-----------------------------------------------------------------------------+
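One way to tell which physical GPU each processInfo block really describes is to compare memory usage. The sketch below converts the per-process memory from nvidia-smi (MiB) to bytes and matches it against 'Max GPU Memory Used (bytes)' from each block. The function name `match_blocks_to_gpus` is hypothetical, and the comparison assumes both tools were sampled close enough together for the values to agree.

```python
MIB = 1024 * 1024

# "Max GPU Memory Used (bytes)" from the two processInfo blocks, in order.
process_info_bytes = [5453643776, 227540992]

# "GPU Memory Usage" for PID 203639 from nvidia-smi, keyed by GPU index.
nvidia_smi_mib = {0: 5201, 1: 217}


def match_blocks_to_gpus(block_bytes, smi_mib):
    """For each block, list the nvidia-smi GPU indices with equal usage."""
    return [
        [gpu for gpu, mib in smi_mib.items() if mib * MIB == used]
        for used in block_bytes
    ]


print(match_blocks_to_gpus(process_info_bytes, nvidia_smi_mib))  # prints [[0], [1]]
```

The values match exactly (5201 MiB = 5453643776 bytes, 217 MiB = 227540992 bytes), which suggests the second block holds the stats for GPU 1 and only its 'GPU ID' label is wrong.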