Commit d2e4c39

train arguments
1 parent 865af34 commit d2e4c39

File tree

1 file changed: +18 -1 lines changed


examples/README.md

@@ -1,6 +1,23 @@
 ## Files

-- **train.py**: This file contains the model definition, data loading, training loop, and other essential components for training a GNN.
+- **train.py**: This file contains the model definition (currently 3 GCN layers with a hidden dimension of 128), data loading, training loop, and other essential components for training a GNN.
+  This script has various command-line arguments, including:
+
+  * `--seed` (optional): Sets the random seed for all GPUs. If not provided, the default seed of 0 is used.
+
+  * `--G_intra_r`: Specifies the X dimension of the 3D parallelism configuration (default: 1).
+
+  * `--G_intra_c`: Specifies the Y dimension of the 3D parallelism configuration (default: 1).
+
+  * `--G_intra_d`: Specifies the Z dimension of the 3D parallelism configuration (default: 1).
+
+  * `--gpus_per_node`: Specifies the number of GPUs available on each node.
+
+  * `--data_dir`: Specifies the directory containing the dataset, which can be in either unpartitioned or partitioned format.
+
+  * `--num_epochs` (optional): Determines the number of training epochs (default: 2).
+
+  * `--overlap` (optional): If set, enables overlapping of SpMM and all-reduce in the aggregation.

 - **run_4.sh**: This is an example shell script for Perlmutter, demonstrating how to run a Plexus-parallelized GNN on 4 GPUs. It includes placeholders that should be replaced with appropriate values for specific experiments, such as dataset path, output directory, etc. The script can be adapted to run on different numbers of GPUs and with different datasets.
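The flag list added above maps naturally onto a standard `argparse` setup. Below is a minimal, hypothetical sketch, not Plexus's actual `train.py`: the flag names and defaults come from the README text in this diff, while the argument types and the required status of `--gpus_per_node` and `--data_dir` are assumptions.

```python
# Hypothetical sketch of a parser matching the README's flag list.
# Flag names and defaults follow the README; the types and which
# flags are required are assumptions, not the actual implementation.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Train a GNN with 3D parallelism (illustrative only)"
    )
    parser.add_argument("--seed", type=int, default=0,
                        help="Random seed for all GPUs (default: 0)")
    # X, Y, and Z dimensions of the 3D parallelism configuration.
    parser.add_argument("--G_intra_r", type=int, default=1,
                        help="X dimension of the 3D parallelism configuration")
    parser.add_argument("--G_intra_c", type=int, default=1,
                        help="Y dimension of the 3D parallelism configuration")
    parser.add_argument("--G_intra_d", type=int, default=1,
                        help="Z dimension of the 3D parallelism configuration")
    parser.add_argument("--gpus_per_node", type=int, required=True,  # assumed required
                        help="Number of GPUs available on each node")
    parser.add_argument("--data_dir", type=str, required=True,  # assumed required
                        help="Dataset directory (unpartitioned or partitioned format)")
    parser.add_argument("--num_epochs", type=int, default=2,
                        help="Number of training epochs (default: 2)")
    parser.add_argument("--overlap", action="store_true",
                        help="Overlap SpMM and all-reduce in the aggregation")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```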
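For a 4-GPU run in the spirit of `run_4.sh`, an invocation might look like `srun -n 4 python train.py --G_intra_r 2 --G_intra_c 2 --G_intra_d 1 --gpus_per_node 4 --data_dir /path/to/dataset --num_epochs 2 --overlap`. The values here are hypothetical: this assumes the product of the three `G_intra` dimensions matches the total number of GPUs, and that the script is launched through Slurm's `srun` as is typical on Perlmutter.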
