8 changes: 6 additions & 2 deletions docs/guides/storage.md
@@ -111,7 +111,7 @@ To set up a default so all newly created folders and dirs inside or your desired
```

!!! info
For more information read the `setfacl` man page: `man setfacl`.
For more information read the `setfacl` man page: [`man setfacl`](https://linux.die.net/man/1/setfacl).

[](){#ref-guides-storage-lustre}
## Lustre tuning
@@ -127,14 +127,18 @@ The data itself is subdivided in blocks of size `<blocksize>` and is stored by O
The block size and number of OSTs to use are defined by the striping settings, which are applied to a path, with new files and directories inheriting them from their parent directory.
The `lfs getstripe <path>` command can be used to get information on the stripe settings of a path.
For directories and empty files `lfs setstripe --stripe-count <count> --stripe-size <size> <directory/file>` can be used to set the layout.
The simplest way to have the correct layout is to copy to a directory with the correct layout

Striping settings on a directory are only applied to files added after the command is run.
Existing files retain their original layout unless explicitly changed using `lfs migrate <striping settings>`, which takes the same arguments as `lfs setstripe`.
The simplest way to have the correct layout is to copy to a directory with the correct layout.
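
How a stripe size and stripe count map file offsets onto stripes can be illustrated with a little arithmetic. The sketch below assumes a plain round-robin (RAID-0 style) layout and ignores composite layouts; `stripe_index` is a hypothetical helper written for illustration, not an `lfs` command.

```bash
# Hypothetical helper (not part of Lustre): print the 0-based stripe that
# holds a given byte offset, assuming blocks are distributed round-robin
# over the stripes of the file.
stripe_index() {
    local offset=$1        # byte offset into the file
    local stripe_size=$2   # stripe size in bytes
    local stripe_count=$3  # number of stripes (OSTs) used by the file
    echo $(( (offset / stripe_size) % stripe_count ))
}

# 4 MiB stripes over 4 OSTs
stripe_index $((10 * 1024 * 1024)) $((4 * 1024 * 1024)) 4   # prints 2
stripe_index $((16 * 1024 * 1024)) $((4 * 1024 * 1024)) 4   # prints 0
```

Under this assumption, a sequential read larger than `stripe_size * stripe_count` touches every stripe, which is why a larger stripe count tends to help large sequential accesses.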

!!! tip "A block size of 4MB gives good throughput, without being overly big..."
... so it is a good choice when reading a file sequentially or in large chunks. If an application reads shorter chunks in random order, a smaller stripe size may work better: the raw throughput will be lower, but the performance of your application might actually increase.
See the [Lustre documentation](https://doc.lustre.org/lustre_manual.xhtml#managingstripingfreespace) for more information.


!!! example "Settings for large files"
*Remember:* Settings only apply to files added to the directory after this command.
```console
lfs setstripe --stripe-count -1 --stripe-size 4M <big_files_dir>
```
48 changes: 48 additions & 0 deletions docs/software/communication/nccl-assets/config_v226.sh
@@ -0,0 +1,48 @@
export NCCL_VERSION=v2.26.2-1
export AWS_OFI_NCCL_VERSION=v1.14.1
export LIBFABRIC_VERSION=v2.2.0
export NCCL_TEST_VERSION=v2.17.1

# ---------------------------------------------------------------------------
#
# Critical Values
#
# ---------------------------------------------------------------------------

export NCCL_NET="AWS Libfabric"
export NCCL_NET_GDR_LEVEL=PHB
export FI_MR_CACHE_MONITOR=userfaultfd
export MPICH_GPU_SUPPORT_ENABLED=0

# Enable the "alternative rendezvous configuration" of Slingshot to avoid
# sporadic, catastrophic drops in performance
export FI_CXI_RDZV_PROTO=alt_read
export SBATCH_NETWORK=disable_rdzv_get

# ---------------------------------------------------------------------------
#
# Recommended Values
#
# ---------------------------------------------------------------------------

export FI_CXI_DEFAULT_CQ_SIZE=131072
export FI_CXI_DEFAULT_TX_SIZE=32768
export FI_CXI_DISABLE_HOST_REGISTER=1
export FI_CXI_RDZV_EAGER_SIZE=0

# ---------------------------------------------------------------------------
#
# Debugging Values
#
# ---------------------------------------------------------------------------

export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,BOOTSTRAP,ENV,TUNING

# ---------------------------------------------------------------------------
#
# Enable CSCS NCCL Tuning Plugin
#
# ---------------------------------------------------------------------------

export NCCL_TUNER_PLUGIN=cscs
Binary file not shown.
19 changes: 19 additions & 0 deletions docs/software/communication/nccl-assets/nccl_tuner_v226.conf
@@ -0,0 +1,19 @@
all_reduce,0,4194303,tree,simple,-1,2,8
all_reduce,4194304,33554431,ring,simple,-1,2,8
all_reduce,33554432,4294967295,tree,simple,-1,2,8
#
all_reduce,0,4194303,tree,simple,-1,4,16
all_reduce,4194304,4294967295,ring,simple,-1,4,16
#
all_reduce,0,33554431,tree,simple,-1,8,32
all_reduce,33554432,4294967295,ring,simple,-1,8,32
#
all_reduce,0,67108863,tree,simple,-1,16,64
all_reduce,67108864,4294967295,ring,simple,-1,16,64
#
all_reduce,0,268435455,tree,simple,-1,32,128
all_reduce,268435456,4294967295,ring,simple,-1,32,128
#
all_reduce,536870912,4294967295,ring,simple,-1,64,256
all_reduce,1073741824,4294967295,ring,simple,-1,128,512
all_reduce,2147483648,4294967295,ring,simple,-1,256,1024
48 changes: 48 additions & 0 deletions docs/software/communication/nccl.md
@@ -67,3 +67,51 @@
```

If you only set `NCCL_NET="ofi"`, NCCL may silently fail to load the plugin and fall back to the default implementation.

## Expected performance

This section covers the expected performance behavior of the [NCCL Tests benchmark](https://github.com/NVIDIA/nccl-tests) suite on Alps.
These results can serve as a reference when comparing against application behavior.
The [NCCL Stack Constellation Benchmarks](https://github.com/jpcoles-cscs/nccl-stack-constellation-benchmarks) can be used to reproduce these results, and to build and run the tests within a user's own environment.

=== "NCCL v2.26"
=== "Plots"
[Download PDF](nccl-assets/nccl-plots-226.pdf)
![NCCL v2.26 benchmark performance](nccl-assets/nccl-plots-226.png)
=== "Environment Settings"
[Download settings](nccl-assets/config_v226.sh)
```bash
--8<-- "docs/software/communication/nccl-assets/config_v226.sh"
```
=== "Tuner parameters"
[Download parameters](nccl-assets/nccl_tuner_v226.conf)
```
--8<-- "docs/software/communication/nccl-assets/nccl_tuner_v226.conf"
```

=== "NCCL v2.27"
=== "NCCL v2.28"

## NCCL Tuner Plugin

NCCL has internal logic to choose the most performant communication algorithm given the collective, message size, number of ranks, and other system characteristics.
This logic has been optimized for InfiniBand networks and can perform suboptimally on the Slingshot network of Alps.

To achieve best results, it is necessary to use the NCCL Tuner Plugin alongside a tuner configuration file.
A modified tuner plugin for Alps is included in a [forked version of NCCL](https://github.com/jpcoles-cscs/nccl).
The forked repository is only needed for building the tuner; the resulting plugin is compatible with versions of NCCL >= 2.24 that support the `ncclTunerPlugin_v4` data structure.
CSCS has prepared example configuration files for these benchmarks, which can be used as a reference point for application-specific tuning.
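
When adapting a tuner configuration file, it can help to check which entry would apply to a given message size and job shape. The sketch below is a hypothetical helper, not part of NCCL or the CSCS tuner. It assumes the column order of the NCCL example tuner (`collective,min_bytes,max_bytes,algorithm,protocol,channels,nodes,ranks`) and that the first matching row wins; both are assumptions that should be checked against the plugin source.

```bash
# Hypothetical helper: print "algorithm,protocol" of the first row that
# matches a collective, message size (bytes), node count and rank count.
# Assumed columns: collective,min_bytes,max_bytes,algorithm,protocol,channels,nodes,ranks
lookup_tuning() {
    awk -F, -v c="$2" -v b="$3" -v n="$4" -v r="$5" '
        /^#/ || NF < 8 { next }                      # skip comments and short rows
        $1 == c && b >= $2 && b <= $3 && $7 == n && $8 == r {
            print $4 "," $5
            exit
        }' "$1"
}

# Usage with three rows copied from nccl_tuner_v226.conf
conf=$(mktemp)
cat > "$conf" <<'EOF'
all_reduce,0,4194303,tree,simple,-1,2,8
all_reduce,4194304,33554431,ring,simple,-1,2,8
all_reduce,33554432,4294967295,tree,simple,-1,2,8
EOF
lookup_tuning "$conf" all_reduce 8388608 2 8   # prints ring,simple
rm -f "$conf"
```

An empty result means no row covers that combination, in which case NCCL's own selection logic would presumably apply.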

To use the CSCS tuner, first download, build, and copy the library to a preferred location:
```console
git clone --branch 2.27.7-1-cscs-tuner git@github.com:jpcoles-cscs/nccl.git nccl-tuner-cscs/nccl
cd nccl-tuner-cscs/nccl/ext-tuner/example
make
cp libnccl-tuner-example.so $INSTALL_DIR/libnccl-tuner-cscs.so
```
Then point NCCL to the tuner library:
```bash
export NCCL_TUNER_PLUGIN=$INSTALL_DIR/libnccl-tuner-cscs.so
```
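
Because a missing plugin may go unnoticed, a quick check before launching a job can be useful. The following is a minimal sketch, assuming `NCCL_TUNER_PLUGIN` holds a file path as in the export above; `check_tuner_plugin` is a hypothetical helper and does not apply when the variable is set to a short plugin name.

```bash
# Hypothetical helper: verify that NCCL_TUNER_PLUGIN points to a readable file.
check_tuner_plugin() {
    local plugin=${NCCL_TUNER_PLUGIN:-}
    if [ -z "$plugin" ]; then
        echo "NCCL_TUNER_PLUGIN is not set" >&2
        return 1
    fi
    if [ ! -r "$plugin" ]; then
        echo "tuner plugin not readable: $plugin" >&2
        return 1
    fi
    echo "using tuner plugin: $plugin"
}

# Call this in a job script before starting the application
check_tuner_plugin || echo "fix NCCL_TUNER_PLUGIN before launching" >&2
```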

