- STAlloc is a memory tool used to reduce memory fragmentation and improve memory utilization.
- STAlloc actively leverages the predictable memory allocation pattern of large model training to perform ahead-of-time memory allocation planning.
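
The idea behind ahead-of-time planning can be illustrated with a toy sketch (this is NOT STAlloc's actual algorithm; `plan_offsets` and its greedy first-fit strategy are illustrative assumptions): given each tensor's size and lifetime recorded from a traced iteration, fixed offsets are assigned once, so tensors whose lifetimes overlap never share memory while short-lived tensors can reuse the same addresses.

```python
# Toy sketch of ahead-of-time memory planning (NOT STAlloc's real planner):
# place tensors at fixed offsets so that any two tensors whose lifetimes
# overlap occupy disjoint address ranges.

def plan_offsets(tensors):
    """tensors: list of (size, alloc_step, free_step).
    Returns one fixed offset per input tensor."""
    order = sorted(range(len(tensors)), key=lambda i: -tensors[i][0])  # big first
    placed = []                          # (offset, size, alloc_step, free_step)
    offsets = [0] * len(tensors)
    for i in order:
        size, a, f = tensors[i]
        # address ranges busy during this tensor's lifetime [a, f)
        busy = sorted((o, o + s) for o, s, pa, pf in placed
                      if a < pf and pa < f)
        off = 0
        for lo, hi in busy:
            if off + size <= lo:
                break                    # found a reusable gap
            off = max(off, hi)
        offsets[i] = off
        placed.append((off, size, a, f))
    return offsets
```

With three 4-byte tensors whose lifetimes are `[0,2)`, `[1,3)`, and `[2,4)`, the first and third never coexist, so the planner maps them to the same offset and the peak footprint is 8 bytes instead of 12.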
- Build the allocator:

```shell
cd Allocator && make
```

- See `Allocator/README.md` for more details.
| Env | Value | Description |
|---|---|---|
| STALLOC_MODE | [Torch (default), Trace, Alloc] | Set the mode of STAlloc. |
| STALLOC_LIB_PATH | | Path to the STAlloc library; required when STALLOC_MODE is Trace or Alloc. |
| STALLOC_MODEL_INFO_PATH | | Path to save model memory info; required when STALLOC_MODE is Trace or Alloc. |
| STALLOC_DYNAMIC | [0 (default), 1] | Required for MoE models (without batchgemm) when STALLOC_MODE is Trace or Alloc. |
| STALLOC_LOG_LEVEL | [0, 1, 2, 3 (default)] | Set the log level of STAlloc; the smaller the value, the more detailed the output. |
| STALLOC_STATIC_FALLBACK | [0 (default), 1] | Enable fallback in static-alloc, which will affect performance. |
| STALLOC_TRACE_FAST_MODE | [0 (default), 1] | Use a faster dynamic allocator to trace; may lead to OOM when the memory required by the model is near the GPU's limit. |
"""
At the beginning of pretrain.py, import stalloc
"""
from stalloc.utils.hook_model import hook_memory_model
def model_provider(...):
...
#return model
return hook_memory_model(model, args)Add the following code to train script.
```shell
export STALLOC_MODE=Trace
export STALLOC_TRACE_FAST_MODE=1 # may lead to OOM

STALLOC_PATH=YourPath
export STALLOC_LIB_PATH=${STALLOC_PATH}/Allocator

MODEL_TAG=llama3-70b-tp8pp8mbs1gbs128-node${RANK}
MEMORY_SAVED_DIR=/workspace/allocator_case
export STALLOC_MODEL_INFO_PATH=${MEMORY_SAVED_DIR}/${MODEL_TAG}

if [ "$STALLOC_MODE" == "Trace" ]; then
    if [ -e "${STALLOC_MODEL_INFO_PATH}/trace" ]; then
        rm -rf ${STALLOC_MODEL_INFO_PATH}/trace
    fi
    mkdir -p ${STALLOC_MODEL_INFO_PATH}/trace
    mkdir -p ${STALLOC_MODEL_INFO_PATH}/output
elif [ "$STALLOC_MODE" == "Alloc" ]; then
    export STALLOC_LOG_LEVEL=1
    if [ ! -e "${STALLOC_MODEL_INFO_PATH}/output/plan" ]; then
        exit 1
    fi
fi

# !!! If you set "STALLOC_MODE=Trace", please make sure that "train-iter=3" and "eval-iter=1".
```

- After the preparation work is done, run the memory tools with the following steps.
- Set `export STALLOC_MODE=Torch` and run the training script.
- Check whether torch's GPU memory fragmentation is severe:
  - Find this form of line in the log: `dev0 : max_reserved:xx.xx, max_allocated:xx.xx, utilization:xx.xx%`
  - The smaller the `utilization`, the more severe the memory fragmentation.
  - Fragmentation is generally considered severe when `utilization` is less than 90%.
- OOM in Torch mode does not mean that Trace mode will also OOM.
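
As a quick illustration, the fragmentation check above can be scripted (the log format is assumed to match the example line exactly; `check_fragmentation` is a hypothetical helper, not part of STAlloc):

```python
import re

# Parse the per-device memory summary line and flag severe fragmentation
# (utilization below 90%). The exact log format is assumed from the
# example line shown in this README.
LINE_RE = re.compile(
    r"dev(?P<dev>\d+) : max_reserved:(?P<reserved>[\d.]+), "
    r"max_allocated:(?P<allocated>[\d.]+), utilization:(?P<util>[\d.]+)%")

def check_fragmentation(line, threshold=90.0):
    m = LINE_RE.search(line)
    if not m:
        return None
    util = float(m.group("util"))
    return {"dev": int(m.group("dev")),
            "utilization": util,
            "severe": util < threshold}

print(check_fragmentation(
    "dev0 : max_reserved:78.50, max_allocated:65.20, utilization:83.06%"))
```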
- Set `export STALLOC_MODE=Trace` and run the training script.
- Generate the plan with the following command (run `python main.py --help` for more details):

```shell
cd ${STALLOC_PATH}/Synthesizer
python main.py -model-memory-dir=XXXX
```
- Set `export STALLOC_MODE=Alloc` and run the training script.
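
The steps above can be strung together in a driver script. This is only a sketch: `run_training` is a placeholder for your own launch command, and the plan-generation step is shown commented out because it depends on your trace directory.

```shell
#!/bin/bash
# Sketch of the end-to-end STAlloc workflow: Torch baseline -> Trace -> plan -> Alloc.
# run_training and the paths below are placeholders, not part of STAlloc.
set -e

STALLOC_PATH=${STALLOC_PATH:-YourPath}
export STALLOC_MODEL_INFO_PATH=${STALLOC_MODEL_INFO_PATH:-$(mktemp -d)}

run_training() {                 # placeholder: replace with your launch script
    echo "training with STALLOC_MODE=${STALLOC_MODE}"
}

export STALLOC_MODE=Torch        # 1) baseline run; check utilization in the log
run_training

export STALLOC_MODE=Trace        # 2) record the allocation trace
mkdir -p ${STALLOC_MODEL_INFO_PATH}/trace ${STALLOC_MODEL_INFO_PATH}/output
run_training

# 3) synthesize the plan from the trace:
# (cd ${STALLOC_PATH}/Synthesizer && python main.py -model-memory-dir=${STALLOC_MODEL_INFO_PATH})

export STALLOC_MODE=Alloc        # 4) train with the precomputed plan
run_training
```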
If you are using another version of Megatron-LM, you may need to check the paths of the patched functions and modify `utils/memory_patcher.py` accordingly.
This project is licensed under the Apache License 2.0.