From 642682da9f139f49b48c7627aa6c9e3a019b6804 Mon Sep 17 00:00:00 2001
From: Wessel Bruinsma
Date: Sat, 15 Nov 2025 20:46:29 +0100
Subject: [PATCH] Configure memory allocator

---
 docs/finetuning.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/docs/finetuning.md b/docs/finetuning.md
index 41af477..b2cb210 100644
--- a/docs/finetuning.md
+++ b/docs/finetuning.md
@@ -10,7 +10,7 @@ model = AuroraPretrained()
 model.load_checkpoint()
 ```
 
-## Basic Fine-Tuning Environment
+## Fine-Tuning Environment
 
 We provide a very basic Docker image and fine-tuning loop to get you
 started. This Docker image is built from a NVIDIA PyTorch base image,
@@ -30,10 +30,13 @@ docker run --rm -it -v .:/app/aurora \
 Then, within the image, execute
 
 ```bash
-python finetuning/finetune.py
+PYTORCH_CUDA_ALLOC_CONF=backend:cudaMallocAsync \
+    python finetuning/finetune.py
 ```
 
 to run the sample fine-tuning loop.
+`PYTORCH_CUDA_ALLOC_CONF=backend:cudaMallocAsync` enables CUDA's built-in
+asynchronous memory allocator, which is recommended for Aurora.
 This loop should run on an A100 with 80 GB of memory.
 If you need to reduce memory usage, you could try the following:
 (a) split the model and optimiser parameters across multiple GPUs with
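
The same allocator backend can also be selected from inside the script rather
than on the command line, as long as the variable is set before PyTorch
initialises its CUDA allocator. This in-script approach is not part of the
patch; it is a minimal sketch of an alternative, relying only on the standard
PyTorch behaviour that `PYTORCH_CUDA_ALLOC_CONF` is read at allocator
initialisation:

```python
import os

# PYTORCH_CUDA_ALLOC_CONF is read when PyTorch's CUDA allocator is first
# initialised, so it must be set before any CUDA tensor is created.
# Setting it before importing torch is the safest way to guarantee that.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "backend:cudaMallocAsync")

import torch

if torch.cuda.is_available():
    # Any allocation from this point on goes through cudaMallocAsync.
    x = torch.zeros(1, device="cuda")
    print("Allocator configured:", os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Using `setdefault` keeps the command-line form from the patch working as an
override: a `PYTORCH_CUDA_ALLOC_CONF` that is already exported in the
environment takes precedence over the in-script default.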