Commit f96793e

Sync documentation updates
1 parent 1391c58 commit f96793e

File tree

1 file changed, +10 -20 lines changed

serverless/endpoints/model-caching.mdx

Lines changed: 10 additions & 20 deletions
@@ -61,41 +61,31 @@ flowchart TD

## Where models are stored

-Cached models are stored on the worker container's local disk, separate from any attached network volumes. Runpod automatically manages this internal storage to optimize loading speed.
+Cached models are stored in a Runpod-managed Docker volume and mounted at `/runpod-volume/huggingface-cache/hub/`. This creates a "blended view" where you can see both your network volume contents and cached models under the same `/runpod-volume/` path.

-The cache persists across requests on the same worker, so once a worker initializes, you'll see consistent performance. Since the models live on local disk rather than network volumes, they won't appear on your attached network volumes.
+The model cache loads significantly faster than network volumes, reducing cold start times. The cache is automatically managed and persists across requests on the same worker. You'll see cached models overlaid onto your network volume mount point.
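
For example, a worker can confirm the blended mount at startup by checking for the cache directory. A minimal sketch in plain Python, using only the path described here (nothing Runpod-specific):

```python
import os

CACHE_HUB = "/runpod-volume/huggingface-cache/hub"

# The model cache shares the /runpod-volume mount point with any attached
# network volume, so both appear under the same path when caching is enabled.
if os.path.isdir(CACHE_HUB):
    print("Model cache is mounted:", os.listdir(CACHE_HUB))
else:
    print("No cached models on this worker yet.")
```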

-## Accessing cached models
+## Accessing cached models in your application

-Cached models are stored at `/runpod-volume/huggingface-cache/hub/`. The directory structure follows Hugging Face cache conventions, where forward slashes (`/`) in the model name are replaced with double dashes (`--`).
+Runpod caches models at `/runpod-volume/huggingface-cache/hub/` following Hugging Face cache conventions. The directory structure replaces forward slashes (`/`) from the original model name with double dashes (`--`), and includes a version hash subdirectory.

The path structure follows this pattern:

```
-/runpod-volume/huggingface-cache/hub/models--{organization}--{model-name}/
+/runpod-volume/huggingface-cache/hub/models--{organization}--{model-name}/snapshots/{version-hash}/
```

-For example, `meta-llama/Llama-3.2-1B-Instruct` would be stored at:
+For example, the model `gensyn/qwen2.5-0.5b-instruct` would be stored at:

```
-/runpod-volume/huggingface-cache/hub/models--meta-llama--Llama-3.2-1B-Instruct/
+/runpod-volume/huggingface-cache/hub/models--gensyn--qwen2.5-0.5b-instruct/snapshots/317b7eb96312eda0c431d1dab1af958a308cb35e/
```
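
If your handler loads models through the Hugging Face `transformers` library, one way to pick up this cache is to point the hub cache at the Runpod path before loading. A minimal sketch, assuming the standard `HF_HUB_CACHE` environment variable and `local_files_only` option rather than any Runpod-specific API:

```python
import os

# Point the Hugging Face hub cache at the Runpod-managed location
# before importing transformers, so lookups resolve against /runpod-volume.
os.environ["HF_HUB_CACHE"] = "/runpod-volume/huggingface-cache/hub"

from transformers import AutoModelForCausalLM, AutoTokenizer

# The library maps "gensyn/qwen2.5-0.5b-instruct" to
# models--gensyn--qwen2.5-0.5b-instruct/snapshots/{version-hash}/ on its own.
model_id = "gensyn/qwen2.5-0.5b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(model_id, local_files_only=True)
```

With `local_files_only=True`, the load fails fast instead of attempting a download when the model isn't cached on that worker.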

-## Using cached models in applications
+### Current limitations

-You can access cached models in your application two ways:
+The version hash in the path currently prevents direct integration with some applications (like ComfyUI worker) that expect to predict paths based solely on model name. We're working on removing the version hash requirement.

-**Direct configuration**: Configure your application to load models directly from `/runpod-volume/huggingface-cache/hub/`. Many frameworks and tools let you specify a custom cache directory for Hugging Face models.
-
-**Symbolic links**: Create symbolic links from your application's expected model directory to the cache location. This is particularly useful for applications like ComfyUI that expect models in specific directories.
-
-For example, create a symbolic link like this:
-
-```bash
-ln -s /runpod-volume/huggingface-cache/hub/models--meta-llama--Llama-3.2-1B-Instruct/ /workspace/models/llama-3.2
-```
-
-This lets your application access cached models without modifying its configuration.
+If your application requires specific paths, configure it to scan `/runpod-volume/huggingface-cache/hub/` for models.
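
For example, a short scan of the hub directory can map each cached model ID to its snapshot folder so you can hand explicit paths to such applications. A minimal sketch; the helper below and the choice of the first snapshot are illustrative assumptions, not a Runpod API:

```python
from pathlib import Path

HUB_DIR = Path("/runpod-volume/huggingface-cache/hub")

def cached_model_paths() -> dict[str, Path]:
    """Map each cached model ID (org/name) to its snapshot directory."""
    paths = {}
    for entry in HUB_DIR.glob("models--*"):
        # Reverse the cache naming convention: models--org--name -> org/name
        # (assumes the model name itself contains no "--").
        model_id = entry.name.removeprefix("models--").replace("--", "/")
        snapshot_dir = entry / "snapshots"
        if snapshot_dir.is_dir():
            # Each version-hash subdirectory holds the actual model files;
            # take the first one found here for simplicity.
            snapshots = sorted(snapshot_dir.iterdir())
            if snapshots:
                paths[model_id] = snapshots[0]
    return paths

for model_id, snapshot in cached_model_paths().items():
    print(f"{model_id} -> {snapshot}")
```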

## Enabling cached models
