Skip to content

Commit 8789a73

Merge branch 'ggml-org:master' into jinja-tester
2 parents: dcbcb7f + c1c354e


54 files changed: +1559, -432 lines

common/arg.cpp

Lines changed: 1 addition & 1 deletion

@@ -2466,7 +2466,7 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
     ).set_examples({LLAMA_EXAMPLE_SPECULATIVE, LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_N_CPU_MOE_DRAFT"));
     add_opt(common_arg(
         {"-ngl", "--gpu-layers", "--n-gpu-layers"}, "N",
-        "number of layers to store in VRAM",
+        string_format("max. number of layers to store in VRAM (default: %d)", params.n_gpu_layers),
         [](common_params & params, int value) {
             params.n_gpu_layers = value;
             if (!llama_supports_gpu_offload()) {
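The change above bakes the current default into the option's help text, so `--help` output stays accurate if the default ever changes. The same pattern in Python's argparse, as a minimal sketch (the default value here is a placeholder for illustration, not llama.cpp's real default):

```python
import argparse

# Embed the current default in the help string so it is shown by --help,
# analogous to the string_format() call in the C++ change above.
DEFAULT_N_GPU_LAYERS = -1  # placeholder default, for illustration only

parser = argparse.ArgumentParser()
parser.add_argument(
    "-ngl", "--gpu-layers", "--n-gpu-layers",
    dest="n_gpu_layers", type=int, default=DEFAULT_N_GPU_LAYERS, metavar="N",
    help=f"max. number of layers to store in VRAM (default: {DEFAULT_N_GPU_LAYERS})",
)

args = parser.parse_args(["-ngl", "33"])
```

Formatting the default once, at parser-construction time, keeps the help text and the actual default from drifting apart.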

docs/backend/CANN.md

Lines changed: 5 additions & 9 deletions
@@ -293,17 +293,14 @@ We would like to thank Tuo Dai, Shanni Li, and all of the project maintainers fr
 
 ## Environment variable setup
 
-### GGML_CANN_ASYNC_MODE
-
-Enables asynchronous operator submission. Disabled by default.
-
 ### GGML_CANN_MEM_POOL
 
-Specifies the memory pool management strategy:
+Specifies the memory pool management strategy. Default is vmm.
 
 - vmm: Utilizes a virtual memory manager pool. If hardware support for VMM is unavailable, falls back to the legacy (leg) memory pool.
 
 - prio: Employs a priority queue-based memory pool management.
+
 - leg: Uses a fixed-size buffer pool.
 
 ### GGML_CANN_DISABLE_BUF_POOL_CLEAN
@@ -312,9 +309,8 @@ Controls automatic cleanup of the memory pool. This option is only effective whe
 
 ### GGML_CANN_WEIGHT_NZ
 
-Converting the matmul weight format from ND to NZ can significantly improve performance on the 310I DUO NPU.
+Converts the matmul weight format from ND to NZ to improve performance. Enabled by default.
 
-### GGML_CANN_DISABLE_ACL_GRAPH
+### GGML_CANN_ACL_GRAPH
 
-When this variable is set, ACL graph execution is disabled and operators are executed in an op-by-op (eager) mode.
-This mode is mainly intended for debugging or for cases where the overhead of graph construction and execution is not desirable.
+Operators are executed using ACL graph execution, rather than in op-by-op (eager) mode. Enabled by default.
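These variables are read by the CANN backend from the process environment, so they must be set before the library is loaded. A minimal sketch of selecting the pool strategy from Python; the helper function is hypothetical, only the variable name and its vmm/prio/leg values come from the docs above:

```python
import os

# Documented strategies for GGML_CANN_MEM_POOL (vmm is the default).
VALID_POOLS = {"vmm", "prio", "leg"}

def select_cann_mem_pool(strategy: str = "vmm") -> str:
    """Export GGML_CANN_MEM_POOL, rejecting values the docs do not list.

    Illustrative helper, not part of llama.cpp; set the variable before
    the CANN backend is initialized for it to take effect.
    """
    if strategy not in VALID_POOLS:
        raise ValueError(f"unknown pool strategy: {strategy!r}")
    os.environ["GGML_CANN_MEM_POOL"] = strategy
    return os.environ["GGML_CANN_MEM_POOL"]
```

Validating up front turns a silently ignored typo into an immediate error.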

examples/model-conversion/Makefile

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ causal-verify-logits: causal-run-original-model causal-run-converted-model
 	@MODEL_PATH="$(MODEL_PATH)" ./scripts/utils/check-nmse.py -m ${MODEL_PATH}
 
 causal-run-original-embeddings:
-	@./scripts/causal/run-casual-gen-embeddings-org.sh
+	@./scripts/causal/run-casual-gen-embeddings-org.py
 
 causal-run-converted-embeddings:
 	@./scripts/causal/run-converted-model-embeddings-logits.sh

examples/model-conversion/scripts/causal/compare-embeddings-logits.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#/bin/bash
+#!/usr/bin/env bash
 
 set -e
 
examples/model-conversion/scripts/causal/convert-model.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
 
 set -e
 

examples/model-conversion/scripts/causal/run-casual-gen-embeddings-org.sh renamed to examples/model-conversion/scripts/causal/run-casual-gen-embeddings-org.py

Lines changed: 3 additions & 2 deletions
@@ -3,11 +3,10 @@
 import argparse
 import os
 import importlib
-import sys
 import torch
 import numpy as np
 
-from transformers import AutoTokenizer, AutoConfig, AutoModel, AutoModelForCausalLM
+from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM
 from pathlib import Path
 
 unreleased_model_name = os.getenv('UNRELEASED_MODEL_NAME')
@@ -43,6 +42,8 @@
         model = model_class.from_pretrained(model_path)
     except (ImportError, AttributeError) as e:
         print(f"Failed to import or load model: {e}")
+        print("Falling back to AutoModelForCausalLM")
+        model = AutoModelForCausalLM.from_pretrained(model_path)
 else:
     model = AutoModelForCausalLM.from_pretrained(model_path)
 print(f"Model class: {type(model)}")
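The second hunk above adds a graceful fallback when the model-specific class cannot be imported or loaded. The same try/except pattern in isolation, with hypothetical stand-in loaders rather than the script's real transformers calls:

```python
# Sketch of the fallback the diff adds: try a model-specific loader first,
# and on ImportError/AttributeError fall back to a generic one.
# Both loader functions are illustrative stand-ins.
def load_specific(path):
    raise ImportError("unreleased model class not available")

def load_generic(path):
    return f"generic model from {path}"

def load_model(path):
    try:
        model = load_specific(path)
    except (ImportError, AttributeError) as e:
        print(f"Failed to import or load model: {e}")
        print("Falling back to generic loader")
        model = load_generic(path)
    return model
```

Catching only ImportError and AttributeError keeps genuine runtime failures (bad weights, out-of-memory) loud instead of silently masked by the fallback.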

examples/model-conversion/scripts/causal/run-converted-model-embeddings-logits.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
 
 set -e
 
examples/model-conversion/scripts/causal/run-converted-model.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
 
 set -e
 
examples/model-conversion/scripts/embedding/compare-embeddings-logits.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#/bin/bash
+#!/usr/bin/env bash
 
 set -e
 
examples/model-conversion/scripts/embedding/convert-model.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
 
 set -e
 
0 commit comments