10 changes: 10 additions & 0 deletions docs/rock5/rock5b/app-development/ai/rkllm-smolvlm2.md
@@ -0,0 +1,10 @@
---
sidebar_position: 27
description: Run the SmolVLM2 model with RKLLM
---

# RKLLM SmolVLM2

import SMOLVLM2 from "../../../../common/ai/\_rkllm_smolvlm2.mdx";

<SMOLVLM2 />
10 changes: 0 additions & 10 deletions docs/rock5/rock5b/app-development/rkllm-smolvlm2.md

This file was deleted.

@@ -1,14 +1,18 @@
This document describes how to use llama.cpp with [KleidiAI](https://www.arm.com/markets/artificial-intelligence/software/kleidi) on Radxa Orion O6 / O6N to accelerate inference for Baidu ERNIE models: [ERNIE-4.5-0.3B](https://huggingface.co/baidu/ERNIE-4.5-0.3B-PT) and [ERNIE-4.5-0.3B-Base](https://huggingface.co/baidu/ERNIE-4.5-0.3B-Base-PT).
This document explains how to run Baidu ERNIE models on the Radxa Orion O6 / O6N using `llama.cpp` with [KleidiAI](https://www.arm.com/markets/artificial-intelligence/software/kleidi) acceleration:
[ERNIE-4.5-0.3B](https://huggingface.co/baidu/ERNIE-4.5-0.3B-PT) and
[ERNIE-4.5-0.3B-Base](https://huggingface.co/baidu/ERNIE-4.5-0.3B-Base-PT).

Model links:

- [ERNIE-4.5-0.3B-PT](https://huggingface.co/baidu/ERNIE-4.5-0.3B-PT)
- [ERNIE-4.5-0.3B-Base-PT](https://huggingface.co/baidu/ERNIE-4.5-0.3B-Base-PT)

## Download models
## Download the model

Radxa provides prebuilt GGUF files: [ERNIE-4.5-0.3B-PT-Q4_0.gguf](https://modelscope.cn/models/radxa/ERNIE-4.5-GGUF/file/view/master/ERNIE-4.5-0.3B-PT-Q4_0.gguf?status=2)
and [ERNIE-4.5-0.3B-Base-PT-Q4_0.gguf](https://modelscope.cn/models/radxa/ERNIE-4.5-GGUF/file/view/master/ERNIE-4.5-0.3B-Base-PT-Q4_0.gguf?status=2). You can download them using `modelscope`.
Radxa provides pre-built GGUF files:
[ERNIE-4.5-0.3B-PT-Q4_0.gguf](https://modelscope.cn/models/radxa/ERNIE-4.5-GGUF/file/view/master/ERNIE-4.5-0.3B-PT-Q4_0.gguf?status=2) and
[ERNIE-4.5-0.3B-Base-PT-Q4_0.gguf](https://modelscope.cn/models/radxa/ERNIE-4.5-GGUF/file/view/master/ERNIE-4.5-0.3B-Base-PT-Q4_0.gguf?status=2).
You can download them with `modelscope`:

<Tabs>
<TabItem value="ERNIE-4.5-0.3B-PT">
@@ -39,23 +43,23 @@ and [ERNIE-4.5-0.3B-Base-PT-Q4_0.gguf](https://modelscope.cn/models/radxa/ERNIE-

</Tabs>
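
The exact commands are in the collapsed tabs above. As a minimal sketch, assuming the `modelscope` CLI is installed and using the file names from the links above:

<NewCodeBlock tip="Device" type="device">

```bash
# Assumed prerequisite: the ModelScope CLI
pip3 install modelscope

# Download the pre-built Q4_0 GGUF (PT variant shown; swap the file
# name for the Base variant) into ./models
modelscope download --model radxa/ERNIE-4.5-GGUF \
    ERNIE-4.5-0.3B-PT-Q4_0.gguf --local_dir ./models
```

</NewCodeBlock>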

## Model conversion
## Convert the model (optional)

:::tip
If you are interested in converting models to GGUF, follow this section to perform the conversion on an x86 host.
If you want to convert the model to GGUF yourself, follow this section on an x86 host.

If you do not want to convert models yourself, download the GGUF models provided by Radxa and skip to [**Model inference**](#model-inference).
Otherwise, download the pre-built GGUF from Radxa and skip to [**Inference**](#inference).
:::

### Build llama.cpp

Build llama.cpp on an x86 host.
Build `llama.cpp` on an x86 host.

:::tip
Follow [**llama.cpp**](../../orion/o6/app-development/artificial-intelligence/llama_cpp.md) to build llama.cpp on an x86 host.
Follow [**llama.cpp**](../llama-cpp) to build `llama.cpp` on an x86 host.
:::

Build commands:
Build command:

<NewCodeBlock tip="X86 PC" type="PC">

@@ -68,9 +72,9 @@ cmake --build build --config Release

</NewCodeBlock>
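
The collapsed block above holds the exact steps; a representative sketch (upstream repository URL assumed) is:

<NewCodeBlock tip="X86 PC" type="PC">

```bash
# Fetch llama.cpp and do a plain CPU build on the x86 host
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

</NewCodeBlock>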

### Download the model
### Download the source model

Use `modelscope` to download the source model.
Use `modelscope` to download the original model:

<Tabs>
<TabItem value="ERNIE-4.5-0.3B-PT">
@@ -100,7 +104,7 @@ Use `modelscope` to download the source model.

</Tabs>
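
As a sketch, assuming the ModelScope model ID mirrors the Hugging Face path:

<NewCodeBlock tip="X86 PC" type="PC">

```bash
# Download the original model weights for conversion
modelscope download --model baidu/ERNIE-4.5-0.3B-PT \
    --local_dir ./ERNIE-4.5-0.3B-PT
```

</NewCodeBlock>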

### Convert to a floating-point GGUF model
### Convert to a float (F16) GGUF

<Tabs>
<TabItem value="ERNIE-4.5-0.3B-PT">
@@ -130,11 +134,11 @@ Use `modelscope` to download the source model.

</Tabs>

Running `convert_hf_to_gguf.py` will generate an F16 floating-point GGUF model in the source model directory.
Running `convert_hf_to_gguf.py` generates an F16 (float) GGUF file in the model directory.
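
A minimal sketch of this step, run from the `llama.cpp` checkout with the model directory assumed from above:

<NewCodeBlock tip="X86 PC" type="PC">

```bash
# Convert the Hugging Face model to an F16 GGUF; the file is written
# into the model directory unless --outfile says otherwise
python3 convert_hf_to_gguf.py ./ERNIE-4.5-0.3B-PT --outtype f16
```

</NewCodeBlock>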

### Quantize the GGUF model
### Quantize the GGUF

Use `llama-quantize` to quantize the floating-point GGUF model to Q4_0.
Use `llama-quantize` to quantize the float GGUF to Q4_0:

<Tabs>
<TabItem value="ERNIE-4.5-0.3B-PT">
@@ -164,17 +168,17 @@ Use `llama-quantize` to quantize the floating-point GGUF model to Q4_0.

</Tabs>

Running `llama-quantize` will generate a GGUF model with the specified quantization in the target directory.
Running `llama-quantize` generates a GGUF file with the selected quantization format in the target path.
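
A sketch of the quantization call (input and output file names assumed from the conversion step):

<NewCodeBlock tip="X86 PC" type="PC">

```bash
# Quantize the F16 GGUF down to Q4_0
./build/bin/llama-quantize \
    ./ERNIE-4.5-0.3B-PT/ERNIE-4.5-0.3B-PT-F16.gguf \
    ./ERNIE-4.5-0.3B-PT-Q4_0.gguf Q4_0
```

</NewCodeBlock>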

## Model inference
## Inference

### Build llama.cpp

:::tip
Follow [**llama.cpp**](../../orion/o6/app-development/artificial-intelligence/llama_cpp.md) to build llama.cpp with **KleidiAI** enabled on Radxa Orion O6 / O6N.
Follow [**llama.cpp**](../llama-cpp) to build `llama.cpp` with **KleidiAI** enabled on the Radxa Orion O6 / O6N.
:::

Build commands:
Build command:

<NewCodeBlock tip="Device" type="device">

@@ -189,7 +193,7 @@ cmake --build build --config Release

### Run inference

Use `llama-cli` to chat with the model.
Use `llama-cli` to chat with the model:

<Tabs>
<TabItem value="ERNIE-4.5-0.3B-PT">
@@ -283,7 +287,7 @@ Use `llama-cli` to chat with the model.

</Tabs>
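
A minimal interactive run, assuming the quantized file from the steps above (tune `-t` to the number of cores):

<NewCodeBlock tip="Device" type="device">

```bash
# Chat with the quantized model
./build/bin/llama-cli -m ./ERNIE-4.5-0.3B-PT-Q4_0.gguf -t 8 \
    -p "Hello, who are you?"
```

</NewCodeBlock>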

## Performance analysis
## Performance benchmarking

You can use `llama-bench` to benchmark the model.
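
For example (a sketch; model path and thread count assumed):

<NewCodeBlock tip="Device" type="device">

```bash
# Report prompt-processing and token-generation throughput
./build/bin/llama-bench -m ./ERNIE-4.5-0.3B-PT-Q4_0.gguf -t 8
```

</NewCodeBlock>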

@@ -1,14 +1,18 @@
This document describes how to use llama.cpp with [KleidiAI](https://www.arm.com/markets/artificial-intelligence/software/kleidi) on Radxa Orion O6 / O6N to accelerate inference for Baidu ERNIE models: [ERNIE-4.5-21B-A3B](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-PT) and [ERNIE-4.5-21B-A3B-Base](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Base-PT).
This document explains how to run Baidu ERNIE models on the Radxa Orion O6 / O6N using `llama.cpp` with [KleidiAI](https://www.arm.com/markets/artificial-intelligence/software/kleidi) acceleration:
[ERNIE-4.5-21B-A3B](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-PT) and
[ERNIE-4.5-21B-A3B-Base](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Base-PT).

Model links:

- [ERNIE-4.5-21B-A3B-PT](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-PT)
- [ERNIE-4.5-21B-A3B-Base-PT](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Base-PT)

## Download models
## Download the model

Radxa provides prebuilt GGUF files: [ERNIE-4.5-21B-A3B-PT-Q4_0.gguf](https://modelscope.cn/models/radxa/ERNIE-4.5-GGUF/file/view/master/ERNIE-4.5-21B-A3B-PT-Q4_0.gguf?status=2)
and [ERNIE-4.5-21B-A3B-Base-PT-Q4_0.gguf](https://modelscope.cn/models/radxa/ERNIE-4.5-GGUF/file/view/master/ERNIE-4.5-21B-A3B-Base-PT-Q4_0.gguf?status=2). You can download them using `modelscope`.
Radxa provides pre-built GGUF files:
[ERNIE-4.5-21B-A3B-PT-Q4_0.gguf](https://modelscope.cn/models/radxa/ERNIE-4.5-GGUF/file/view/master/ERNIE-4.5-21B-A3B-PT-Q4_0.gguf?status=2) and
[ERNIE-4.5-21B-A3B-Base-PT-Q4_0.gguf](https://modelscope.cn/models/radxa/ERNIE-4.5-GGUF/file/view/master/ERNIE-4.5-21B-A3B-Base-PT-Q4_0.gguf?status=2).
You can download them with `modelscope`:

<Tabs>
<TabItem value="ERNIE-4.5-21B-A3B-PT">
@@ -39,23 +43,23 @@ and [ERNIE-4.5-21B-A3B-Base-PT-Q4_0.gguf](https://modelscope.cn/models/radxa/ERN

</Tabs>
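
The exact commands are in the collapsed tabs above; a hedged sketch, assuming the `modelscope` CLI is installed:

<NewCodeBlock tip="Device" type="device">

```bash
# Download the pre-built Q4_0 GGUF (PT variant shown; swap the file
# name for the Base variant) into ./models
modelscope download --model radxa/ERNIE-4.5-GGUF \
    ERNIE-4.5-21B-A3B-PT-Q4_0.gguf --local_dir ./models
```

</NewCodeBlock>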

## Model conversion
## Convert the model (optional)

:::tip
If you are interested in converting models to GGUF, follow this section to perform the conversion on an x86 host.
If you want to convert the model to GGUF yourself, follow this section on an x86 host.

If you do not want to convert models yourself, download the GGUF models provided by Radxa and skip to [**Model inference**](#model-inference).
Otherwise, download the pre-built GGUF from Radxa and skip to [**Inference**](#inference).
:::

### Build llama.cpp

Build llama.cpp on an x86 host.
Build `llama.cpp` on an x86 host.

:::tip
Follow [**llama.cpp**](../../orion/o6/app-development/artificial-intelligence/llama_cpp.md) to build llama.cpp on an x86 host.
Follow [**llama.cpp**](../llama-cpp) to build `llama.cpp` on an x86 host.
:::

Build commands:
Build command:

<NewCodeBlock tip="X86 PC" type="PC">

@@ -68,9 +72,9 @@ cmake --build build --config Release

</NewCodeBlock>
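
A representative sketch of the collapsed build steps (upstream repository URL assumed):

<NewCodeBlock tip="X86 PC" type="PC">

```bash
# Fetch llama.cpp and do a plain CPU build on the x86 host
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

</NewCodeBlock>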

### Download the model
### Download the source model

Use `modelscope` to download the source model.
Use `modelscope` to download the original model:

<Tabs>
<TabItem value="ERNIE-4.5-21B-A3B-PT">
@@ -101,7 +105,7 @@ Use `modelscope` to download the source model.

</Tabs>
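
As a sketch, assuming the ModelScope model ID mirrors the Hugging Face path:

<NewCodeBlock tip="X86 PC" type="PC">

```bash
# Download the original model weights for conversion
modelscope download --model baidu/ERNIE-4.5-21B-A3B-PT \
    --local_dir ./ERNIE-4.5-21B-A3B-PT
```

</NewCodeBlock>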

### Convert to a floating-point GGUF model
### Convert to a float (F16) GGUF

<Tabs>
<TabItem value="ERNIE-4.5-21B-A3B-PT">
@@ -132,11 +136,11 @@ Use `modelscope` to download the source model.

</Tabs>

Running `convert_hf_to_gguf.py` will generate an F16 floating-point GGUF model in the source model directory.
Running `convert_hf_to_gguf.py` generates an F16 (float) GGUF file in the model directory.
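
A minimal sketch of this step, run from the `llama.cpp` checkout with the model directory assumed from above:

<NewCodeBlock tip="X86 PC" type="PC">

```bash
# Convert the Hugging Face model to an F16 GGUF; the file is written
# into the model directory unless --outfile says otherwise
python3 convert_hf_to_gguf.py ./ERNIE-4.5-21B-A3B-PT --outtype f16
```

</NewCodeBlock>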

### Quantize the GGUF model
### Quantize the GGUF

Use `llama-quantize` to quantize the floating-point GGUF model to Q4_0.
Use `llama-quantize` to quantize the float GGUF to Q4_0:

<Tabs>
<TabItem value="ERNIE-4.5-21B-A3B-PT">
@@ -166,17 +170,17 @@ Use `llama-quantize` to quantize the floating-point GGUF model to Q4_0.

</Tabs>

Running `llama-quantize` will generate a GGUF model with the specified quantization in the target directory.
Running `llama-quantize` generates a GGUF file with the selected quantization format in the target path.
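
A sketch of the quantization call (file names assumed from the conversion step):

<NewCodeBlock tip="X86 PC" type="PC">

```bash
# Quantize the F16 GGUF down to Q4_0
./build/bin/llama-quantize \
    ./ERNIE-4.5-21B-A3B-PT/ERNIE-4.5-21B-A3B-PT-F16.gguf \
    ./ERNIE-4.5-21B-A3B-PT-Q4_0.gguf Q4_0
```

</NewCodeBlock>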

## Model inference
## Inference

### Build llama.cpp

:::tip
Follow [**llama.cpp**](../../orion/o6/app-development/artificial-intelligence/llama_cpp.md) to build llama.cpp with **KleidiAI** enabled on Radxa Orion O6 / O6N.
Follow [**llama.cpp**](../llama-cpp) to build `llama.cpp` with **KleidiAI** enabled on the Radxa Orion O6 / O6N.
:::

Build commands:
Build command:

<NewCodeBlock tip="Device" type="device">

@@ -191,7 +195,7 @@ cmake --build build --config Release

### Run inference

Use `llama-cli` to chat with the model.
Use `llama-cli` to chat with the model:

<Tabs>
<TabItem value="ERNIE-4.5-21B-A3B-PT">
@@ -321,7 +325,7 @@ Use `llama-cli` to chat with the model.

</Tabs>
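
A minimal interactive run (paths assumed); this MoE model is far larger than the 0.3B variant, so make sure the board has enough free memory:

<NewCodeBlock tip="Device" type="device">

```bash
# Chat with the quantized model
./build/bin/llama-cli -m ./ERNIE-4.5-21B-A3B-PT-Q4_0.gguf -t 8 \
    -p "Hello, who are you?"
```

</NewCodeBlock>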

## Performance analysis
## Performance benchmarking

You can use `llama-bench` to benchmark the model.
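
For example (a sketch; model path and thread count assumed):

<NewCodeBlock tip="Device" type="device">

```bash
# Report prompt-processing and token-generation throughput
./build/bin/llama-bench -m ./ERNIE-4.5-21B-A3B-PT-Q4_0.gguf -t 8
```

</NewCodeBlock>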
