llama_decode_eagle latency issue solved (#5) #15
```yaml
name: CI

on:
  workflow_dispatch: # allows manual triggering
  push:
    branches:
      - master
    paths: ['.github/workflows/build.yml', '**/CMakeLists.txt', '**/.cmake', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.cuh', '**/*.swift', '**/*.m', '**/*.metal', '**/*.comp']
  pull_request:
    types: [opened, synchronize, reopened]
    paths: ['.github/workflows/build.yml', '**/CMakeLists.txt', '**/.cmake', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.cuh', '**/*.swift', '**/*.m', '**/*.metal', '**/*.comp']

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
  cancel-in-progress: true

env:
  GGML_NLOOP: 3
  GGML_N_THREADS: 1
  LLAMA_LOG_COLORS: 1
  LLAMA_LOG_PREFIX: 1
  LLAMA_LOG_TIMESTAMPS: 1
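  # The LLAMA_LOG_* variables make tool output colored, prefixed, and timestamped;
  # GGML_NLOOP / GGML_N_THREADS presumably tune loop and thread counts for the ggml tests.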

jobs:
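  # CPU-only CMake builds on x64 and arm64 runners, followed by ctest and a
  # llama2c-to-GGUF conversion smoke test.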
  ubuntu-cpu-cmake:
    strategy:
      matrix:
        include:
          - build: 'x64'
            os: ubuntu-22.04
          - build: 'arm64'
            os: ubuntu-22.04-arm

    runs-on: ${{ matrix.os }}

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v4

      - name: ccache
        uses: hendrikmuhs/ccache-action@v1.2.16
        with:
          key: ubuntu-cpu-cmake
          evict-old-files: 1d

      - name: Dependencies
        id: depends
        run: |
          sudo apt-get update
          sudo apt-get install build-essential libcurl4-openssl-dev

      - name: Build
        id: cmake_build
        run: |
          cmake -B build \
            -DLLAMA_FATAL_WARNINGS=ON \
            -DGGML_RPC=ON
          cmake --build build --config Release -j $(nproc)

      - name: Test
        id: cmake_test
        run: |
          cd build
          ctest -L 'main|curl' --verbose --timeout 900

      - name: Test llama2c conversion
        id: llama2c_test
        run: |
          cd build
          echo "Fetch tokenizer"
          wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories260K/tok512.bin
          echo "Fetch llama2c model"
          wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories260K/stories260K.bin
          ./bin/llama-convert-llama2c-to-ggml --copy-vocab-from-model ./tok512.bin --llama2c-model stories260K.bin --llama2c-output-model stories260K.gguf
          ./bin/llama-cli -m stories260K.gguf -p "One day, Lily met a Shoggoth" -n 500 -c 256
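
  # CUDA backend build inside an NVIDIA CUDA 12.6 devel container;
  # build-only, this job has no test step.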
  ubuntu-latest-cmake-cuda:
    runs-on: ubuntu-latest
    container: nvidia/cuda:12.6.2-devel-ubuntu24.04

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v4

      - name: Install dependencies
        env:
          DEBIAN_FRONTEND: noninteractive
        run: |
          apt update
          apt install -y cmake build-essential ninja-build libgomp1 git libcurl4-openssl-dev

      - name: ccache
        uses: hendrikmuhs/ccache-action@v1.2.16
        with:
          key: ubuntu-latest-cmake-cuda
          evict-old-files: 1d

      - name: Build with CMake
        run: |
          cmake -S . -B build -G Ninja \
            -DCMAKE_BUILD_TYPE=Release \
            -DCMAKE_CUDA_ARCHITECTURES=89-real \
            -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined \
            -DLLAMA_FATAL_WARNINGS=ON \
            -DGGML_NATIVE=OFF \
            -DGGML_CUDA=ON
          cmake --build build
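
  # Android example app: Gradle build of examples/llama.android.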
  android-build:
    runs-on: ubuntu-latest

    steps:
      - name: Clone
        uses: actions/checkout@v4

      - name: ccache
        uses: hendrikmuhs/ccache-action@v1.2.16
        with:
          key: android-build
          evict-old-files: 1d

      - name: Set up JDK
        uses: actions/setup-java@v3
        with:
          java-version: 17
          distribution: zulu

      - name: Setup Android SDK
        uses: android-actions/setup-android@v3
        with:
          log-accepted-android-sdk-licenses: false

      - name: Build
        run: |
          cd examples/llama.android
          ./gradlew build --no-daemon
```
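
Because the workflow declares a `workflow_dispatch` trigger, it can also be started by hand. A minimal sketch using the GitHub CLI, assuming the file lives at `.github/workflows/build.yml` (the path referenced in the `paths` filters) and that `gh` is authenticated for this repository:

```sh
# Kick off the CI workflow manually on the master branch
gh workflow run build.yml --ref master

# Follow the run that was just started
gh run watch
```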