Merged
16 changes: 7 additions & 9 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -28,7 +28,7 @@ For more info, please refer to the [AGENTS.md](AGENTS.md) file.

Before submitting your PR:
- Search for existing PRs to prevent duplicating efforts
- cheese.cpp uses the ggml tensor library for model evaluation. If you are unfamiliar with ggml, consider taking a look at the [examples in the ggml repository](https://github.com/ggml-org/ggml/tree/master/examples/). [simple](https://github.com/ggml-org/ggml/tree/master/examples/simple) shows the bare minimum for using ggml. [gpt-2](https://github.com/ggml-org/ggml/tree/master/examples/gpt-2) has minimal implementations for language model inference using GPT-2. [mnist](https://github.com/ggml-org/ggml/tree/master/examples/mnist) demonstrates how to train and evaluate a simple image classifier
- Cheesebrain uses the `ggml` tensor library for model evaluation. If you are unfamiliar with `ggml`, consider taking a look at the [examples in the ggml repository](https://github.com/ggml-org/ggml/tree/master/examples/). [`simple`](https://github.com/ggml-org/ggml/tree/master/examples/simple) shows the bare minimum for using `ggml`. [`gpt-2`](https://github.com/ggml-org/ggml/tree/master/examples/gpt-2) has minimal implementations for language model inference using GPT-2. [`mnist`](https://github.com/ggml-org/ggml/tree/master/examples/mnist) demonstrates how to train and evaluate a simple image classifier
- Test your changes:
- Execute [the full CI locally on your machine](ci/README.md) before publishing
- Verify that the perplexity and the performance are not affected negatively by your changes (use `cheese-perplexity` and `cheese-bench`)
@@ -41,7 +41,7 @@ Before submitting your PR:
- Consider allowing write access to your branch for faster reviews, as reviewers can push commits directly

After submitting your PR:
- Expect requests for modifications to ensure the code meets cheese.cpp's standards for quality and long-term maintainability
- Expect requests for modifications to ensure the code meets Cheesebrain's standards for quality and long-term maintainability
- Maintainers will rely on your insights and approval when making a final decision to approve and merge a PR
- If your PR becomes stale, rebase it on top of the latest `master` to get maintainers' attention
- Consider adding yourself to [CODEOWNERS](CODEOWNERS) to indicate your availability for fixing related issues and reviewing related PRs
@@ -50,7 +50,7 @@ After submitting your PR:

- Squash-merge PRs
- Use the following format for the squashed commit title: `<module> : <commit title> (#<issue_number>)`. For example: `utils : fix typo in utils.py (#1234)`
- Optionally pick a `<module>` from here: https://github.com/ggml-org/cheese.cpp/wiki/Modules
- Optionally pick a `<module>` that best matches the area you touched (e.g. `ggml`, `tools/server`, `tools/cli`, `tests`, etc.).
- Let other maintainers merge their own PRs
- When merging a PR, make sure you have a good understanding of the changes
- Be mindful of maintenance: most of the work going into a feature happens after the PR is merged. If the PR author is not committed to contributing long-term, someone else needs to take responsibility (you)
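The squashed-commit title convention above lends itself to a mechanical check. A minimal sketch in Python (the regex and helper name are illustrative, not part of the repository):

```python
import re

# <module> : <commit title> (#<issue_number>)
# e.g. "utils : fix typo in utils.py (#1234)" or "tools/server : handle empty prompt (#42)"
TITLE_RE = re.compile(r"^(?P<module>[\w./-]+) : (?P<title>.+) \(#(?P<number>\d+)\)$")

def check_commit_title(title: str) -> bool:
    """Return True if `title` follows the squash-merge title convention."""
    return TITLE_RE.fullmatch(title) is not None
```

For example, `check_commit_title("utils : fix typo in utils.py (#1234)")` returns `True`, while a title missing the `<module> :` prefix or the `(#<issue_number>)` suffix returns `False`.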
@@ -80,7 +80,7 @@ Maintainers reserve the right to decline review or close pull requests for any r
const enum cheese_rope_type rope_type;
```

_(NOTE: this guideline is yet to be applied to the `cheese.cpp` codebase. New code should follow this guideline.)_
_(NOTE: this guideline is yet to be applied consistently to the Cheesebrain codebase. New code should follow this guideline.)_

- Try to follow the existing patterns in the code (indentation, spaces, etc.). In case of doubt use `clang-format` (from clang-tools v15+) to format the added code
- For error handling (when to use return codes vs exceptions, C API failure semantics), see [Error handling](docs/error_handling.md)
@@ -142,7 +142,7 @@ Maintainers reserve the right to decline review or close pull requests for any r
enum cheese_pooling_type cheese_pooling_type(const cheese_context_t ctx);
```

_(NOTE: this guideline is yet to be applied to the `cheese.cpp` codebase. New code should follow this guideline)_
_(NOTE: this guideline is yet to be applied consistently to the Cheesebrain codebase. New code should follow this guideline.)_

- C/C++ filenames are all lowercase with dashes. Headers use the `.h` extension. Source files use the `.c` or `.cpp` extension
- Python filenames are all lowercase with underscores
@@ -168,7 +168,7 @@ Maintainers reserve the right to decline review or close pull requests for any r
- When adding or modifying a large piece of code:
- If you are a collaborator, make sure to add yourself to [CODEOWNERS](CODEOWNERS) to indicate your availability for reviewing related PRs
- If you are a contributor, find an existing collaborator who is willing to review and maintain your code long-term
- Provide the necessary CI workflow (and hardware) to test your changes (see [ci/README.md](https://github.com/ggml-org/cheese.cpp/tree/master/ci))
- Provide the necessary CI workflow (and hardware) to test your changes (see [ci/README.md](ci/README.md))

- New code should follow the guidelines (coding, naming, etc.) outlined in this document. Exceptions are allowed in isolated, backend-specific parts of the code that do not interface directly with the `ggml` interfaces.
_(NOTE: for legacy reasons, existing code is not required to follow this guideline)_
@@ -181,6 +181,4 @@ Maintainers reserve the right to decline review or close pull requests for any r

# Resources

The Github issues, PRs and discussions contain a lot of information that can be useful to get familiar with the codebase. For convenience, some of the more important information is referenced from Github projects:

https://github.com/ggml-org/cheese.cpp/projects
The GitHub issues, PRs and discussions in this repository contain a lot of information that can be useful to get familiar with the codebase. Browse open and closed issues and pull requests to see common patterns and prior design decisions.
84 changes: 48 additions & 36 deletions README.md
@@ -1,66 +1,78 @@
# Cheese Brain
# Cheesebrain

![Cheese Hero](media/cheese0.png)
![Cheesebrain Hero](media/cheese0.png)

[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://github.com/ggml-org/cheese.cpp/actions/workflows/build.yml/badge.svg)](https://github.com/ggml-org/cheese.cpp/actions/workflows/build.yml)
[![Stars](https://img.shields.io/github/stars/ggml-org/cheese.cpp.svg)](https://github.com/ggml-org/cheese.cpp/stargazers)
[![Forks](https://img.shields.io/github/forks/ggml-org/cheese.cpp.svg)](https://github.com/ggml-org/cheese.cpp/network/members)
[![Issues](https://img.shields.io/github/issues/ggml-org/cheese.cpp.svg)](https://github.com/ggml-org/cheese.cpp/issues)
[![Pull Requests](https://img.shields.io/github/issues-pr/ggml-org/cheese.cpp.svg)](https://github.com/ggml-org/cheese.cpp/pulls)

**Cheese Brain** is a state-of-the-art C/C++ library for high-performance Large Language Model (LLM) inference. Designed for efficiency, portability, and ease of use, Cheese enables you to run the latest AI models on anything from a laptop to a high-end server with minimal setup.
**Cheesebrain** is a high-performance C/C++ runtime for Large Language Model (LLM) inference. It is designed to be small, fast, and self-contained so you can run modern GGUF models on laptops, workstations, and servers with minimal setup.

Cheesebrain is a standalone project, not derived from cheese.cpp. It focuses on:

- **Portability** – a single codebase that runs well on Linux, macOS, and Windows.
- **Performance** – tight low-level code with aggressive quantization support and hardware-aware kernels.
- **Practical tooling** – CLI, HTTP server, Web UI, quantization tools, and model conversion utilities.

## Key Features

- **Pure C/C++**: Zero-dependency implementation for maximum portability.
- **Hardware Optimized**:
- **Apple Silicon**: First-class support via ARM NEON, Accelerate, and Metal.
- **x86**: Optimized for AVX, AVX2, AVX512, and AMX.
- **NVIDIA**: High-performance CUDA kernels for discrete GPUs.
- **Efficiency**: Advanced 1.5-bit to 8-bit quantization for reduced memory footprint and lightning-fast execution.
- **Extensible**: Supports a wide range of state-of-the-art models via GGUF format compatibility.
- **C / C++ implementation** for easy integration into existing systems.
- **Hardware-optimized backends**:
- Apple Silicon (NEON, Accelerate, Metal)
- x86 (SSE/AVX/AVX2/AVX-512/AMX where available)
- Optional GPU backends (CUDA / Metal / others, depending on build flags)
- **Quantization-aware**: supports multiple GGUF quantization schemes to reduce memory and improve throughput.
- **Rich tooling**:
- `cheese-cli` for interactive and scripted use.
- `cheese-server` for an OpenAI-compatible HTTP API (with optional Web UI).
- Quantization, benchmarking, and conversion helpers under `tools/`.

## Quick Start

Cheese is designed to be productive from minute one.
### Build

### Installation
Build from source for the best performance on your specific hardware:
From the repository root:

```bash
cmake -B build
cmake --build build --config Release
```

### Usage
Run your first model in seconds:
This produces binaries under `build/bin/`.

### Run a model

Assuming you have a GGUF model at `./models/model.gguf`:

```bash
# Chat with a model
./build/bin/cheese-cli -m model.gguf -cnv
# Chat from the terminal
./build/bin/cheese-cli -m ./models/model.gguf -cnv

# Start an OpenAI-compatible API server (no UI by default; add --webui for the web UI)
./build/bin/cheese-server -m model.gguf --port 8080
# Start an OpenAI-compatible HTTP server (add --webui for the Web UI)
./build/bin/cheese-server -m ./models/model.gguf --port 8080
```

## Description

The core mission of **cheese.cpp** is to democratize AI by making high-performance inference accessible on everyday hardware. By leveraging the power of `ggml`, Cheese provides a playground for cutting-edge LLM features, ensuring you always have access to the latest advancements in the field.
The server exposes `/v1/chat/completions`, `/v1/completions`, `/v1/embeddings`, and related endpoints.
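The request shape follows the OpenAI chat-completions convention. A minimal Python sketch that builds such a request body (the model name and localhost URL are assumptions for illustration; the network call is left commented out so the snippet runs without a server):

```python
import json

# Hypothetical local endpoint; adjust host/port to match `cheese-server --port 8080`.
URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "model.gguf") -> str:
    """Build an OpenAI-style chat-completions request body as a JSON string."""
    body = {
        "model": model,  # illustrative name; a local server may ignore or validate it
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)

# To actually send it (requires a running server):
# import urllib.request
# req = urllib.request.Request(
#     URL,
#     data=build_chat_request("Hello").encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```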

## Documentation

Explore our comprehensive guides to get the most out of Cheese:
See the in-repo docs for details:

- [Installation & Build Guide](docs/build.md)
- [Command Line Interface (CLI)](tools/cli/README.md)
- [HTTP API Server](tools/server/README.md)
- [Quantization Guide](tools/quantize/README.md)
- [Build & configuration](docs/build.md)
- [CLI usage](tools/cli/README.md)
- [HTTP server & Web UI](tools/server/README.md)
- [Quantization](tools/quantize/README.md)

## Performance
## Performance Tips

To get the best speed: build with a backend that matches your hardware (see [Build](docs/build.md): BLAS for CPU prefill, CUDA/Metal/SYCL for GPU). At runtime, tune threads (`-t`), GPU layers (`-ngl`), and batch size (`-ub`). See [Token generation performance tips](docs/development/token_generation_performance_tips.md) for CPU/GPU tuning and [Server README](tools/server/README.md#performance) for server throughput. To measure speed, see [Benchmarking](docs/development/benchmarking.md) and `scripts/bench-perf.sh`.
To get the best performance on your machine:

---
- Build with an appropriate backend (BLAS for CPU, CUDA/Metal/SYCL where applicable).
- Tune runtime flags:
- Threads: `-t N`
- GPU layers: `-ngl N`
- Batch size / ubatch: `-ub N`
- See:
- [Token generation performance tips](docs/development/token_generation_performance_tips.md)
- [Server performance notes](tools/server/README.md#performance)
- [Benchmarking guide](docs/development/benchmarking.md)

*Cheese.cpp - The most delicious way to run LLMs.*
Cheesebrain aims to be a pragmatic, low-friction way to run GGUF models locally while remaining small and hackable.
14 changes: 7 additions & 7 deletions SECURITY.md
@@ -3,7 +3,7 @@
- [**Reporting a vulnerability**](#reporting-a-vulnerability)
- [**Requirements**](#requirements)
- [**Covered Topics**](#covered-topics)
- [**Using cheese.cpp securely**](#using-cheesecpp-securely)
- [**Using Cheesebrain securely**](#using-cheesebrain-securely)
- [Untrusted models](#untrusted-models)
- [Untrusted inputs](#untrusted-inputs)
- [Data privacy](#data-privacy)
@@ -14,7 +14,7 @@

If you have discovered a security vulnerability in this project that falls inside the [covered topics](#covered-topics), please report it privately. **Do not disclose it as a public issue.** This gives us time to work with you to fix the issue before public exposure, reducing the chance that the exploit will be used before a patch is released.

Please disclose it as a private [security advisory](https://github.com/ggml-org/cheese.cpp/security/advisories/new).
Please disclose it via your repository host's private security advisory mechanism (for example, GitHub Security Advisories).

This project is maintained by a team of volunteers on a reasonable-effort basis. As such, please give us at least 90 days to work on a fix before public exposure.

@@ -44,11 +44,11 @@ Only vulnerabilities that fall within these parts of the project are considered
- Features not recommended for use in untrusted environments (e.g., router, MCP)
- Bugs that can lead to Denial-of-Service attack

Note that none of the topics under [Using cheese.cpp securely](#using-cheesecpp-securely) are considered vulnerabilities in Cheese C++.
Note that none of the topics under [Using Cheesebrain securely](#using-cheesebrain-securely) are considered vulnerabilities in Cheesebrain itself.

For vulnerabilities that fall within the `vendor` directory, please report them directly to the third-party project.
For vulnerabilities that fall within the `vendor` or `third_party` directories, please report them directly to the corresponding upstream project when possible.

## Using cheese.cpp securely
## Using Cheesebrain securely

### Untrusted models
Be careful when running untrusted models. This classification includes models created by unknown developers or utilizing data obtained from unknown sources.
@@ -66,7 +66,7 @@ For maximum security when handling untrusted inputs, you may need to employ the

* Sandboxing: Isolate the environment where the inference happens.
* Pre-analysis: Check how the model performs by default when exposed to prompt injection (e.g. using [fuzzing for prompt injection](https://github.com/FonduAI/awesome-prompt-injection?tab=readme-ov-file#tools)). This will give you leads on how hard you will have to work on the next topics.
* Updates: Keep both Cheese C++ and your libraries updated with the latest security patches.
* Updates: Keep both Cheesebrain and your libraries updated with the latest security patches.
* Input Sanitization: Before feeding data to the model, sanitize inputs rigorously. This involves techniques such as:
* Validation: Enforce strict rules on allowed characters and data types.
* Filtering: Remove potentially malicious scripts or code fragments.
@@ -80,7 +80,7 @@ To protect sensitive data from potential leaks or unauthorized access, it is cru
### Untrusted environments or networks

If you can't run your models in a secure and isolated environment or if it must be exposed to an untrusted network, make sure to take the following security precautions:
* Do not use the RPC backend, [rpc-server](https://github.com/ggml-org/cheese.cpp/tree/master/tools/rpc) and [cheese-server](https://github.com/ggml-org/cheese.cpp/tree/master/tools/server) functionality (see https://github.com/ggml-org/cheese.cpp/pull/13061).
* Avoid exposing RPC-style backends and administrative APIs directly to the internet.
* Confirm the hash of any downloaded artifact (e.g. pre-trained model weights) matches a known-good value.
* Encrypt your data if sending it over the network.
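Confirming an artifact hash can be done with the standard library alone. A minimal sketch (the file path and expected digest are placeholders for the checksum published alongside the model weights):

```python
import hashlib
from pathlib import Path

def sha256_matches(path: str, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of the file at `path` equals `expected_hex`."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return digest == expected_hex.lower()
```

Compare the result against a known-good value obtained from a trusted source, not from the same location as the download itself.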
