From feeb31bdb0e96e15757dc0114fa6e3590a41105d Mon Sep 17 00:00:00 2001 From: AutoCookie Date: Thu, 12 Mar 2026 17:01:37 +0700 Subject: [PATCH] Update --- CONTRIBUTING.md | 16 +++++----- README.md | 84 ++++++++++++++++++++++++++++--------------------- SECURITY.md | 14 ++++----- 3 files changed, 62 insertions(+), 52 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index ecfe6d4..2b28b72 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -28,7 +28,7 @@ For more info, please refer to the [AGENTS.md](AGENTS.md) file. Before submitting your PR: - Search for existing PRs to prevent duplicating efforts -- cheese.cpp uses the ggml tensor library for model evaluation. If you are unfamiliar with ggml, consider taking a look at the [examples in the ggml repository](https://github.com/ggml-org/ggml/tree/master/examples/). [simple](https://github.com/ggml-org/ggml/tree/master/examples/simple) shows the bare minimum for using ggml. [gpt-2](https://github.com/ggml-org/ggml/tree/master/examples/gpt-2) has minimal implementations for language model inference using GPT-2. [mnist](https://github.com/ggml-org/ggml/tree/master/examples/mnist) demonstrates how to train and evaluate a simple image classifier +- Cheesebrain uses the `ggml` tensor library for model evaluation. If you are unfamiliar with `ggml`, consider taking a look at the [examples in the ggml repository](https://github.com/ggml-org/ggml/tree/master/examples/). [`simple`](https://github.com/ggml-org/ggml/tree/master/examples/simple) shows the bare minimum for using `ggml`. [`gpt-2`](https://github.com/ggml-org/ggml/tree/master/examples/gpt-2) has minimal implementations for language model inference using GPT-2. 
[`mnist`](https://github.com/ggml-org/ggml/tree/master/examples/mnist) demonstrates how to train and evaluate a simple image classifier - Test your changes: - Execute [the full CI locally on your machine](ci/README.md) before publishing - Verify that the perplexity and the performance are not affected negatively by your changes (use `cheese-perplexity` and `cheese-bench`) @@ -41,7 +41,7 @@ Before submitting your PR: - Consider allowing write access to your branch for faster reviews, as reviewers can push commits directly After submitting your PR: -- Expect requests for modifications to ensure the code meets cheese.cpp's standards for quality and long-term maintainability +- Expect requests for modifications to ensure the code meets Cheesebrain's standards for quality and long-term maintainability - Maintainers will rely on your insights and approval when making a final decision to approve and merge a PR - If your PR becomes stale, rebase it on top of the latest `master` to get the maintainers' attention - Consider adding yourself to [CODEOWNERS](CODEOWNERS) to indicate your availability for fixing related issues and reviewing related PRs @@ -50,7 +50,7 @@ After submitting your PR: - Squash-merge PRs - Use the following format for the squashed commit title: `<module> : <commit title> (#<issue number>)`. For example: `utils : fix typo in utils.py (#1234)` -- Optionally pick a `<module>` from here: https://github.com/ggml-org/cheese.cpp/wiki/Modules +- Optionally pick a `<module>` that best matches the area you touched (e.g. `ggml`, `tools/server`, `tools/cli`, `tests`, etc.). - Let other maintainers merge their own PRs - When merging a PR, make sure you have a good understanding of the changes - Be mindful of maintenance: most of the work going into a feature happens after the PR is merged. 
If the PR author is not committed to contribute long-term, someone else needs to take responsibility (you) @@ -80,7 +80,7 @@ Maintainers reserve the right to decline review or close pull requests for any r const enum cheese_rope_type rope_type; ``` - _(NOTE: this guideline is yet to be applied to the `cheese.cpp` codebase. New code should follow this guideline.)_ + _(NOTE: this guideline is yet to be applied consistently to the Cheesebrain codebase. New code should follow this guideline.)_ - Try to follow the existing patterns in the code (indentation, spaces, etc.). In case of doubt use `clang-format` (from clang-tools v15+) to format the added code - For error handling (when to use return codes vs exceptions, C API failure semantics), see [Error handling](docs/error_handling.md) @@ -142,7 +142,7 @@ Maintainers reserve the right to decline review or close pull requests for any r enum cheese_pooling_type cheese_pooling_type(const cheese_context_t ctx); ``` - _(NOTE: this guideline is yet to be applied to the `cheese.cpp` codebase. New code should follow this guideline)_ + _(NOTE: this guideline is yet to be applied consistently to the Cheesebrain codebase. New code should follow this guideline.)_ - C/C++ filenames are all lowercase with dashes. Headers use the `.h` extension. 
Source files use the `.c` or `.cpp` extension - Python filenames are all lowercase with underscores @@ -168,7 +168,7 @@ Maintainers reserve the right to decline review or close pull requests for any r - When adding or modifying a large piece of code: - If you are a collaborator, make sure to add yourself to [CODEOWNERS](CODEOWNERS) to indicate your availability for reviewing related PRs - If you are a contributor, find an existing collaborator who is willing to review and maintain your code long-term - - Provide the necessary CI workflow (and hardware) to test your changes (see [ci/README.md](https://github.com/ggml-org/cheese.cpp/tree/master/ci)) + - Provide the necessary CI workflow (and hardware) to test your changes (see [ci/README.md](ci/README.md)) - New code should follow the guidelines (coding, naming, etc.) outlined in this document. Exceptions are allowed in isolated, backend-specific parts of the code that do not interface directly with the `ggml` interfaces. _(NOTE: for legacy reasons, existing code is not required to follow this guideline)_ @@ -181,6 +181,4 @@ Maintainers reserve the right to decline review or close pull requests for any r # Resources -The Github issues, PRs and discussions contain a lot of information that can be useful to get familiar with the codebase. For convenience, some of the more important information is referenced from Github projects: - -https://github.com/ggml-org/cheese.cpp/projects +The GitHub issues, PRs and discussions in this repository contain a lot of information that can be useful to get familiar with the codebase. Browse open and closed issues and pull requests to see common patterns and prior design decisions. 
diff --git a/README.md b/README.md index 8def537..3f4f2ea 100644 --- a/README.md +++ b/README.md @@ -1,66 +1,78 @@ -# Cheese Brain -![Cheese Hero](media/cheese0.png) +# Cheesebrain +![Cheesebrain Hero](media/cheese0.png) [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT) -[![Build Status](https://github.com/ggml-org/cheese.cpp/actions/workflows/build.yml/badge.svg)](https://github.com/ggml-org/cheese.cpp/actions/workflows/build.yml) -[![Stars](https://img.shields.io/github/stars/ggml-org/cheese.cpp.svg)](https://github.com/ggml-org/cheese.cpp/stargazers) -[![Forks](https://img.shields.io/github/forks/ggml-org/cheese.cpp.svg)](https://github.com/ggml-org/cheese.cpp/network/members) -[![Issues](https://img.shields.io/github/issues/ggml-org/cheese.cpp.svg)](https://github.com/ggml-org/cheese.cpp/issues) -[![Pull Requests](https://img.shields.io/github/issues-pr/ggml-org/cheese.cpp.svg)](https://github.com/ggml-org/cheese.cpp/pulls) -**Cheese Brain** is a state-of-the-art C/C++ library for high-performance Large Language Model (LLM) inference. Designed for efficiency, portability, and ease of use, Cheese enables you to run the latest AI models on anything from a laptop to a high-end server with minimal setup. +**Cheesebrain** is a high-performance C/C++ runtime for Large Language Model (LLM) inference. It is designed to be small, fast, and self-contained so you can run modern GGUF models on laptops, workstations, and servers with minimal setup. + +Cheesebrain is a standalone project. It focuses on: + +- **Portability** – a single codebase that runs well on Linux, macOS, and Windows. +- **Performance** – tight low-level code with aggressive quantization support and hardware-aware kernels. +- **Practical tooling** – CLI, HTTP server, Web UI, quantization tools, and model conversion utilities. 
## Key Features -- **Pure C/C++**: Zero-dependency implementation for maximum portability. -- **Hardware Optimized**: - - **Apple Silicon**: First-class support via ARM NEON, Accelerate, and Metal. - - **x86**: Optimized for AVX, AVX2, AVX512, and AMX. - - **NVIDIA**: High-performance CUDA kernels for discrete GPUs. -- **Efficiency**: Advanced 1.5-bit to 8-bit quantization for reduced memory footprint and lightning-fast execution. -- **Extensible**: Supports a wide range of state-of-the-art models via GGUF format compatibility. +- **C / C++ implementation** for easy integration into existing systems. +- **Hardware-optimized backends**: + - Apple Silicon (NEON, Accelerate, Metal) + - x86 (SSE/AVX/AVX2/AVX-512/AMX where available) + - Optional GPU backends (CUDA / Metal / others, depending on build flags) +- **Quantization-aware**: supports multiple GGUF quantization schemes to reduce memory and improve throughput. +- **Rich tooling**: + - `cheese-cli` for interactive and scripted use. + - `cheese-server` for an OpenAI-compatible HTTP API (with optional Web UI). + - Quantization, benchmarking, and conversion helpers under `tools/`. ## Quick Start -Cheese is designed to be productive from minute one. +### Build -### Installation -Build from source for the best performance on your specific hardware: +From the repository root: ```bash cmake -B build cmake --build build --config Release ``` -### Usage -Run your first model in seconds: +This produces binaries under `build/bin/`. 
+ +### Run a model + +Assuming you have a GGUF model at `./models/model.gguf`: ```bash -# Chat with a model -./build/bin/cheese-cli -m model.gguf -cnv +# Chat from the terminal +./build/bin/cheese-cli -m ./models/model.gguf -cnv -# Start an OpenAI-compatible API server (no UI by default; add --webui for the web UI) -./build/bin/cheese-server -m model.gguf --port 8080 +# Start an OpenAI-compatible HTTP server (add --webui for the Web UI) +./build/bin/cheese-server -m ./models/model.gguf --port 8080 ``` -## Description - -The core mission of **cheese.cpp** is to democratize AI by making high-performance inference accessible on everyday hardware. By leveraging the power of `ggml`, Cheese provides a playground for cutting-edge LLM features, ensuring you always have access to the latest advancements in the field. +The server exposes `/v1/chat/completions`, `/v1/completions`, `/v1/embeddings`, and related endpoints. ## Documentation -Explore our comprehensive guides to get the most out of Cheese: +See the in-repo docs for details: -- [Installation & Build Guide](docs/build.md) -- [Command Line Interface (CLI)](tools/cli/README.md) -- [HTTP API Server](tools/server/README.md) -- [Quantization Guide](tools/quantize/README.md) +- [Build & configuration](docs/build.md) +- [CLI usage](tools/cli/README.md) +- [HTTP server & Web UI](tools/server/README.md) +- [Quantization](tools/quantize/README.md) -## Performance +## Performance Tips -To get the best speed: build with a backend that matches your hardware (see [Build](docs/build.md): BLAS for CPU prefill, CUDA/Metal/SYCL for GPU). At runtime, tune threads (`-t`), GPU layers (`-ngl`), and batch size (`-ub`). See [Token generation performance tips](docs/development/token_generation_performance_tips.md) for CPU/GPU tuning and [Server README](tools/server/README.md#performance) for server throughput. To measure speed, see [Benchmarking](docs/development/benchmarking.md) and `scripts/bench-perf.sh`. 
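Once `cheese-server` is running, any HTTP client can talk to it. The following is a rough sketch of a chat request; the model name, host, and port are placeholders, and the payload shape simply follows the OpenAI chat-completions convention that the server mirrors:

```bash
# Build the request body first so it can be sanity-checked as valid JSON
# before anything is sent (the model name is a placeholder).
BODY='{"model": "model.gguf", "messages": [{"role": "user", "content": "Say hello"}]}'
echo "$BODY" | python3 -m json.tool > /dev/null && echo "payload ok"   # prints "payload ok"

# With a server running locally, send it:
#   curl -s http://localhost:8080/v1/chat/completions \
#        -H "Content-Type: application/json" \
#        -d "$BODY"
```

An OpenAI-compatible server should answer with a JSON object containing a `choices` array.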
+To get the best performance on your machine: ---- +- Build with an appropriate backend (BLAS for CPU, CUDA/Metal/SYCL where applicable). +- Tune runtime flags: + - Threads: `-t N` + - GPU layers: `-ngl N` + - Batch size / ubatch: `-ub N` +- See: + - [Token generation performance tips](docs/development/token_generation_performance_tips.md) + - [Server performance notes](tools/server/README.md#performance) + - [Benchmarking guide](docs/development/benchmarking.md) -*Cheese.cpp - The most delicious way to run LLMs.* +Cheesebrain aims to be a pragmatic, low-friction way to run GGUF models locally while remaining small and hackable. diff --git a/SECURITY.md b/SECURITY.md index e7608b9..d43d592 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -3,7 +3,7 @@ - [**Reporting a vulnerability**](#reporting-a-vulnerability) - [**Requirements**](#requirements) - [**Covered Topics**](#covered-topics) - - [**Using cheese.cpp securely**](#using-cheesecpp-securely) + - [**Using Cheesebrain securely**](#using-cheesebrain-securely) - [Untrusted models](#untrusted-models) - [Untrusted inputs](#untrusted-inputs) - [Data privacy](#data-privacy) @@ -14,7 +14,7 @@ If you have discovered a security vulnerability in this project that falls inside the [covered topics](#covered-topics), please report it privately. **Do not disclose it as a public issue.** This gives us time to work with you to fix the issue before public exposure, reducing the chance that the exploit will be used before a patch is released. -Please disclose it as a private [security advisory](https://github.com/ggml-org/cheese.cpp/security/advisories/new). +Please disclose it via your repository host's private security advisory mechanism (for example, GitHub Security Advisories). This project is maintained by a team of volunteers on a reasonable-effort basis. As such, please give us at least 90 days to work on a fix before public exposure. 
@@ -44,11 +44,11 @@ Only vulnerabilities that fall within these parts of the project are considered - Features not recommended for use in untrusted environments (e.g., router, MCP) - Bugs that can lead to Denial-of-Service attacks -Note that none of the topics under [Using cheese.cpp securely](#using-cheesecpp-securely) are considered vulnerabilities in Cheese C++. +Note that none of the topics under [Using Cheesebrain securely](#using-cheesebrain-securely) are considered vulnerabilities in Cheesebrain itself. -For vulnerabilities that fall within the `vendor` directory, please report them directly to the third-party project. +For vulnerabilities that fall within the `vendor` or `third_party` directories, please report them directly to the corresponding upstream project when possible. -## Using cheese.cpp securely +## Using Cheesebrain securely ### Untrusted models Be careful when running untrusted models. This classification includes models created by unknown developers or utilizing data obtained from unknown sources. @@ -66,7 +66,7 @@ For maximum security when handling untrusted inputs, you may need to employ the * Sandboxing: Isolate the environment where the inference happens. * Pre-analysis: Check how the model performs by default when exposed to prompt injection (e.g. using [fuzzing for prompt injection](https://github.com/FonduAI/awesome-prompt-injection?tab=readme-ov-file#tools)). This will give you leads on how hard you will have to work on the next topics. -* Updates: Keep both Cheese C++ and your libraries updated with the latest security patches. +* Updates: Keep both Cheesebrain and your libraries updated with the latest security patches. * Input Sanitization: Before feeding data to the model, sanitize inputs rigorously. This involves techniques such as: * Validation: Enforce strict rules on allowed characters and data types. * Filtering: Remove potentially malicious scripts or code fragments. 
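As a minimal sketch of the validation technique above (the allowlist and the `validate_prompt` name are illustrative assumptions, not part of Cheesebrain; tune the allowed character set to your application):

```bash
# Reject any prompt containing bytes outside a conservative allowlist
# (ASCII letters, digits, punctuation, and whitespace).
validate_prompt() {
    if printf '%s' "$1" | LC_ALL=C grep -q '[^a-zA-Z0-9[:punct:][:space:]]'; then
        echo "rejected"
        return 1
    fi
    echo "ok"
}

validate_prompt "$(printf 'bad\x01input')"        # prints "rejected"
validate_prompt "What is the capital of France?"  # prints "ok"
```

A real deployment would pair this with filtering and length limits; an allowlist alone is a first line of defense, not a complete sanitizer.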
@@ -80,7 +80,7 @@ To protect sensitive data from potential leaks or unauthorized access, it is cru ### Untrusted environments or networks If you can't run your models in a secure and isolated environment or if it must be exposed to an untrusted network, make sure to take the following security precautions: -* Do not use the RPC backend, [rpc-server](https://github.com/ggml-org/cheese.cpp/tree/master/tools/rpc) and [cheese-server](https://github.com/ggml-org/cheese.cpp/tree/master/tools/server) functionality (see https://github.com/ggml-org/cheese.cpp/pull/13061). +* Avoid exposing RPC-style backends and administrative APIs directly to the internet. * Confirm the hash of any downloaded artifact (e.g. pre-trained model weights) matches a known-good value. * Encrypt your data if sending it over the network.
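The hash check above can be sketched as follows; the filename and `EXPECTED` value are placeholders for the artifact you downloaded and the checksum published alongside it:

```bash
# Compare a file's SHA-256 against a known-good value before using it.
EXPECTED="<sha256 from the model card>"   # placeholder: use the published value
ACTUAL="$(sha256sum model.gguf 2>/dev/null | awk '{print $1}')"
if [ "$ACTUAL" = "$EXPECTED" ]; then
    echo "checksum ok"
else
    echo "checksum mismatch or file missing - do not use this file" >&2
fi
```

When the provider ships a `.sha256` file, `sha256sum -c` performs the same check in one line.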