Merged
16 changes: 7 additions & 9 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -28,7 +28,7 @@ For more info, please refer to the [AGENTS.md](AGENTS.md) file.

Before submitting your PR:
- Search for existing PRs to prevent duplicating efforts
- cheese.cpp uses the ggml tensor library for model evaluation. If you are unfamiliar with ggml, consider taking a look at the [examples in the ggml repository](https://github.com/ggml-org/ggml/tree/master/examples/). [simple](https://github.com/ggml-org/ggml/tree/master/examples/simple) shows the bare minimum for using ggml. [gpt-2](https://github.com/ggml-org/ggml/tree/master/examples/gpt-2) has minimal implementations for language model inference using GPT-2. [mnist](https://github.com/ggml-org/ggml/tree/master/examples/mnist) demonstrates how to train and evaluate a simple image classifier
- Cheesebrain uses the `ggml` tensor library for model evaluation. If you are unfamiliar with `ggml`, consider taking a look at the [examples in the ggml repository](https://github.com/ggml-org/ggml/tree/master/examples/). [`simple`](https://github.com/ggml-org/ggml/tree/master/examples/simple) shows the bare minimum for using `ggml`. [`gpt-2`](https://github.com/ggml-org/ggml/tree/master/examples/gpt-2) has minimal implementations for language model inference using GPT-2. [`mnist`](https://github.com/ggml-org/ggml/tree/master/examples/mnist) demonstrates how to train and evaluate a simple image classifier
- Test your changes:
- Execute [the full CI locally on your machine](ci/README.md) before publishing
- Verify that the perplexity and the performance are not affected negatively by your changes (use `cheese-perplexity` and `cheese-bench`)
@@ -41,7 +41,7 @@ Before submitting your PR:
- Consider allowing write access to your branch for faster reviews, as reviewers can push commits directly

After submitting your PR:
- Expect requests for modifications to ensure the code meets cheese.cpp's standards for quality and long-term maintainability
- Expect requests for modifications to ensure the code meets Cheesebrain's standards for quality and long-term maintainability
- Maintainers will rely on your insights and approval when making a final decision to approve and merge a PR
- If your PR becomes stale, rebase it on top of the latest `master` to get maintainers' attention
- Consider adding yourself to [CODEOWNERS](CODEOWNERS) to indicate your availability for fixing related issues and reviewing related PRs
@@ -50,7 +50,7 @@ After submitting your PR:

- Squash-merge PRs
- Use the following format for the squashed commit title: `<module> : <commit title> (#<issue_number>)`. For example: `utils : fix typo in utils.py (#1234)`
- Optionally pick a `<module>` from here: https://github.com/ggml-org/cheese.cpp/wiki/Modules
- Optionally pick a `<module>` that best matches the area you touched (e.g. `ggml`, `tools/server`, `tools/cli`, `tests`, etc.).
- Let other maintainers merge their own PRs
- When merging a PR, make sure you have a good understanding of the changes
- Be mindful of maintenance: most of the work going into a feature happens after the PR is merged. If the PR author is not committed to contributing long-term, someone else needs to take responsibility (you)
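The squashed-commit title convention above lends itself to a mechanical check. A minimal sketch in Python (the regex and helper name are illustrative, not part of the repository):

```python
import re

# <module> : <commit title> (#<issue_number>)
# e.g. "utils : fix typo in utils.py (#1234)" or "tools/server : handle empty prompt (#42)"
TITLE_RE = re.compile(r"^(?P<module>[\w./-]+) : (?P<title>.+) \(#(?P<number>\d+)\)$")

def check_commit_title(title: str) -> bool:
    """Return True if `title` follows the squash-merge title convention."""
    return TITLE_RE.fullmatch(title) is not None
```

For example, `check_commit_title("utils : fix typo in utils.py (#1234)")` returns `True`, while a title missing the `<module> :` prefix or the `(#<issue_number>)` suffix returns `False`.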
@@ -80,7 +80,7 @@ Maintainers reserve the right to decline review or close pull requests for any r
const enum cheese_rope_type rope_type;
```

_(NOTE: this guideline is yet to be applied to the `cheese.cpp` codebase. New code should follow this guideline.)_
_(NOTE: this guideline is yet to be applied consistently to the Cheesebrain codebase. New code should follow this guideline.)_

- Try to follow the existing patterns in the code (indentation, spaces, etc.). In case of doubt use `clang-format` (from clang-tools v15+) to format the added code
- For error handling (when to use return codes vs exceptions, C API failure semantics), see [Error handling](docs/error_handling.md)
@@ -142,7 +142,7 @@ Maintainers reserve the right to decline review or close pull requests for any r
enum cheese_pooling_type cheese_pooling_type(const cheese_context_t ctx);
```

_(NOTE: this guideline is yet to be applied to the `cheese.cpp` codebase. New code should follow this guideline)_
_(NOTE: this guideline is yet to be applied consistently to the Cheesebrain codebase. New code should follow this guideline.)_

- C/C++ filenames are all lowercase with dashes. Headers use the `.h` extension. Source files use the `.c` or `.cpp` extension
- Python filenames are all lowercase with underscores
@@ -168,7 +168,7 @@ Maintainers reserve the right to decline review or close pull requests for any r
- When adding or modifying a large piece of code:
- If you are a collaborator, make sure to add yourself to [CODEOWNERS](CODEOWNERS) to indicate your availability for reviewing related PRs
- If you are a contributor, find an existing collaborator who is willing to review and maintain your code long-term
- Provide the necessary CI workflow (and hardware) to test your changes (see [ci/README.md](https://github.com/ggml-org/cheese.cpp/tree/master/ci))
- Provide the necessary CI workflow (and hardware) to test your changes (see [ci/README.md](ci/README.md))

- New code should follow the guidelines (coding, naming, etc.) outlined in this document. Exceptions are allowed in isolated, backend-specific parts of the code that do not interface directly with the `ggml` interfaces.
_(NOTE: for legacy reasons, existing code is not required to follow this guideline)_
@@ -181,6 +181,4 @@ Maintainers reserve the right to decline review or close pull requests for any r

# Resources

The Github issues, PRs and discussions contain a lot of information that can be useful to get familiar with the codebase. For convenience, some of the more important information is referenced from Github projects:

https://github.com/ggml-org/cheese.cpp/projects
The GitHub issues, PRs and discussions in this repository contain a lot of information that can be useful to get familiar with the codebase. Browse open and closed issues and pull requests to see common patterns and prior design decisions.
84 changes: 48 additions & 36 deletions README.md
@@ -1,66 +1,78 @@
# Cheese Brain
# Cheesebrain

![Cheese Hero](media/cheese0.png)
![Cheesebrain Hero](media/cheese0.png)

[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://github.com/ggml-org/cheese.cpp/actions/workflows/build.yml/badge.svg)](https://github.com/ggml-org/cheese.cpp/actions/workflows/build.yml)
[![Stars](https://img.shields.io/github/stars/ggml-org/cheese.cpp.svg)](https://github.com/ggml-org/cheese.cpp/stargazers)
[![Forks](https://img.shields.io/github/forks/ggml-org/cheese.cpp.svg)](https://github.com/ggml-org/cheese.cpp/network/members)
[![Issues](https://img.shields.io/github/issues/ggml-org/cheese.cpp.svg)](https://github.com/ggml-org/cheese.cpp/issues)
[![Pull Requests](https://img.shields.io/github/issues-pr/ggml-org/cheese.cpp.svg)](https://github.com/ggml-org/cheese.cpp/pulls)

**Cheese Brain** is a state-of-the-art C/C++ library for high-performance Large Language Model (LLM) inference. Designed for efficiency, portability, and ease of use, Cheese enables you to run the latest AI models on anything from a laptop to a high-end server with minimal setup.
**Cheesebrain** is a high-performance C/C++ runtime for Large Language Model (LLM) inference. It is designed to be small, fast, and self-contained so you can run modern GGUF models on laptops, workstations, and servers with minimal setup.

Cheesebrain is a standalone project, not derived from cheese.cpp. It focuses on:

- **Portability** – a single codebase that runs well on Linux, macOS, and Windows.
- **Performance** – tight low-level code with aggressive quantization support and hardware-aware kernels.
- **Practical tooling** – CLI, HTTP server, Web UI, quantization tools, and model conversion utilities.

## Key Features

- **Pure C/C++**: Zero-dependency implementation for maximum portability.
- **Hardware Optimized**:
- **Apple Silicon**: First-class support via ARM NEON, Accelerate, and Metal.
- **x86**: Optimized for AVX, AVX2, AVX512, and AMX.
- **NVIDIA**: High-performance CUDA kernels for discrete GPUs.
- **Efficiency**: Advanced 1.5-bit to 8-bit quantization for reduced memory footprint and lightning-fast execution.
- **Extensible**: Supports a wide range of state-of-the-art models via GGUF format compatibility.
- **C / C++ implementation** for easy integration into existing systems.
- **Hardware-optimized backends**:
- Apple Silicon (NEON, Accelerate, Metal)
- x86 (SSE/AVX/AVX2/AVX-512/AMX where available)
- Optional GPU backends (CUDA / Metal / others, depending on build flags)
- **Quantization-aware**: supports multiple GGUF quantization schemes to reduce memory and improve throughput.
- **Rich tooling**:
- `cheese-cli` for interactive and scripted use.
- `cheese-server` for an OpenAI-compatible HTTP API (with optional Web UI).
- Quantization, benchmarking, and conversion helpers under `tools/`.

## Quick Start

Cheese is designed to be productive from minute one.
### Build

### Installation
Build from source for the best performance on your specific hardware:
From the repository root:

```bash
cmake -B build
cmake --build build --config Release
```

### Usage
Run your first model in seconds:
This produces binaries under `build/bin/`.

### Run a model

Assuming you have a GGUF model at `./models/model.gguf`:

```bash
# Chat with a model
./build/bin/cheese-cli -m model.gguf -cnv
# Chat from the terminal
./build/bin/cheese-cli -m ./models/model.gguf -cnv

# Start an OpenAI-compatible API server (no UI by default; add --webui for the web UI)
./build/bin/cheese-server -m model.gguf --port 8080
# Start an OpenAI-compatible HTTP server (add --webui for the Web UI)
./build/bin/cheese-server -m ./models/model.gguf --port 8080
```

## Description

The core mission of **cheese.cpp** is to democratize AI by making high-performance inference accessible on everyday hardware. By leveraging the power of `ggml`, Cheese provides a playground for cutting-edge LLM features, ensuring you always have access to the latest advancements in the field.
The server exposes `/v1/chat/completions`, `/v1/completions`, `/v1/embeddings`, and related endpoints.
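The request shape follows the OpenAI chat-completions convention. A minimal Python sketch that builds such a request body (the model name and localhost URL are assumptions for illustration; the network call is left commented out so the snippet runs without a server):

```python
import json

# Hypothetical local endpoint; adjust host/port to match `cheese-server --port 8080`.
URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "model.gguf") -> str:
    """Build an OpenAI-style chat-completions request body as a JSON string."""
    body = {
        "model": model,  # illustrative name; a local server may ignore or validate it
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)

# To actually send it (requires a running server):
# import urllib.request
# req = urllib.request.Request(
#     URL,
#     data=build_chat_request("Hello").encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```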

## Documentation

Explore our comprehensive guides to get the most out of Cheese:
See the in-repo docs for details:

- [Installation & Build Guide](docs/build.md)
- [Command Line Interface (CLI)](tools/cli/README.md)
- [HTTP API Server](tools/server/README.md)
- [Quantization Guide](tools/quantize/README.md)
- [Build & configuration](docs/build.md)
- [CLI usage](tools/cli/README.md)
- [HTTP server & Web UI](tools/server/README.md)
- [Quantization](tools/quantize/README.md)

## Performance
## Performance Tips

To get the best speed: build with a backend that matches your hardware (see [Build](docs/build.md): BLAS for CPU prefill, CUDA/Metal/SYCL for GPU). At runtime, tune threads (`-t`), GPU layers (`-ngl`), and batch size (`-ub`). See [Token generation performance tips](docs/development/token_generation_performance_tips.md) for CPU/GPU tuning and [Server README](tools/server/README.md#performance) for server throughput. To measure speed, see [Benchmarking](docs/development/benchmarking.md) and `scripts/bench-perf.sh`.
To get the best performance on your machine:

---
- Build with an appropriate backend (BLAS for CPU, CUDA/Metal/SYCL where applicable).
- Tune runtime flags:
- Threads: `-t N`
- GPU layers: `-ngl N`
- Batch size / ubatch: `-ub N`
- See:
- [Token generation performance tips](docs/development/token_generation_performance_tips.md)
- [Server performance notes](tools/server/README.md#performance)
- [Benchmarking guide](docs/development/benchmarking.md)

*Cheese.cpp - The most delicious way to run LLMs.*
Cheesebrain aims to be a pragmatic, low-friction way to run GGUF models locally while remaining small and hackable.
14 changes: 7 additions & 7 deletions SECURITY.md
@@ -3,7 +3,7 @@
- [**Reporting a vulnerability**](#reporting-a-vulnerability)
- [**Requirements**](#requirements)
- [**Covered Topics**](#covered-topics)
- [**Using cheese.cpp securely**](#using-cheesecpp-securely)
- [**Using Cheesebrain securely**](#using-cheesebrain-securely)
- [Untrusted models](#untrusted-models)
- [Untrusted inputs](#untrusted-inputs)
- [Data privacy](#data-privacy)
@@ -14,7 +14,7 @@

If you have discovered a security vulnerability in this project that falls inside the [covered topics](#covered-topics), please report it privately. **Do not disclose it as a public issue.** This gives us time to work with you to fix the issue before public exposure, reducing the chance that the exploit will be used before a patch is released.

Please disclose it as a private [security advisory](https://github.com/ggml-org/cheese.cpp/security/advisories/new).
Please disclose it via your repository host's private security advisory mechanism (for example, GitHub Security Advisories).

This project is maintained by a team of volunteers on a reasonable-effort basis. As such, please give us at least 90 days to work on a fix before public exposure.

@@ -44,11 +44,11 @@ Only vulnerabilities that fall within these parts of the project are considered
- Features not recommended for use in untrusted environments (e.g., router, MCP)
- Bugs that can lead to Denial-of-Service attack

Note that none of the topics under [Using cheese.cpp securely](#using-cheesecpp-securely) are considered vulnerabilities in Cheese C++.
Note that none of the topics under [Using Cheesebrain securely](#using-cheesebrain-securely) are considered vulnerabilities in Cheesebrain itself.

For vulnerabilities that fall within the `vendor` directory, please report them directly to the third-party project.
For vulnerabilities that fall within the `vendor` or `third_party` directories, please report them directly to the corresponding upstream project when possible.

## Using cheese.cpp securely
## Using Cheesebrain securely

### Untrusted models
Be careful when running untrusted models. This classification includes models created by unknown developers or utilizing data obtained from unknown sources.
@@ -66,7 +66,7 @@ For maximum security when handling untrusted inputs, you may need to employ the

* Sandboxing: Isolate the environment where the inference happens.
* Pre-analysis: Check how the model performs by default when exposed to prompt injection (e.g. using [fuzzing for prompt injection](https://github.com/FonduAI/awesome-prompt-injection?tab=readme-ov-file#tools)). This will give you leads on how hard you will have to work on the next topics.
* Updates: Keep both Cheese C++ and your libraries updated with the latest security patches.
* Updates: Keep both Cheesebrain and your libraries updated with the latest security patches.
* Input Sanitization: Before feeding data to the model, sanitize inputs rigorously. This involves techniques such as:
* Validation: Enforce strict rules on allowed characters and data types.
* Filtering: Remove potentially malicious scripts or code fragments.
@@ -80,7 +80,7 @@ To protect sensitive data from potential leaks or unauthorized access, it is cru
### Untrusted environments or networks

If you can't run your models in a secure and isolated environment or if it must be exposed to an untrusted network, make sure to take the following security precautions:
* Do not use the RPC backend, [rpc-server](https://github.com/ggml-org/cheese.cpp/tree/master/tools/rpc) and [cheese-server](https://github.com/ggml-org/cheese.cpp/tree/master/tools/server) functionality (see https://github.com/ggml-org/cheese.cpp/pull/13061).
* Avoid exposing RPC-style backends and administrative APIs directly to the internet.
* Confirm the hash of any downloaded artifact (e.g. pre-trained model weights) matches a known-good value.
* Encrypt your data if sending it over the network.
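Confirming an artifact hash can be done with the standard library alone. A minimal sketch (the file path and expected digest are placeholders for the checksum published alongside the model weights):

```python
import hashlib
from pathlib import Path

def sha256_matches(path: str, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of the file at `path` equals `expected_hex`."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return digest == expected_hex.lower()
```

Compare the result against a known-good value obtained from a trusted source, not from the same location as the download itself.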
