Onyx


Run Google's Gemma 4 entirely in your browser.

No API keys. No server. No data leaving your machine.


Live Demo | Playground | Arena | Conversion Toolkit


What is this?

Onyx is a demo website and Python toolkit for running Google's Gemma 4 models directly in your browser using WebGPU. Everything runs locally on your device.

Features

  • Multimodal Chat - text, images, and audio, all processed in-browser
  • E2B vs E4B Arena - same prompt, two models, side-by-side speed and quality comparison
  • Conversion Toolkit - Python scripts to convert, validate, and benchmark Gemma 4 ONNX models

How it works

The demo site uses Transformers.js with WebGPU acceleration to run Gemma 4 E2B (2.3B params, ~3.2 GB total with all encoders) and E4B (~5 GB) in a Web Worker. Models are quantized to 4-bit (q4f16) ONNX format and cached locally after first download.
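As a rough sanity check on those download sizes, quantized weight size scales with bits per parameter. The sketch below is illustrative arithmetic only: the real on-disk totals also include fp16 quantization scales, embeddings, and the vision/audio encoders, which is why ~3.2 GB exceeds the raw 4-bit estimate.

```python
def estimated_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the quantized language-model weights alone, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

# E2B: 2.3B effective params at 4 bits -> ~1.15 GB of raw weights.
# The ~3.2 GB download adds fp16 scales, embeddings, and the encoders.
print(round(estimated_weight_gb(2.3e9, 4), 2))  # -> 1.15
```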

Demo Site

Requirements

  • Chrome 113+ or Edge 113+ with WebGPU enabled
  • 4 GB GPU memory for E2B, 8 GB for E4B

Run locally

cd web
npm install
npm run dev

Open http://localhost:5173.

Pages

  • / - Landing page with WebGPU compatibility check
  • /playground - Multimodal chat with model selection (E2B / E4B)
  • /arena - E2B vs E4B race: both models run the same prompt sequentially, with results shown side by side

Conversion Toolkit

Python scripts for converting Gemma 4 models to browser-ready ONNX format.

Setup

cd toolkit
pip install -r requirements.txt

Convert

python convert.py --model google/gemma-4-E2B-it --output output/e2b --quant q4

Options: --quant fp16, --quant q8, --quant q4

Validate

python validate.py --converted output/e2b/onnx_q4 --quick

Or compare against the original:

python validate.py --original google/gemma-4-E2B-it --converted output/e2b/onnx_q4
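validate.py's internals aren't documented here, but a common parity check for a converted model is to compare original and converted logits with cosine similarity. The helper below is a hypothetical illustration of that idea, not the toolkit's actual code:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two logit vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Quantization shifts logits slightly; a good conversion should stay
# very close to the original (e.g. similarity > 0.99).
original = [2.1, -0.3, 0.8, 1.5]
converted = [2.0, -0.35, 0.82, 1.48]
print(cosine_similarity(original, converted) > 0.99)  # -> True
```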

Benchmark

python benchmark.py --model google/gemma-4-E2B-it --quant-levels fp16 q8 q4
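The headline benchmark number is decode throughput, which reduces to tokens generated over wall-clock time. A minimal sketch of that measurement (benchmark.py's actual flags and output format may differ; the lambda stands in for a real decode step):

```python
import time

def measure_throughput(generate, n_tokens: int) -> float:
    """Wall-clock decode throughput in tokens/second."""
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate()  # stand-in for one model decode step
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# With a ~10 ms stub per token, throughput lands near 100 tok/s.
tps = measure_throughput(lambda: time.sleep(0.01), 64)
print(f"{tps:.0f} tok/s")
```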

Tech Stack

Component     Technology
Frontend      React 19, TypeScript, Vite, Tailwind CSS 4
ML Inference  Transformers.js, WebGPU, ONNX
Conversion    optimum-onnx, transformers, onnxruntime

Models

Model  Params          Size (q4f16)  Speed (M3 Pro)
E2B    2.3B effective  ~3.2 GB       ~5-20 tok/s
E4B    4B effective    ~5 GB         ~3-15 tok/s

Models from onnx-community/gemma-4-E2B-it-ONNX and onnx-community/gemma-4-E4B-it-ONNX.

License

MIT
