Hermes Studio

Control Hermes Agent from your desktop, voice, and phone

Hermes Studio is the missing setup and control plane for Hermes Agent. It turns Hermes' CLI-first power into guided flows for computer-use, model setup, tool presets, and persistent phone access.

Why

Hermes can already browse, use tools, remember context, schedule work, and connect to messaging platforms. The hard part is getting all of that configured without living in terminal docs.

Hermes Studio focuses on the high-value path:

Install or verify Hermes Agent.
Configure your model provider.
Enable the Computer Use preset.
Connect Telegram so your agent is reachable from your phone.
Chat or speak to Hermes while it works through native apps, browser, terminal, files, memory, and scheduled tasks.

Current Features

Guided Hermes install and provider setup.
Web chat with streamed agent responses.
Computer Use preset for native app automation, browser, terminal, vision, files, memory, TTS, web, and cron tools.
Generic native macOS computer-use bridge for screen observation, app launching, clicks, keys, and text entry.
Optional live Chrome connection for explicit website control through Hermes browser tools.
macOS permission checklist for local computer-use workflows.
Native macOS shell scaffold with Accessibility status checks and deep links to Privacy & Security panes.
Telegram configuration with token and allowed-user storage in ~/.hermes/.env.
Gateway start/stop controls with live logs.
Raw tool manager for enabling and disabling Hermes toolsets.
Local voice input for computer-use commands through on-device STT engines.

Quick Start

Prerequisites:

macOS, Linux, or WSL2
Node.js 20+
Python 3.11+
Hermes Agent, or let Hermes Studio install it

git clone https://github.com/YOUR_USERNAME/hermes-studio.git
cd hermes-studio
make install
make dev

Open http://localhost:5173.

The backend runs on 127.0.0.1:8420; Vite proxies API and websocket traffic during development.

Local Voice

Hermes Studio records microphone audio in the app and sends it to the local backend for transcription. No hosted speech API is required when one of these local engines is installed:

# Good default on any Mac
python3 -m pip install faster-whisper

# Apple Silicon optimized option
python3 -m pip install mlx-whisper

Advanced users can also use whisper.cpp:

brew install whisper-cpp
export WHISPER_CPP_MODEL=/path/to/ggml-base.en.bin

Optional environment overrides:

HERMES_STUDIO_STT_ENGINE=mlx-whisper|faster-whisper|whisper.cpp|auto
FASTER_WHISPER_MODEL=base.en
MLX_WHISPER_MODEL=mlx-community/whisper-base.en-mlx
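As a rough illustration of how an `auto` setting might resolve to a concrete engine, here is a minimal sketch. It is not Hermes Studio's actual selection code: the probing order, the helper name `pick_stt_engine`, the Python module names, and the `whisper-cli` binary name are all assumptions made for the example.

```python
import importlib.util
import os
import shutil
from typing import Callable, Mapping, Optional


def _has_module(name: str) -> bool:
    """Return True if a top-level Python module can be imported."""
    return importlib.util.find_spec(name) is not None


# Engine name -> availability probe. The order is an assumed preference,
# and the module/binary names are assumptions, not taken from the project.
DEFAULT_PROBES: dict[str, Callable[[], bool]] = {
    "mlx-whisper": lambda: _has_module("mlx_whisper"),
    "faster-whisper": lambda: _has_module("faster_whisper"),
    "whisper.cpp": lambda: shutil.which("whisper-cli") is not None,
}


def pick_stt_engine(
    env: Optional[Mapping[str, str]] = None,
    probes: Optional[Mapping[str, Callable[[], bool]]] = None,
) -> Optional[str]:
    """Resolve HERMES_STUDIO_STT_ENGINE to an installed engine, or None."""
    env = os.environ if env is None else env
    probes = DEFAULT_PROBES if probes is None else probes
    requested = (env.get("HERMES_STUDIO_STT_ENGINE") or "auto").strip()

    if requested != "auto":
        if requested not in probes:
            raise ValueError(f"unknown STT engine: {requested}")
        # An explicit choice is honored only if that engine is installed.
        return requested if probes[requested]() else None

    # "auto": first engine whose probe succeeds wins.
    for name, probe in probes.items():
        if probe():
            return name
    return None
```

Passing the probes in as a parameter keeps the selection logic testable on machines where none of the engines are installed.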

Open Computer Use to see whether a local voice engine is ready. Use the mic button in Chat to record a command, transcribe it locally, and send it to Hermes.

In the macOS desktop app, hold Option+Command to record a voice command from anywhere in Hermes Studio. Release the keys to stop, transcribe locally, send the command to Hermes, and hear the response.

Talk-back uses the first available TTS engine:

# Apple Silicon local TTS
python3 -m pip install mlx-audio-plus

# Hosted fallback
export ELEVENLABS_API_KEY=...

Optional TTS overrides:

HERMES_STUDIO_TTS_PROVIDER=mlx-audio|elevenlabs|macos-say|auto
MLX_AUDIO_TTS_MODEL=mlx-community/Kokoro-82M-bf16
ELEVENLABS_VOICE_ID=JBFqnCBsd6RMkjVDRZzb

Computer Use

Computer Use keeps native app requests native. Hermes Studio exposes a general macOS computer-use bridge that lets Hermes observe the screen, open apps, click, press keys, paste generated text, and repeat until the requested state is visible. Commands like this should work without opening a browser:

Open Notes and write a short poem.

macOS may ask for Automation or Accessibility permission the first time Hermes Studio controls an app.

For explicit website tasks, Studio can also connect Hermes browser tools to a visible Chrome session over the Chrome DevTools Protocol (CDP). That lets Hermes operate real websites through a persistent local profile:

Gmail: open, compose, fill recipient/body, then ask before sending.
X: open, draft a post, then ask before posting.
WhatsApp Web: open, search a chat, draft a message, then ask before sending.

Open Computer Use and click Connect Chrome only when you want website control. Studio launches a separate persistent Chrome profile at ~/.hermes/studio-chrome-profile with CDP on 127.0.0.1:9222, then stores BROWSER_CDP_URL in ~/.hermes/.env so Hermes browser tools use that visible session.
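To make the Chrome connection more concrete, the sketch below shows the kind of launch flags and endpoint lookup such a flow could rely on. `--remote-debugging-port` and `--user-data-dir` are standard Chrome flags, and `/json/version` is the standard CDP discovery endpoint. The helper names are hypothetical, and whether Studio stores the HTTP or the WebSocket form in BROWSER_CDP_URL is not stated here, so treat this as an assumption-laden example rather than the project's implementation.

```python
import json
import urllib.request


def chrome_launch_args(profile_dir: str, port: int = 9222) -> list[str]:
    """Build Chrome flags for a dedicated profile with CDP enabled."""
    return [
        f"--remote-debugging-port={port}",
        f"--user-data-dir={profile_dir}",
    ]


def cdp_url_from_version(payload: str) -> str:
    """Extract the browser WebSocket URL from a /json/version response body."""
    return json.loads(payload)["webSocketDebuggerUrl"]


def fetch_cdp_url(host: str = "127.0.0.1", port: int = 9222, timeout: float = 2.0) -> str:
    """Ask a running Chrome for its CDP WebSocket URL (raises if unreachable)."""
    url = f"http://{host}:{port}/json/version"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return cdp_url_from_version(resp.read().decode("utf-8"))
```

If `fetch_cdp_url()` raises a connection error, Chrome is most likely not running with remote debugging on that port.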

For authenticated sites, log in once inside that Chrome profile. Hermes should stop and ask when login, permissions, or human confirmation is needed.

Native macOS App

Hermes Studio now includes a Tauri shell. It loads the existing React app while adding native macOS capabilities that a browser cannot provide.

npm install
cd frontend && npm install
cd ..
npm run desktop:dev

The native shell currently adds:

Accessibility permission status checks.
Shortcuts to Accessibility, Screen Recording, Microphone, and Automation settings.
A path toward Keychain secrets, LaunchAgent persistence, menu bar controls, and signed macOS releases.

macOS Release Signing

Public macOS downloads must be signed and notarized. An unsigned web-downloaded .dmg can be blocked by Gatekeeper with a message that the app is damaged.

The Release macOS DMG workflow expects these GitHub Actions secrets before it will publish a release:

APPLE_CERTIFICATE: base64-encoded Developer ID Application .p12 certificate.
APPLE_CERTIFICATE_PASSWORD: password for the .p12 certificate.
APPLE_ID: Apple ID used for notarization.
APPLE_PASSWORD: app-specific password for that Apple ID.
APPLE_TEAM_ID: Apple Developer Team ID.
APPLE_SIGNING_IDENTITY: optional explicit Developer ID Application identity.
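A pre-flight check for the secrets listed above might look like the following sketch. The function name `missing_secrets` is hypothetical and the real workflow may validate its inputs differently. The example only shows the idea of failing early when a required value is absent or blank.

```python
import os
from typing import Mapping, Optional

# Names taken from the list above; APPLE_SIGNING_IDENTITY is optional.
REQUIRED_SECRETS = (
    "APPLE_CERTIFICATE",
    "APPLE_CERTIFICATE_PASSWORD",
    "APPLE_ID",
    "APPLE_PASSWORD",
    "APPLE_TEAM_ID",
)


def missing_secrets(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required secret names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        raise SystemExit("Missing release secrets: " + ", ".join(missing))
    print("All required release secrets are present.")
```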

Create a release by pushing a version tag:

git tag v0.1.1
git push origin v0.1.1

The workflow builds the DMG, verifies the code signature, validates the stapled notarization ticket, runs Gatekeeper assessment, and uploads Hermes-Studio-macOS.dmg plus its SHA-256 file to the GitHub release.
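For readers who want to check a download against the published checksum, this is a minimal sketch of computing a SHA-256 file for an artifact. The `.sha256` suffix and the `<hash>  <filename>` layout (as used by `shasum -a 256`) are assumptions; the workflow's actual file name and format may differ.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large DMGs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum_file(artifact: Path) -> Path:
    """Write '<hex digest>  <file name>' next to the artifact and return its path."""
    out = artifact.with_name(artifact.name + ".sha256")
    out.write_text(f"{sha256_of(artifact)}  {artifact.name}\n", encoding="utf-8")
    return out
```

Comparing `sha256_of(Path("Hermes-Studio-macOS.dmg"))` with the value in the release's checksum file confirms that the download was not corrupted or altered in transit.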

Rust is pinned in rust-toolchain.toml because current Tauri dependencies require Rust 1.88+.

Apple Intelligence And Local Models

Apple's Foundation Models framework exposes the on-device language model behind Apple Intelligence to apps on supported systems. That can help Hermes Studio with lightweight local tasks such as intent routing, structured command extraction, safety classification, and deciding whether a request should go to Hermes or a cloud/local LLM.

It is not a full replacement for Hermes Agent:

It is Swift-native, so it should live behind Tauri native commands or a small helper.
It depends on Apple Intelligence availability and user settings.
Its context window and model control are limited compared with dedicated agent backends.
It cannot silently bypass macOS privacy permissions.

The practical path is to support multiple local intelligence layers:

Apple Foundation Models for native on-device planning when available.
Ollama or llama.cpp for heavier local models.
Hermes Agent as the main tool-using runtime.

First Run

Open Setup and configure Hermes.
Open Computer Use and click Enable Preset.
Grant macOS permissions when prompted by your terminal or browser automation stack.
Open Connections, save your Telegram bot token and allowed user ID, then start the gateway.
Message your bot on Telegram or use the web chat.

Telegram Setup

Message @BotFather.
Create a bot with /newbot.
Paste the bot token into Hermes Studio.
Message @userinfobot to get your numeric Telegram user ID.
Add that ID to Allowed Telegram user IDs.
Start the gateway.

Hermes Agent supports Telegram text, voice memos, images, file attachments, and scheduled task delivery. Hermes Studio stores the required token and allowlist in your local Hermes environment file.
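The storage step can be pictured as an "upsert" into a KEY=VALUE file. The sketch below is an illustration only, not the code Hermes Studio runs: `upsert_env` is a hypothetical helper, and `TELEGRAM_BOT_TOKEN` is an assumed variable name (only `TELEGRAM_ALLOWED_USERS` appears in this README). The comma-separated allowlist format is likewise an assumption.

```python
from pathlib import Path


def upsert_env(path: Path, updates: dict[str, str]) -> None:
    """Replace matching KEY=VALUE lines in place and append any new keys."""
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    remaining = dict(updates)
    out: list[str] = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            # Comments, blank lines, and unrelated keys are kept untouched.
            out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(out) + "\n", encoding="utf-8")
    # Restrict the file to the current user, since it holds credentials.
    path.chmod(0o600)


if __name__ == "__main__":
    upsert_env(
        Path.home() / ".hermes" / ".env",
        {
            "TELEGRAM_BOT_TOKEN": "<token from @BotFather>",  # assumed name
            "TELEGRAM_ALLOWED_USERS": "<your numeric user ID>",
        },
    )
```

Updating lines in place preserves any comments and unrelated settings already present in the file.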

WhatsApp Status

Hermes Agent supports WhatsApp through a Baileys-based bridge and QR pairing flow. Hermes Studio does not expose the WhatsApp pairing UI yet. Telegram is the first supported phone connection because it is simpler, safer, and built on official bot tokens.

Architecture

React/Vite frontend
        |
        | REST + WebSocket
        v
FastAPI backend
        |
        | service wrappers
        v
Hermes Agent CLI/config/gateway
        |
        v
LLM providers, tools, Telegram, local machine

Frontend:

React 19
Vite
Tailwind CSS
Zustand
Framer Motion

Backend:

FastAPI
WebSockets
Hermes CLI subprocess integration
Local ~/.hermes config and environment management

Project Structure

frontend/
  src/pages/              app screens
  src/components/         layout, chat, and UI components
  src/hooks/              websocket and voice hooks
  src/stores/             chat state
  src/lib/                API client and shared types

backend/
  app/routers/            REST and websocket routes
  app/services/           Hermes command, config, tools, gateway services
  app/models/             Pydantic schemas

Development

make dev           # frontend + backend
make dev-frontend  # Vite only
make dev-backend   # FastAPI only
make build         # production frontend into backend/static
make clean         # remove build artifacts

Docker

docker compose up

The container mounts ~/.hermes so Hermes configuration and platform sessions persist.

Roadmap

Security Notes

API keys and bot tokens stay local in Hermes config files.
Gateway logs are redacted before being returned to the browser.
Telegram access should always be restricted with TELEGRAM_ALLOWED_USERS.
WhatsApp sessions, once supported in the UI, must be treated like credentials.
Computer-use features can operate your machine. Keep approval mode on unless you explicitly want unattended execution.
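To give a sense of what log redaction can involve, here is a small sketch that masks secret-looking values before a log line leaves the backend. The patterns are assumptions chosen for illustration (a Telegram-style `digits:token` shape and `NAME=value` pairs whose name contains KEY, TOKEN, PASSWORD, or SECRET). They are not the rules Hermes Studio actually applies.

```python
import re

# Telegram-style bot token: numeric bot ID, a colon, then a long opaque string.
_TELEGRAM_TOKEN = re.compile(r"(?<!\d)\d{6,12}:[A-Za-z0-9_-]{30,}")

# Environment-style assignments whose name suggests a credential.
_SECRET_ASSIGNMENT = re.compile(
    r"\b([A-Z0-9_]*(?:KEY|TOKEN|PASSWORD|SECRET)[A-Z0-9_]*)=(\S+)"
)


def redact(line: str) -> str:
    """Mask credential-like values in a single log line."""
    line = _SECRET_ASSIGNMENT.sub(r"\1=[REDACTED]", line)
    return _TELEGRAM_TOKEN.sub("[REDACTED]", line)
```

Pattern-based redaction is best-effort: it reduces accidental exposure but does not replace keeping secrets out of logs in the first place.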

Acknowledgments

License

MIT
