Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -22,3 +22,4 @@ share/python-wheels/
.installed.cfg
*.egg
MANIFEST
CLAUDE.md
110 changes: 96 additions & 14 deletions README.md
@@ -1,27 +1,91 @@
# speech_compass
# SpeechCompass: Enhancing Mobile Captioning with Diarization and Directional Guidance via Multi-Microphone Localization

This repository contains the code accompanying the publication
**SpeechCompass: Enhancing Mobile Captioning with Diarization and Directional
Guidance via Multi-Microphone Localization**,
published in CHI, 2025. (https://arxiv.org/abs/2502.08848)
[![CHI 2025 Best Paper](https://img.shields.io/badge/CHI%202025-Best%20Paper%20Award-gold)](https://dl.acm.org/doi/10.1145/3706598.3713631)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![arXiv](https://img.shields.io/badge/arXiv-2502.08848-b31b1b.svg)](https://arxiv.org/abs/2502.08848)

## Installation
[Paper (PDF)](https://arxiv.org/pdf/2502.08848) | [ACM Digital Library](https://dl.acm.org/doi/10.1145/3706598.3713631) | [Project Page](https://www.olwal.com/speechcompass) | [Google Research Blog](https://research.google/blog/making-group-conversations-more-accessible-with-sound-localization/)

Setting up the whole system requires multiple steps and custom hardware. Refer
to details in doc folder for each step:
Artem Dementyev*, Dimitri Kanevsky, Samuel J. Yang, Mathieu Parvaix, Chiong Lai, Alex Olwal*

1) Custom hardware. The microphone phone-case was custom designed.
Official code release for **SpeechCompass: Enhancing Mobile Captioning with Diarization and
Directional Guidance via Multi-Microphone Localization**, published at CHI 2025.

2) Firmware. The phone-case microcontroller needs to be flashed with firmware
[Video 4:24](https://www.youtube.com/watch?v=crWXO5T5jaQ) | [Presentation 9:30](https://www.youtube.com/watch?v=cOnMxClQZ4g)

2) DSP algorithms. The core processing algorithms were developed in light-weight
C to be platform agnostic. They can be tested separately.
![SpeechCompass teaser](docs/images/speech_compass_teaser.jpg)

3) Android application. The app was developed in Android studio
<small>*First and last authors contributed equally to this work.</small>

## Overview

Mobile speech-to-text apps have a fundamental limitation in group conversations: they
transcribe everything into a single undifferentiated stream, making it hard to follow who
said what. SpeechCompass addresses this by adding a spatial dimension — using multiple
microphones to localize speakers in real time and overlay directional guidance on live
captions.

The system is designed with accessibility in mind, particularly for people who are hard of
hearing. Rather than relying on machine learning approaches that require video, speaker
embeddings, or high compute, SpeechCompass uses classical DSP (GCC-PHAT + kernel density
estimation) that runs on a low-power embedded microcontroller with low latency and no voice
data retention.
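
As a rough illustration of the two classical steps named above, the following Python sketch estimates the delay between two microphone signals with GCC-PHAT and then picks the densest direction from a set of noisy per-frame angle estimates with a Gaussian kernel density estimate. It is not the repository's C implementation: the function names (`gcc_phat_delay`, `kde_peak_angle`), the naive DFT, the 10° bandwidth, and the choice to run the KDE over angles rather than delays are all assumptions made for this example.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for short demo frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(sig, ref, max_lag):
    """Return the lag in samples by which `sig` trails `ref` (hypothetical helper)."""
    n = len(sig) + len(ref)  # zero-pad so the circular correlation does not wrap
    spec_sig = dft(list(sig) + [0.0] * (n - len(sig)))
    spec_ref = dft(list(ref) + [0.0] * (n - len(ref)))
    cross = [s * r.conjugate() for s, r in zip(spec_sig, spec_ref)]
    # PHAT weighting: discard magnitude and keep only the phase of each bin.
    whitened = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    cc = [v.real for v in dft(whitened, inverse=True)]
    # Negative lags live at the end of the circular correlation buffer.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: cc[lag % n])


def kde_peak_angle(angles_deg, bandwidth_deg=10.0, step_deg=1.0):
    """Return the grid angle with the highest Gaussian kernel density."""

    def wrapped_diff(a, b):
        # Signed angular difference in [-180, 180), so 359 and 1 are 2 degrees apart.
        return (a - b + 180.0) % 360.0 - 180.0

    def density(g):
        return sum(
            math.exp(-0.5 * (wrapped_diff(g, a) / bandwidth_deg) ** 2)
            for a in angles_deg
        )

    grid = [i * step_deg for i in range(int(360 / step_deg))]
    return max(grid, key=density)
```

Calling `gcc_phat_delay` on a signal and a copy of it delayed by a few samples should recover that delay, and `kde_peak_angle` ignores isolated outliers because a single stray estimate adds little density compared with a cluster of agreeing frames.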

![App](docs/images/app.jpg)

### Visualizations

The Android app offers multiple ways to display speaker direction alongside captions:

- **Colored text** — each speaker gets a distinct color
- **Directional arrows and glyphs** — indicate where speech is coming from
- **Radar minimap** — a persistent spatial overview of active speakers
- **Edge indicators** — subtle screen-edge cues for peripheral awareness
- **Speech suppression** — filter out speech from a specific direction

### Performance

- **Localization accuracy:** 11°–22° average error at normal conversational volume (60–65 dB),
comparable to human localization ability
- **Diarization:** 4-microphone configuration achieves 23–35% relative improvement in
Diarization Error Rate (DER) over a 3-microphone setup across varying SNR conditions
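
To make "relative improvement" concrete, here is a minimal sketch assuming the usual definition: the reduction in DER divided by the baseline DER. The function name and the numbers in the usage note are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline_der, improved_der):
    """Fractional DER reduction relative to the baseline (hypothetical helper)."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - improved_der) / baseline_der
```

For example, with made-up values, a baseline DER of 0.40 that drops to 0.30 is an absolute improvement of 10 percentage points but a relative improvement of about 25%.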

### User Research

A survey of 263 frequent captioning users identified speaker distinction as the most
significant unmet need. In a follow-up prototype study with 8 frequent users, colored text
and directional arrows were the preferred visualizations, and all participants agreed that
directional guidance was valuable for group conversations.

## System

![System diagram](docs/images/system_diagram.png)

SpeechCompass combines a custom hardware phone case with lightweight on-device processing:

- A **4-microphone phone case** sends audio to an STM32 L5 microcontroller, which runs
GCC-PHAT localization and streams azimuth angles to the phone over USB
- The **Android app** uses the phone's built-in microphone for speech recognition (ASR)
and receives speaker direction from the case — keeping voice data local and processing
costs low
- The **DSP algorithms** are written in portable C11 and can also run on phones with
2+ built-in microphones, providing 180° localization without additional hardware
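
The 180° limit for a two-microphone phone follows from geometry: a single microphone pair measures one time difference of arrival, which fixes the angle from the pair's broadside but cannot tell front from back. The sketch below shows the standard far-field conversion, assuming a plane wave and a nominal speed of sound. The function name, the constant, and the spacing values are illustrative and are not taken from the repository.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed nominal value for air at about 20 °C


def tdoa_to_azimuth_deg(delay_s, mic_spacing_m):
    """Far-field angle from broadside for one microphone pair, in [-90, 90] degrees."""
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Noise can push the ratio slightly past +/-1, where asin is undefined.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

Because `asin` only returns values between -90° and 90°, a source in front of the pair and its mirror image behind it map to the same angle. Resolving the full 360° requires additional microphones that are not collinear with the first pair, which is what the 4-microphone case provides.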

## Repository Structure

| Component | Description |
|-----------|-------------|
| [`hardware/README.md`](hardware/README.md) | PCB schematics for the custom 4-microphone phone case |
| [`firmware/README.md`](firmware/README.md) | STM32 L5 firmware (GCC-PHAT localization → USB output) |
| [`dsp/README.md`](dsp/README.md) | Platform-agnostic C localization and beamforming algorithms, with Bazel unit tests |
| [`android/README.md`](android/README.md) | Android Studio app (ASR + directional visualization) |

Each component can be used independently — in particular, the DSP algorithms can be built
and tested with Bazel without any hardware.

## Citing this work

```
```bibtex
@inproceedings{dementyev2025speechcompass,
title={SpeechCompass: Enhancing Mobile Captioning with Diarization and Directional Guidance via Multi-Microphone Localization},
author={Dementyev, Artem and Kanevsky, Dimitri and Yang, Samuel and Parvaix, Mathieu and Lai, Chiong and Olwal, Alex},
@@ -30,6 +94,24 @@ C to be platform agnostic. They can be tested separately.
}
```

## Related Work

SpeechCompass builds on **LiveLocalizer** (UIST 2023), which first demonstrated
microphone-array localization augmenting mobile speech-to-text. The same hardware can run
the SpeechCompass firmware.

> Dementyev, A., Kanevsky, D., Yang, S., Parvaix, M., Lai, C., and Olwal, A.
> "LiveLocalizer: Augmenting Mobile Speech-to-Text with Microphone Arrays, Optimized
> Localization and Beamforming." *UIST 2023 Adjunct*, San Francisco, CA.
> [ACM DL](https://dl.acm.org/doi/10.1145/3586182.3615789)

## Acknowledgments

We thank Sagar Savla, Dmitrii Votintcev, Pascal Getreuer, Richard Lyon, Alex Huang, Shao-Fu Shih,
Chet Gnegy, Shaun Kane, James Landay, Malcolm Slaney, Meredith Morris, Carson Lau,
Ngan Nguyen, Mei Lu, Don Barnett, Ryan Geraghty, and Sanjay Batra for their contributions
and support.

## License and disclaimer

Copyright 2025 Google LLC
34 changes: 34 additions & 0 deletions android/README.md
@@ -0,0 +1,34 @@
# Android Application

The app runs on the phone. It uses the phone's built-in microphone for speech recognition
(ASR) and receives azimuth angle data from the SpeechCompass phone case over USB.
Visualizations are built with the [Processing for Android](https://android.processing.org/)
framework.

![App screenshot](https://github.com/google-deepmind/speech_compass/blob/main/docs/images/app.jpg)

## Quickest way: install the pre-built APK

If you don't need to modify the app, sideload the pre-built APK via ADB:

```shell
adb install path/to/speechcompass.apk
```

Download the APK from
[Google Drive](https://drive.google.com/file/d/15mf4d6tlzD6GbkNFa18XGd1UUCz8RhcP/view?usp=drive_link&resourcekey=0-Whdp8aFD-M6qDvHfQQJZww).
Before running the command, connect the phone to your PC over USB and enable USB debugging
so that `adb` can see the device.

> The app may stop working on newer Android versions due to API changes.

## Building from source

1. Install the latest [Android Studio](https://developer.android.com/studio).

2. Download the [zipped Android Studio project](TODO) and unzip it.

3. Open Android Studio and import the project (**File → Open**).

4. Build the project (**Build → Make Project**).

5. Connect the phone over USB and click **Run** to install and launch.
28 changes: 0 additions & 28 deletions docs/algorithms/index.md

This file was deleted.

35 changes: 0 additions & 35 deletions docs/android/index.md

This file was deleted.

51 changes: 0 additions & 51 deletions docs/firmware/index.md

This file was deleted.

35 changes: 0 additions & 35 deletions docs/hardware/index.md

This file was deleted.

41 changes: 0 additions & 41 deletions docs/index.md

This file was deleted.
