From f47bddcfd01e4fcbee88f6908af7562667561a47 Mon Sep 17 00:00:00 2001 From: Alex Olwal <517681+olwal@users.noreply.github.com> Date: Wed, 18 Mar 2026 20:18:48 -0700 Subject: [PATCH 1/4] Restructure repo as research paper repository - Rewrite README.md as a landing page with paper/blog/project links, badges, teaser image, component table, and BibTeX - Add per-component README files: hardware/, firmware/, android/, dsp/ - Move PCB schematics from docs/hardware/ into hardware/ - Remove superseded docs/ index files (content now lives in component READMEs) - Add CLAUDE.md with build instructions and architecture notes --- .gitignore | 1 + README.md | 39 ++++++++----- android/README.md | 34 +++++++++++ docs/algorithms/index.md | 28 --------- docs/android/index.md | 35 ------------ docs/firmware/index.md | 51 ----------------- docs/hardware/index.md | 35 ------------ docs/index.md | 41 ------------- dsp/README.md | 54 ++++++++++++++++++ firmware/README.md | 39 +++++++++++++ hardware/README.md | 27 +++++++++ .../flex_pcb_schematic.pdf | Bin .../main_board_schematic.pdf | Bin 13 files changed, 181 insertions(+), 203 deletions(-) create mode 100644 android/README.md delete mode 100644 docs/algorithms/index.md delete mode 100644 docs/android/index.md delete mode 100644 docs/firmware/index.md delete mode 100644 docs/hardware/index.md delete mode 100644 docs/index.md create mode 100644 dsp/README.md create mode 100644 firmware/README.md create mode 100644 hardware/README.md rename {docs/hardware => hardware}/flex_pcb_schematic.pdf (100%) rename {docs/hardware => hardware}/main_board_schematic.pdf (100%) diff --git a/.gitignore b/.gitignore index 87df5a1..b4848fb 100644 --- a/.gitignore +++ b/.gitignore @@ -22,3 +22,4 @@ share/python-wheels/ .installed.cfg *.egg MANIFEST +CLAUDE.md diff --git a/README.md b/README.md index 47d3eb2..88cbb14 100644 --- a/README.md +++ b/README.md @@ -1,23 +1,36 @@ -# speech_compass +# SpeechCompass -This repository contains the code accompanying 
the publication -**SpeechCompass: Enhancing Mobile Captioning with Diarization and Directional -Guidance via Multi-Microphone Localization**, -published in CHI, 2025. (https://arxiv.org/abs/2502.08848) +[![CHI 2025 Best Paper](https://img.shields.io/badge/CHI%202025-Best%20Paper%20Award-gold)](https://dl.acm.org/doi/10.1145/3706598.3713631) +[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE) +[![arXiv](https://img.shields.io/badge/arXiv-2502.08848-b31b1b.svg)](https://arxiv.org/abs/2502.08848) -## Installation +[Paper](https://arxiv.org/abs/2502.08848) | [ACM DL](https://dl.acm.org/doi/10.1145/3706598.3713631) | [Blog](https://research.google/blog/making-group-conversations-more-accessible-with-sound-localization/) | [Project Page](https://www.olwal.com/speechcompass) -Setting up the whole system requires multiple steps and custom hardware. Refer -to details in doc folder for each step: +Official code release for **SpeechCompass: Enhancing Mobile Captioning with Diarization and +Directional Guidance via Multi-Microphone Localization**, published at CHI 2025. -1) Custom hardware. The microphone phone-case was custom designed. +![SpeechCompass teaser](docs/images/speech_compass_teaser.jpg) -2) Firmware. The phone-case microcontroller needs to be flashed with firmware +## Overview -2) DSP algorithms. The core processing algorithms were developed in light-weight -C to be platform agnostic. They can be tested separately. +SpeechCompass adds a spatial dimension to mobile speech-to-text by localizing speakers in +360° using a custom 4-microphone phone case. A lightweight C localization pipeline runs on +an embedded microcontroller, and an Android app displays directional captions with speaker +diarization — making group conversations more accessible for people who are hard of hearing. -3) Android application. 
The app was developed in Android studio +![System diagram](docs/images/system_diagram.png) + +## Repository Structure + +| Component | Description | +|-----------|-------------| +| [`hardware/`](hardware/) | PCB schematics for the custom 4-microphone phone case | +| [`firmware/`](firmware/) | STM32 L5 microcontroller firmware (GCC-PHAT localization → USB output) | +| [`dsp/`](dsp/) | Platform-agnostic C localization and beamforming algorithms, with unit tests | +| [`android/`](android/) | Android Studio app (speech-to-text + directional visualization) | + +Each component can be used independently — in particular, the DSP algorithms can be built +and tested with Bazel without any hardware. ## Citing this work diff --git a/android/README.md b/android/README.md new file mode 100644 index 0000000..63573da --- /dev/null +++ b/android/README.md @@ -0,0 +1,34 @@ +# Android Application + +The app runs on the phone. It uses the phone's built-in microphone for speech recognition +(ASR) and receives azimuth angle data from the SpeechCompass phone case over USB. +Visualizations are built with the [Processing for Android](https://android.processing.org/) +framework. + +![App screenshot](https://github.com/google-deepmind/speech_compass/blob/main/docs/images/app.jpg) + +## Quickest way: install the pre-built APK + +If you don't need to modify the app, sideload the pre-built APK via ADB: + +``` +adb install path/to/speechcompass.apk +``` + +The APK is available on +[Google Drive](https://drive.google.com/file/d/15mf4d6tlzD6GbkNFa18XGd1UUCz8RhcP/view?usp=drive_link&resourcekey=0-Whdp8aFD-M6qDvHfQQJZww). +Connect the phone to your PC before running the command. + +> The app may stop working on newer Android versions due to API changes. + +## Building from source + +1. Install the latest [Android Studio](https://developer.android.com/studio). + +2. Download the [zipped Android Studio project](TODO) and unzip it. + +3. Open Android Studio and import the project (**File → Open**). 
+ +4. Build the project (**Build → Make Project**). + +5. Connect the phone over USB and click **Run** to install and launch. diff --git a/docs/algorithms/index.md b/docs/algorithms/index.md deleted file mode 100644 index 9618e33..0000000 --- a/docs/algorithms/index.md +++ /dev/null @@ -1,28 +0,0 @@ -# Localization and Beamforming algorithms - -## Localization - -The localization algorithms are in the /dsp subfolder. We made lightweight -localization algorithms in C. The low-level implementation allows the algorithms -to be ported to different embedded platforms. Our localization algorithm is -based on generalized cross-correlation with phase transform (GCC-PHAT) [1] and -statistical estimation of source location. - -We used a slightly modified GCC-PHAT approach to calculate the cross correlation -between microphone pairs. In our case, we used normalization to the power of --0.3. Also, while most localizers use SPR (Steered Power Response), we used an -ad-hoc lightweight statistical estimation based on Kernel Density Estimation. - -## Beamforming - -We have two classical beamformer implementations: Delay-and-Sum (DAS) and -Filter-and-Sum (FAS) located in the /beam subfolder. The beamformer takes four -channels and outputs one beamformer channel. We ended up focusing on the -localization in the SpeechCompass paper, so the firmware doesn't run -beamforming. - -### References - -[1] Knapp, C. H. and G.C. Carter, “The Generalized Correlation Method for -Estimation of Time Delay.” IEEE Transactions on Acoustics, Speech and Signal -Processing. Vol. ASSP-24, No. 4, Aug 1976. diff --git a/docs/android/index.md b/docs/android/index.md deleted file mode 100644 index a482688..0000000 --- a/docs/android/index.md +++ /dev/null @@ -1,35 +0,0 @@ -# Android application - -This application runs on the phone, and receives the data from the SpeechCompass -phone case over the USB. We used the -[Processing](https://android.processing.org/) framework for visualizations. 
- -## Simplest way to run the app - -If no debugging or development is need, loading the app -[APK](https://drive.google.com/file/d/15mf4d6tlzD6GbkNFa18XGd1UUCz8RhcP/view?usp=drive_link&resourcekey=0-Whdp8aFD-M6qDvHfQQJZww) -over [Android Debug Bridge](https://developer.android.com/tools/adb) (adb) is -the easiest way. To do so connect the phone to the PC, open the terminal and -load the app with the command line: - -```adb install path_to_app``` - -The app might stop running on the newer version of Android. - -## Building the Android application - -Building the app is more involved, especially for first time users. - -1) Download and install latest -[Android studio](https://developer.android.com/studio). - -2) Download the -[zipped](TODO) -Android Studio project for SpeechCompass. - -3) Import the project to Android Studio. - -4) Build the project. - -5) Connect the phone over USB and load the application by clicking the Run -button in Android Studio. diff --git a/docs/firmware/index.md b/docs/firmware/index.md deleted file mode 100644 index cf1d272..0000000 --- a/docs/firmware/index.md +++ /dev/null @@ -1,51 +0,0 @@ -# Firmware - -The firmware runs on a low-power microcontroller (STM32 L5). It gets the raw -microphone data, runs lightweight localization and signal processing algorithms -and outputs results to the USB. Loading firmware on the MCU will need a cable -and an ST-LINK programmer. The steps assume some previous experience with STM32. - -## Compiling the firmware - -We used [STM32Cube IDE](https://www.st.com/software/stm32cube-ide) for firmware -development. It provides all the convenient tools for embedded ARM development. -We used the STM32CubeMX to create the project template and import the necessary -drivers. The most convenient way to access the code is to compile the code using -STM32Cube IDE as follows: - -1) Install the STM32 CUBE IDE and ST-LINK toolchain. 
- -2) Download the -[zipped project](https://drive.google.com/file/d/1aSLFQMz3HJg2O-bxhoN2yHJ5k2ODyI81/view?usp=sharing&resourcekey=0-FB9BwKRDcssJl4RME0ycYQ) -and unzip it. - -3) Import the project into STM32Cube IDE. Click on File -> Import -> Existing -Projects into Workspace, and select the project folder. - -4) Build the project. The console should show no errors. - -## Loading the firmware - -Loading and debugging the firmware on the microcontroller is more involved as it -requires a programmer and a specific connector. - -1) Get an -[ST-LINK programmer](https://www.mouser.com/ProductDetail/STMicroelectronics/STLINK-V3MINIE?qs=MyNHzdoqoQKcLQe5Jawcgw%3D%3D) -and a special -[connector/cable](https://www.tag-connect.com/product/tc2030-ctx-stdc14-for-use-with-stm32-processors-with-stlink-v3). -We used such a connector to reduce physical footprint. - -2) Plug in a USB cable for board power. The programmer doesn't provide power for -the board. - -3) Open the STM32Cube project and compile. Alternatively, this can be done -without STM32Cube IDE by flashing the compiled binary file with the code. This -can be done over a terminal, but still needs ST-LINK drivers installed. - -4) Connect and hold the connector to the board and upload by clicking the debug -button. If doing this the first time, the programmer might need to be -configured. - -5) Open a serial terminal (e.g, Arduino IDE) on a PC connected to the board over -USB. Make sure the correct port is selected. The baud rate doesn't matter. You -should see angles coming in and printing. diff --git a/docs/hardware/index.md b/docs/hardware/index.md deleted file mode 100644 index f6643ae..0000000 --- a/docs/hardware/index.md +++ /dev/null @@ -1,35 +0,0 @@ -# SpeechCompass hardware design - -The hardware is composed of two PCBs: the main board with the microcontroller -and flexible PCB connecting all the microphones together. 
- - -![Phone case](https://github.com/google-deepmind/speech_compass/blob/main/docs/images/electronics.jpg) - -## Main PCB - -The main PCB is a motherboard that has the STM32 microcontroller and I/O ports. -The board includes an audio codec that provides headphone output. There is a -Bluetooth module as well, but we are not using it. With Bluetooth and the -battery, the system does not need to be tethered to the phone. -[Schematic pdf](https://github.com/google-deepmind/speech_compass/blob/main/docs/hardware/main_board_schematic.pdf) - -## Flex PCB - -Flexible PCB is mainly a cable to connect the microphones to the main board. The -surface mount microphones were soldered to the flex PCB. -[Schematic pdf](https://github.com/google-deepmind/speech_compass/blob/main/docs/hardware/flex_pcb_schematic.pdf) - -## Old version (LiveLocalizer) - -Our initial version of the phone case had one rigid board for everything. (See -UIST demo [proceedings](https://dl.acm.org/doi/10.1145/3586182.3615789) for -details). It is more bulky but it is simpler to build and uses microphones on a -breakout boards. It can run the same firmware. - -![Phone case](https://github.com/google-deepmind/speech_compass/blob/main/docs/images/livelocalizer.png) - -## Firmware - -The firmware runs on the microcontroller. Mainly it runs the localization -algorithm and sends the data to the phone diff --git a/docs/index.md b/docs/index.md deleted file mode 100644 index 5f2bc03..0000000 --- a/docs/index.md +++ /dev/null @@ -1,41 +0,0 @@ -# SpeechCompass - -(This is not an officially supported Google product.) - -SpeechCompass is a real-time, multi-microphone speech localization, -visualization, and diarization platform. We believe that adding a spatial -dimension to sound understanding can greatly improve the usability of audio -interfaces. 
For more details see our publication in -[CHI'25](https://arxiv.org/pdf/2502.08848) - - -![Phone case](images/speech_compass_teaser.jpg) - -## Multi microphone phone case design - -To allow experimentation, we designed a custom hardware phone case with embedded -four microphones. The localization data is sent from the phone case to the phone -over USB. - -![Phone case](images/phone_case.jpg) - -## Lightweight localization and beamforming - -We implement localization and beamforming algorithms capable of running in -real-time on low-power microcontroller. - -![Phone case](images/system_diagram.png) - -## Android visualization application - -The ASR and visualizations runs as an app on the phone. It actually uses phone -microphone for the ASR and receives the sound direction from the phone case over -USB. ![Phone case](images/app.jpg) - - -## Documentation - -* [Hardware](https://github.com/google-deepmind/speech_compass/blob/main/docs/hardware/index.md) -* [Firmware](https://github.com/google-deepmind/speech_compass/blob/main/docs/firmware/index.md) -* [Android application](https://github.com/google-deepmind/speech_compass/blob/main/docs/android/index.md) -* [DSP algorithms](https://github.com/google-deepmind/speech_compass/blob/main/docs/algorithms/index.md) diff --git a/dsp/README.md b/dsp/README.md new file mode 100644 index 0000000..b5d9143 --- /dev/null +++ b/dsp/README.md @@ -0,0 +1,54 @@ +# DSP Algorithms + +Lightweight, platform-agnostic C (C11) implementations of the localization and beamforming +algorithms. Designed to run on low-power microcontrollers but fully testable on desktop +with Bazel. + +## Localization (`dsp/`) + +Localization is based on Generalized Cross-Correlation with Phase Transform (GCC-PHAT) [1]. + +- **`gcc_phat.c/.h`** — Frequency-domain cross-correlation with partial phase normalization + (exponent −0.3). Operates on a single microphone pair. 
+- **`tdoa.c/.h`** — Extracts Time Difference of Arrival (TDOA) from GCC-PHAT peaks and + converts delays to azimuth angles. Uses ARM CMSIS DSP for FFTs on embedded targets. +- **`angle_estimation.c/.h`** — Aggregates TDOA measurements from all mic pairs into a + single azimuth estimate (0–359°) using histogram accumulation and Kernel Density + Estimation (KDE) with Gaussian or Pearson Type II kernels. + +Unlike most localizers that use Steered Power Response (SPR), we use a lightweight +statistical KDE approach that is well-suited to real-time embedded constraints. + +## Beamforming (`beam/`) + +Two classical beamformer implementations are included. The SpeechCompass firmware uses +localization only (not beamforming), but these are provided for completeness. + +- **`beam/das_beamformer.c/.h`** — Time-domain Delay-and-Sum beamformer; supports 2- and + 4-microphone circular arrays; stateful ring buffer. +- **`beam/fas_beamformer.c/.h`** — Frequency-domain Filter-and-Sum beamformer with complex + steering weights for circular arrays. + +## Building and testing + +Build rules are defined in `defs.bzl` using Bazel wrapper rules (`c_binary`, `c_library`, +`c_test`) that enforce C11 and a consistent warning set. + +```bash +# Run all tests +bazel test //... + +# Run a specific test +bazel test //test:angle_estimation_test +bazel test //test:gcc_phat_test +bazel test //test:das_beamformer_test +bazel test //test:fas_beamformer_test +``` + +Tests use the `CHECK()` assertion macro from `utility/logging.h`. + +## References + +[1] Knapp, C. H. and G.C. Carter, "The Generalized Correlation Method for Estimation of +Time Delay." *IEEE Transactions on Acoustics, Speech and Signal Processing*, Vol. ASSP-24, +No. 4, Aug 1976. diff --git a/firmware/README.md b/firmware/README.md new file mode 100644 index 0000000..67367ca --- /dev/null +++ b/firmware/README.md @@ -0,0 +1,39 @@ +# Firmware + +The firmware runs on the STM32 L5 microcontroller (ARM Cortex-M33). 
It reads raw audio +from the four microphones, runs the GCC-PHAT localization algorithm, and streams azimuth +angle estimates to the phone over USB. + +The firmware source is provided as an STM32CubeIDE project. Loading it onto the MCU +requires an ST-LINK programmer. + +## Compiling + +1. Install [STM32Cube IDE](https://www.st.com/software/stm32cube-ide) and the ST-LINK + toolchain. + +2. Download the + [zipped project](https://drive.google.com/file/d/1aSLFQMz3HJg2O-bxhoN2yHJ5k2ODyI81/view?usp=sharing&resourcekey=0-FB9BwKRDcssJl4RME0ycYQ) + and unzip it. + +3. Import into STM32Cube IDE: **File → Import → Existing Projects into Workspace** and + select the project folder. + +4. Build the project. The console should show no errors. + +## Flashing + +Flashing requires a programmer and a tag-connect cable: + +- [ST-LINK V3 Mini programmer](https://www.mouser.com/ProductDetail/STMicroelectronics/STLINK-V3MINIE?qs=MyNHzdoqoQKcLQe5Jawcgw%3D%3D) +- [Tag-Connect TC2030-CTX-STDC14 cable](https://www.tag-connect.com/product/tc2030-ctx-stdc14-for-use-with-stm32-processors-with-stlink-v3) (compact footprint) + +1. Connect a USB cable for board power (the programmer does not supply power). +2. Hold the tag-connect cable against the board's programming header. +3. In STM32Cube IDE, click the debug/flash button. On first use, configure the programmer + if prompted. +4. To verify: open a serial terminal (e.g., Arduino IDE serial monitor) on the USB port. + You should see angle values printing continuously. Baud rate does not matter. + +> **Note:** Flashing can also be done without the IDE by using ST-LINK command-line tools +> to flash a pre-compiled `.hex` or `.bin` binary directly. diff --git a/hardware/README.md b/hardware/README.md new file mode 100644 index 0000000..eb4ed2a --- /dev/null +++ b/hardware/README.md @@ -0,0 +1,27 @@ +# Hardware + +The SpeechCompass phone case consists of two PCBs. 
+ +![Electronics](https://github.com/google-deepmind/speech_compass/blob/main/docs/images/electronics.jpg) + +## Main PCB + +The main board hosts the STM32 L5 microcontroller, an audio codec (headphone output), +and a Bluetooth module (currently unused). With a battery added, the system can operate +untethered from the phone. + +[Schematic (PDF)](main_board_schematic.pdf) + +## Flex PCB + +The flexible PCB routes the four surface-mount microphones back to the main board. + +[Schematic (PDF)](flex_pcb_schematic.pdf) + +## Earlier version: LiveLocalizer + +The original prototype used a single rigid PCB — bulkier but simpler to build, with +microphones on breakout boards. It runs the same firmware. See the +[UIST 2023 demo paper](https://dl.acm.org/doi/10.1145/3586182.3615789) for details. + +![LiveLocalizer](https://github.com/google-deepmind/speech_compass/blob/main/docs/images/livelocalizer.png) diff --git a/docs/hardware/flex_pcb_schematic.pdf b/hardware/flex_pcb_schematic.pdf similarity index 100% rename from docs/hardware/flex_pcb_schematic.pdf rename to hardware/flex_pcb_schematic.pdf diff --git a/docs/hardware/main_board_schematic.pdf b/hardware/main_board_schematic.pdf similarity index 100% rename from docs/hardware/main_board_schematic.pdf rename to hardware/main_board_schematic.pdf From a2610b91549212440ea0860792498f75c97cc4cd Mon Sep 17 00:00:00 2001 From: Alex Olwal <517681+olwal@users.noreply.github.com> Date: Wed, 18 Mar 2026 23:51:31 -0700 Subject: [PATCH 2/4] Enrich README with overview, visualizations, performance, user research, acknowledgments, and related work --- README.md | 79 +++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 71 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 88cbb14..f739af3 100644 --- a/README.md +++ b/README.md @@ -13,25 +13,81 @@ Directional Guidance via Multi-Microphone Localization**, published at CHI 2025. 
## Overview -SpeechCompass adds a spatial dimension to mobile speech-to-text by localizing speakers in -360° using a custom 4-microphone phone case. A lightweight C localization pipeline runs on -an embedded microcontroller, and an Android app displays directional captions with speaker -diarization — making group conversations more accessible for people who are hard of hearing. +Mobile speech-to-text apps have a fundamental limitation in group conversations: they +transcribe everything into a single undifferentiated stream, making it hard to follow who +said what. SpeechCompass addresses this by adding a spatial dimension — using multiple +microphones to localize speakers in real time and overlay directional guidance on live +captions. + +The system is designed with accessibility in mind, particularly for people who are hard of +hearing. Rather than relying on machine learning approaches that require video, speaker +embeddings, or high compute, SpeechCompass uses classical DSP (GCC-PHAT + kernel density +estimation) that runs on a low-power embedded microcontroller with low latency and no voice +data retention. 
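The GCC-PHAT front end described above can be illustrated with a short NumPy sketch. This is not the repository's C implementation; the function name, signature, and padding choices are illustrative, but the partial phase normalization (exponent −0.3, as in `dsp/gcc_phat.c`) follows the approach the paper describes:

```python
import numpy as np

def gcc_phat_tdoa(x, y, fs, rho=0.3, max_tau=None):
    """Estimate the time difference of arrival (TDOA) between two mics.

    rho=1.0 would be classic PHAT weighting; SpeechCompass uses a softer
    exponent (0.3), which retains some magnitude information.
    Returns the delay of y relative to x in seconds (positive: y lags x).
    """
    n = len(x) + len(y)                       # zero-pad to avoid wraparound
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.maximum(np.abs(R), 1e-12) ** rho  # partial phase normalization
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return -shift / fs
```

A full pipeline would run this over every microphone pair and map each delay to a candidate azimuth using the known array geometry; `max_tau` can bound the search to physically plausible delays for the mic spacing.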
+ +![App](docs/images/app.jpg) + +### Visualizations + +The Android app offers multiple ways to display speaker direction alongside captions: + +- **Colored text** — each speaker gets a distinct color +- **Directional arrows and glyphs** — indicate where speech is coming from +- **Radar minimap** — a persistent spatial overview of active speakers +- **Edge indicators** — subtle screen-edge cues for peripheral awareness +- **Speech suppression** — filter out speech from a specific direction + +### Performance + +- **Localization accuracy:** 11°–22° average error at normal conversational volume (60–65 dB), + comparable to human localization ability +- **Diarization:** 4-microphone configuration achieves 23–35% relative improvement in + Diarization Error Rate (DER) over a 3-microphone setup across varying SNR conditions + +### User Research + +A survey of 263 frequent captioning users identified speaker distinction as the most +significant unmet need. In a follow-up prototype study with 8 frequent users, colored text +and directional arrows were the preferred visualizations, and all participants agreed that +directional guidance was valuable for group conversations. 
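The kernel-density aggregation step mentioned in the Overview can be sketched in the same spirit. This is an illustrative NumPy version that fuses noisy per-pair azimuth candidates on a 1-degree circular grid; the repository's C implementation (`angle_estimation.c`, with Gaussian or Pearson Type II kernels) may differ in grid resolution, kernel, and bandwidth:

```python
import numpy as np

def kde_azimuth(candidates_deg, bandwidth_deg=8.0):
    """Fuse per-pair azimuth candidates into a single 0-359 degree estimate.

    Builds a kernel density estimate over a circular 1-degree grid
    (distances wrap at 360) and returns the angle of highest density,
    so a few outlier candidates do not pull the estimate off target.
    """
    grid = np.arange(360.0)
    # Circular distance between each grid point and each candidate.
    d = np.abs(grid[:, None] - np.asarray(candidates_deg, dtype=float)[None, :])
    d = np.minimum(d, 360.0 - d)
    density = np.exp(-0.5 * (d / bandwidth_deg) ** 2).sum(axis=1)
    return int(np.argmax(density))
```

The circular wrap is the important detail: candidates near 0° and 359° reinforce each other instead of splitting into two distant clusters, which is why a plain histogram argmax over raw angles is not enough.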
+ +## System ![System diagram](docs/images/system_diagram.png) +SpeechCompass combines a custom hardware phone case with lightweight on-device processing: + +- A **4-microphone phone case** sends audio to an STM32 L5 microcontroller, which runs + GCC-PHAT localization and streams azimuth angles to the phone over USB +- The **Android app** uses the phone's built-in microphone for speech recognition (ASR) + and receives speaker direction from the case — keeping voice data local and processing + costs low +- The **DSP algorithms** are written in portable C11 and can also run on phones with + 2+ built-in microphones, providing 180° localization without additional hardware + ## Repository Structure | Component | Description | |-----------|-------------| -| [`hardware/`](hardware/) | PCB schematics for the custom 4-microphone phone case | -| [`firmware/`](firmware/) | STM32 L5 microcontroller firmware (GCC-PHAT localization → USB output) | -| [`dsp/`](dsp/) | Platform-agnostic C localization and beamforming algorithms, with unit tests | -| [`android/`](android/) | Android Studio app (speech-to-text + directional visualization) | +| [`hardware/README.md`](hardware/README.md) | PCB schematics for the custom 4-microphone phone case | +| [`firmware/README.md`](firmware/README.md) | STM32 L5 firmware (GCC-PHAT localization → USB output) | +| [`dsp/README.md`](dsp/README.md) | Platform-agnostic C localization and beamforming algorithms, with Bazel unit tests | +| [`android/README.md`](android/README.md) | Android Studio app (ASR + directional visualization) | Each component can be used independently — in particular, the DSP algorithms can be built and tested with Bazel without any hardware. +## Related Work + +SpeechCompass builds on **LiveLocalizer** (UIST 2023), which first demonstrated +microphone-array localization augmenting mobile speech-to-text. The same hardware can run +the SpeechCompass firmware. 
+ +> Dementyev, A., Kanevsky, D., Yang, S., Parvaix, M., Lai, C., and Olwal, A. +> "LiveLocalizer: Augmenting Mobile Speech-to-Text with Microphone Arrays, Optimized +> Localization and Beamforming." *UIST 2023 Adjunct*, San Francisco, CA. +> [ACM DL](https://dl.acm.org/doi/10.1145/3586182.3615789) + ## Citing this work ``` @@ -43,6 +99,13 @@ and tested with Bazel without any hardware. } ``` +## Acknowledgments + +We thank Dmitrii Votintcev, Pascal Getreuer, Richard Lyon, Alex Huang, Shao-Fu Shih, +Chet Gnegy, Shaun Kane, James Landay, Malcolm Slaney, Meredith Morris, Carson Lau, +Ngan Nguyen, Mei Lu, Don Barnett, Ryan Geraghty, and Sanjay Batra for their contributions +and support. + ## License and disclaimer Copyright 2025 Google LLC From 52b2d285c449c8fbcbfecdac50dc3423a7eaf0b9 Mon Sep 17 00:00:00 2001 From: Alex Olwal <517681+olwal@users.noreply.github.com> Date: Thu, 19 Mar 2026 00:20:45 -0700 Subject: [PATCH 3/4] Add author GitHub links, bibtex language tag for copy button --- README.md | 32 ++++++++++++++++++-------------- 1 file changed, 18 insertions(+), 14 deletions(-) diff --git a/README.md b/README.md index f739af3..f315c39 100644 --- a/README.md +++ b/README.md @@ -1,16 +1,20 @@ -# SpeechCompass +# SpeechCompass: Enhancing Mobile Captioning with Diarization and Directional Guidance via Multi-Microphone Localization [![CHI 2025 Best Paper](https://img.shields.io/badge/CHI%202025-Best%20Paper%20Award-gold)](https://dl.acm.org/doi/10.1145/3706598.3713631) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE) [![arXiv](https://img.shields.io/badge/arXiv-2502.08848-b31b1b.svg)](https://arxiv.org/abs/2502.08848) -[Paper](https://arxiv.org/abs/2502.08848) | [ACM DL](https://dl.acm.org/doi/10.1145/3706598.3713631) | [Blog](https://research.google/blog/making-group-conversations-more-accessible-with-sound-localization/) | [Project Page](https://www.olwal.com/speechcompass) +[Paper (PDF)](https://arxiv.org/pdf/2502.08848) | [ACM 
Digital Library](https://dl.acm.org/doi/10.1145/3706598.3713631) | [Project Page](https://www.olwal.com/speechcompass) | [Google Research Blog](https://research.google/blog/making-group-conversations-more-accessible-with-sound-localization/) + +Artem Dementyev*, Dimitri Kanevsky, Samuel J. Yang, Mathieu Parvaix, Chiong Lai, Alex Olwal* Official code release for **SpeechCompass: Enhancing Mobile Captioning with Diarization and Directional Guidance via Multi-Microphone Localization**, published at CHI 2025. ![SpeechCompass teaser](docs/images/speech_compass_teaser.jpg) +*First and last author contributed equally to this work + ## Overview Mobile speech-to-text apps have a fundamental limitation in group conversations: they @@ -77,6 +81,17 @@ SpeechCompass combines a custom hardware phone case with lightweight on-device p Each component can be used independently — in particular, the DSP algorithms can be built and tested with Bazel without any hardware. +## Citing this work + +```bibtex +@inproceedings{dementyev2025speechcompass, + title={SpeechCompass: Enhancing Mobile Captioning with Diarization and Directional Guidance via Multi-Microphone Localization}, + author={Dementyev, Artem and Kanevsky, Dimitri and Yang, Samuel and Parvaix, Mathieu and Lai, Chiong and Olwal, Alex}, + booktitle={Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems}, + year={2025} +} +``` + ## Related Work SpeechCompass builds on **LiveLocalizer** (UIST 2023), which first demonstrated @@ -88,20 +103,9 @@ the SpeechCompass firmware. > Localization and Beamforming." *UIST 2023 Adjunct*, San Francisco, CA. 
> [ACM DL](https://dl.acm.org/doi/10.1145/3586182.3615789) -## Citing this work - -``` -@inproceedings{dementyev2025speechcompass, - title={SpeechCompass: Enhancing Mobile Captioning with Diarization and Directional Guidance via Multi-Microphone Localization}, - author={Dementyev, Artem and Kanevsky, Dimitri and Yang, Samuel and Parvaix, Mathieu and Lai, Chiong and Olwal, Alex}, - booktitle={Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems}, - year={2025} -} -``` - ## Acknowledgments -We thank Dmitrii Votintcev, Pascal Getreuer, Richard Lyon, Alex Huang, Shao-Fu Shih, +We thank Sagar Savla, Dmitrii Votintcev, Pascal Getreuer, Richard Lyon, Alex Huang, Shao-Fu Shih, Chet Gnegy, Shaun Kane, James Landay, Malcolm Slaney, Meredith Morris, Carson Lau, Ngan Nguyen, Mei Lu, Don Barnett, Ryan Geraghty, and Sanjay Batra for their contributions and support. From 7471899512cce2a7dd71900036514e36a10c30ed Mon Sep 17 00:00:00 2001 From: Alex Olwal <517681+olwal@users.noreply.github.com> Date: Thu, 19 Mar 2026 00:33:56 -0700 Subject: [PATCH 4/4] Add video and presentation links --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index f315c39..3a7e4b2 100644 --- a/README.md +++ b/README.md @@ -11,6 +11,8 @@ Artem Dementyev*, Dimitri Kanevsky, Samuel J. Yang, Mathieu Parvaix, Chiong Lai, Official code release for **SpeechCompass: Enhancing Mobile Captioning with Diarization and Directional Guidance via Multi-Microphone Localization**, published at CHI 2025. +[Video 4:24](https://www.youtube.com/watch?v=crWXO5T5jaQ) | [Presentation 9:30](https://www.youtube.com/watch?v=cOnMxClQZ4g) + ![SpeechCompass teaser](docs/images/speech_compass_teaser.jpg) *First and last author contributed equally to this work