CosmoNav - AI-Powered Navigation for the Visually Impaired

Abinash Patti, Gursimar Singh, Vishnu Ajit

Overview

CosmoNav is an AI-powered navigation app for visually impaired users that pairs real-time object detection with voice-activated navigation. A wake word ("Cosmo") activation system with continuous audio monitoring and automatic pause detection lets users search for destinations and navigate hands-free, without pressing any buttons. It relies on three components:

Computer vision for obstacle detection using the device camera and our own segmentation model
Voice recognition through AssemblyAI for speech-to-text processing
Google Maps for routing and navigation

Together, these let CosmoNav announce detected objects, provide turn-by-turn voice guidance, and support interaction through voice commands alone.

Installation

Choose a directory for the project
Run git clone https://github.com/YeehawMcfly/cosmonav.git in that directory

Create a .env file and add the following:

EXPO_PUBLIC_GOOGLE_MAPS_API_KEY=your_google_maps_api_key
EXPO_PUBLIC_SPEECH_TO_TEXT_API_KEY=your_assemblyai_api_key

Run npm install
Run npx expo install expo-av (for voice search support)
Run npx expo start --tunnel
Scan the QR code with your phone (requires Expo Go to be installed)
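
A missing or placeholder key is an easy mistake in the .env step above, so a small startup check can help. The following is a minimal sketch, not code from this repository: findMissingKeys is a hypothetical helper, and treating values that start with "your_" as unset is an assumption based on the placeholder values shown above.

```typescript
// The two variable names come from the .env step above.
const REQUIRED_KEYS = [
  "EXPO_PUBLIC_GOOGLE_MAPS_API_KEY",
  "EXPO_PUBLIC_SPEECH_TO_TEXT_API_KEY",
] as const;

type EnvKey = (typeof REQUIRED_KEYS)[number];

// Hypothetical helper: returns the required keys that are missing, empty,
// or still set to a "your_..." placeholder (assumed convention).
function findMissingKeys(env: Record<string, string | undefined>): EnvKey[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key]?.trim();
    return !value || value.startsWith("your_");
  });
}

// Possible usage inside the app (each variable is referenced explicitly):
//
//   const missing = findMissingKeys({
//     EXPO_PUBLIC_GOOGLE_MAPS_API_KEY: process.env.EXPO_PUBLIC_GOOGLE_MAPS_API_KEY,
//     EXPO_PUBLIC_SPEECH_TO_TEXT_API_KEY: process.env.EXPO_PUBLIC_SPEECH_TO_TEXT_API_KEY,
//   });
//   if (missing.length > 0) console.warn("Missing keys:", missing.join(", "));
```

The helper takes the values as an argument instead of reading process.env itself, which keeps it easy to test.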

Voice Search Setup

To use fully voice-activated navigation with wake word "Cosmo":

Create an account at https://www.assemblyai.com
Get your API key from the dashboard
Add it to the .env file as EXPO_PUBLIC_SPEECH_TO_TEXT_API_KEY
Install audio support: npx expo install expo-av

How Voice Activation Works:

The app uses continuous wake word detection and automatic pause detection - completely hands-free!

🎙️ Always Listening: Uses a continuous recording loop that keeps running until navigation starts
3-Second Cycles: Records 3 seconds, checks for "Cosmo", immediately starts next cycle
Wake Word: Say "Cosmo" to activate voice search
Automatic Search on Pause:
- Say your search query and pause for 3 seconds
- The app automatically detects the pause and processes your search
- Example: "Cosmo" → "coffee shops near me" → [pause 3 seconds] → searches automatically
- No need to press any buttons!
Smart Silence Detection:
- Monitors audio levels every 500ms
- Detects when you stop speaking
- Automatically processes search after 3 seconds of silence
- Maximum recording time: 20 seconds
Auto-Resume: After each search, automatically resumes the listening loop
Smart Pause: Pauses during active navigation, auto-resumes when navigation ends
Visual Feedback:
- Orange banner: "Listening for Cosmo..." (actively recording)
- Red banner: "Speak your search, then pause for 3 seconds..." (voice command mode)
- Gray banner: "Voice inactive" with restart button (if stopped)
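
The wake word check at the end of each 3-second cycle can be sketched as a plain string test on the transcribed text. This is an illustration under assumptions, not the app's actual code: normalizeTranscript and containsWakeWord are hypothetical names, and matching on whole words (so that "cosmonaut" does not trigger) is a design choice made here.

```typescript
const WAKE_WORD = "cosmo";

// Lowercase the transcript and replace punctuation with spaces so that
// "Hey, Cosmo!" and "hey cosmo" compare equal.
function normalizeTranscript(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// True when the wake word appears as a whole word in the transcript.
function containsWakeWord(transcript: string): boolean {
  return normalizeTranscript(transcript).split(" ").includes(WAKE_WORD);
}
```

A speech-to-text service may transcribe the wake word with a different spelling (for example "Cosmos"), so a real check would probably need a list of accepted variants.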

Check the console logs to see the pause detection in action:

Starting new recording cycle... (every 3 seconds)
Silence detected (1.5s) (counting silence)
3 seconds of silence detected, processing speech... (auto-search triggered)
Transcribed: [what you said]
Wake word detected! (when you say Cosmo)
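
The pause detection behind these log lines can be modeled as a small counter that is updated on every audio-level poll. The sketch below uses the numbers from the section above (500 ms polls, 3 seconds of silence, 20 second maximum). The -40 dB threshold and the createSilenceDetector name are assumptions, not values taken from the app.

```typescript
interface SilenceConfig {
  pollIntervalMs: number;     // how often the audio level is sampled
  silenceLimitMs: number;     // silence needed before the search is processed
  maxRecordingMs: number;     // hard cap on one voice command
  silenceThresholdDb: number; // levels below this count as silence
}

const DEFAULT_SILENCE_CONFIG: SilenceConfig = {
  pollIntervalMs: 500,
  silenceLimitMs: 3000,
  maxRecordingMs: 20000,
  silenceThresholdDb: -40, // assumed value; tune for the real microphone
};

// Returns a function to call once per poll with the current level in dB.
// It returns true when recording should stop and the speech be processed.
function createSilenceDetector(
  config: SilenceConfig = DEFAULT_SILENCE_CONFIG,
): (levelDb: number) => boolean {
  let silentMs = 0;
  let elapsedMs = 0;
  return (levelDb: number): boolean => {
    elapsedMs += config.pollIntervalMs;
    // Any sound above the threshold resets the silence counter.
    silentMs = levelDb < config.silenceThresholdDb
      ? silentMs + config.pollIntervalMs
      : 0;
    return silentMs >= config.silenceLimitMs || elapsedMs >= config.maxRecordingMs;
  };
}
```

In the app, the level would presumably come from the recorder's metering value on each 500 ms tick; keeping the counter separate from the recorder makes the timing logic testable without a microphone.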

Completely hands-free! Just say "Cosmo", speak your search, and pause. 🚀

Technologies

Core Framework

React Native - Cross-platform mobile development
Expo - Development platform and tooling
TypeScript - Type-safe JavaScript

Computer Vision & AI

Roboflow - Instance segmentation & object detection API
expo-camera - Real-time camera access and frame capture
expo-image-manipulator - Image resizing and preprocessing

Voice & Audio

AssemblyAI - Real-time speech-to-text transcription
expo-speech - Text-to-speech for voice announcements
expo-av - Audio recording for wake word detection
Custom wake word detection system ("Cosmo")
Automatic silence detection (3-second pause)

Navigation & Maps

Google Maps API - Geocoding, places search, and routing
Google Maps JavaScript API (WebView) - Interactive map display
expo-location - GPS positioning and location tracking
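
As an illustration of the routing piece, a request to the Google Maps Directions web service can be built from a GPS position and a destination string. This is a sketch under assumptions: buildDirectionsUrl is a hypothetical helper, the walking mode is a guess at what suits this app, and the project may call a different Google Maps endpoint.

```typescript
interface LatLng {
  latitude: number;
  longitude: number;
}

// Hypothetical helper: builds a Directions API request URL.
// The origin would come from expo-location, the destination from voice search.
function buildDirectionsUrl(origin: LatLng, destination: string, apiKey: string): string {
  const params: Record<string, string> = {
    origin: `${origin.latitude},${origin.longitude}`,
    destination,
    mode: "walking", // assumption: pedestrian routing
    key: apiKey,
  };
  const query = Object.keys(params)
    .map((name) => `${encodeURIComponent(name)}=${encodeURIComponent(params[name])}`)
    .join("&");
  return `https://maps.googleapis.com/maps/api/directions/json?${query}`;
}
```

The query string is assembled with encodeURIComponent so that destinations containing spaces or commas are escaped correctly.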

Additional Features

expo-image-picker - Demo image upload mode
Continuous audio loop for hands-free operation
Real-time bounding box overlays for detected objects
Voice-activated search with autocomplete
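
Drawing bounding box overlays requires mapping each detection from the coordinates of the resized image sent to the model onto the on-screen camera view. The sketch below assumes predictions carry a center point plus width and height in image pixels, which is how Roboflow's hosted detection responses are commonly shaped. The type and function names are hypothetical.

```typescript
// Assumed prediction shape: x and y are the box center, in image pixels.
interface Prediction {
  x: number;
  y: number;
  width: number;
  height: number;
  class: string;
  confidence: number; // 0..1
}

interface Size {
  width: number;
  height: number;
}

interface OverlayBox {
  left: number;
  top: number;
  width: number;
  height: number;
  label: string;
}

// Converts a center-based box in image space to a top-left box in view space.
function toOverlayBox(p: Prediction, image: Size, view: Size): OverlayBox {
  const scaleX = view.width / image.width;
  const scaleY = view.height / image.height;
  return {
    left: (p.x - p.width / 2) * scaleX,
    top: (p.y - p.height / 2) * scaleY,
    width: p.width * scaleX,
    height: p.height * scaleY,
    label: `${p.class} ${Math.round(p.confidence * 100)}%`,
  };
}
```

This assumes the image fills the view without letterboxing; if the camera preview is cropped or padded, an offset would also be needed.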

Future Steps

Improve the image detection model by securing more compute and training it on a larger dataset.
Make speech-to-text reliable across accents and noisy environments.
Partner with a hardware company (preferably one that makes spectacles) to reduce dependence on phones.
Reach out to institutions like the Canadian National Institute for the Blind (CNIB) to get feedback from future users.

