Vision is an AI-powered mobile application built with React Native and Expo that helps users "see" the world through auditory feedback. By capturing a photo, the app analyzes the user's facial expression or the objects in their environment and speaks the results back to them with a touch of personality.
## Features

- Emotion Detection: Recognizes a wide range of human emotions (Happy, Sad, Angry, Surprised, Calm, etc.) using AWS Rekognition.
- Intelligent Object Identification: Beyond simple labeling, the app uses Google Gemini to refine multiple detected tags into a single, most-likely physical object.
- Voice Synthesis: Converts detection results into natural speech using Google Cloud Text-to-Speech (Chirp HD voices).
- Concurrent Analysis: Runs facial-expression analysis and object detection simultaneously for a fuller understanding of the scene (see the sketch after this list).
- Dynamic Responses: Features a variety of personality-filled responses based on the detected emotion.
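For illustration, here is a minimal sketch of what the concurrent analysis could look like with the AWS SDK. It assumes the captured photo has already been read into a `Uint8Array`; `analyzeScene` and the placeholder credentials are hypothetical, not the app's actual code (the real credentials live in `config.ts`, described under Getting Started):

```ts
import {
  RekognitionClient,
  DetectFacesCommand,
  DetectLabelsCommand,
} from "@aws-sdk/client-rekognition";

// Placeholder client setup; the app reads its keys from config.ts.
const client = new RekognitionClient({
  region: "us-east-1",
  credentials: { accessKeyId: "YOUR_KEY", secretAccessKey: "YOUR_SECRET" },
});

// Run facial-expression and object-label analysis on the same photo in parallel.
export async function analyzeScene(imageBytes: Uint8Array) {
  const [faces, labels] = await Promise.all([
    client.send(
      new DetectFacesCommand({ Image: { Bytes: imageBytes }, Attributes: ["ALL"] })
    ),
    client.send(
      new DetectLabelsCommand({ Image: { Bytes: imageBytes }, MaxLabels: 10, MinConfidence: 70 })
    ),
  ]);

  // Pick the highest-confidence emotion on the first detected face, if any.
  const emotions = faces.FaceDetails?.[0]?.Emotions ?? [];
  const top = [...emotions].sort((a, b) => (b.Confidence ?? 0) - (a.Confidence ?? 0))[0];

  return {
    emotion: top?.Type, // e.g. "HAPPY"; undefined if no face was found
    labelNames: labels.Labels?.map((l) => l.Name ?? "") ?? [],
  };
}
```

Because the two Rekognition calls run under `Promise.all`, the round trip is bounded by the slower call rather than the sum of both.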
## Tech Stack

- Framework: React Native with Expo
- Navigation: Expo Router
- Computer Vision: AWS Rekognition SDK
- AI/LLM: Google Generative AI (Gemini 2.5 Flash Lite)
- Audio & Speech: Google Cloud Text-to-Speech API and `expo-av`
## Project Structure

- `/app`: Main application screens, including the Home (`index.tsx`) and Camera (`capture.tsx`) interfaces.
- `/lib`: Core logic for API integrations:
  - `rekognition.ts`: Handles AWS facial and label analysis.
  - `gemini.ts`: Refines object labels using generative AI (sketched below).
  - `googleTTS.ts`: Manages text-to-speech synthesis.
- `/assets`: Project images and custom fonts.
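The refinement step in `gemini.ts` can be sketched with the `@google/generative-ai` client. The function name `refineLabels` and the prompt wording below are assumptions for illustration, not the app's actual code:

```ts
import { GoogleGenerativeAI } from "@google/generative-ai";

// Placeholder key; the app would import this from config.ts.
const genAI = new GoogleGenerativeAI("YOUR_GEMINI_API_KEY");
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash-lite" });

// Collapse several Rekognition tags into the single most likely physical object.
export async function refineLabels(labels: string[]): Promise<string> {
  const prompt =
    `These tags all describe one photo: ${labels.join(", ")}. ` +
    `Answer with the single physical object they most likely depict, and nothing else.`;
  const result = await model.generateContent(prompt);
  return result.response.text().trim();
}
```

For example, Rekognition tags like "Electronics, Computer, Laptop, Keyboard" would come back as simply "laptop", which reads far more naturally when spoken aloud.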
## Getting Started

- Clone the repository.
- Install dependencies with `npm install`.
- Configure environment: create a `config.ts` file (referenced in the source) and provide your API credentials (see the sketch after these steps) for:
  - AWS Rekognition (Access Key, Secret Key, Region)
  - Google Cloud API Key (for TTS)
  - Google Gemini API Key
- Start the app with `npx expo start`.
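The repository does not ship `config.ts`, so the exact shape below is an assumption; match the export names to whatever the source actually imports:

```ts
// config.ts — example only; field names are assumptions.
export const AWS_ACCESS_KEY_ID = "YOUR_AWS_ACCESS_KEY";
export const AWS_SECRET_ACCESS_KEY = "YOUR_AWS_SECRET_KEY";
export const AWS_REGION = "us-east-1";
export const GOOGLE_CLOUD_API_KEY = "YOUR_GOOGLE_CLOUD_API_KEY"; // Text-to-Speech
export const GEMINI_API_KEY = "YOUR_GEMINI_API_KEY";
```

Since this file holds live credentials, keep it out of version control (add it to `.gitignore`).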
## Usage

- Open the app to the Welcome Screen.
- Tap the Vision Logo to open the camera.
- Point the camera at yourself or an object and press the Capture button.
- The app will display "Analyzing..." while it communicates with AWS and Google Cloud.
- Listen to the AI describe your emotion or the object it sees!
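The speech step can be as simple as one REST call to the Cloud Text-to-Speech API followed by `expo-av` playback. A minimal sketch, with some assumptions: `speak` is a hypothetical helper, the Chirp 3 HD voice name is just one valid example, and playback relies on `expo-av` accepting a base64 data URI as a source:

```ts
import { Audio } from "expo-av";

// Synthesize speech via the Cloud TTS REST endpoint, then play it back.
export async function speak(text: string, apiKey: string) {
  const res = await fetch(
    `https://texttospeech.googleapis.com/v1/text:synthesize?key=${apiKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        input: { text },
        voice: { languageCode: "en-US", name: "en-US-Chirp3-HD-Aoede" },
        audioConfig: { audioEncoding: "MP3" },
      }),
    }
  );
  const { audioContent } = await res.json(); // base64-encoded MP3

  // Load the synthesized audio from a data URI and play it.
  const { sound } = await Audio.Sound.createAsync({
    uri: `data:audio/mp3;base64,${audioContent}`,
  });
  await sound.playAsync();
}
```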
Developed with assistance from Gemini 2.5 Pro.