Vision 👁️🎙️

Vision is an AI-powered mobile application built with React Native and Expo that helps users "see" the world through auditory feedback. By capturing a photo, the app analyzes the user's facial expression or the objects in their environment and speaks the results back to them with a touch of personality.

🚀 Features

  • Emotion Detection: Recognizes a wide range of human emotions (Happy, Sad, Angry, Surprised, Calm, etc.) using AWS Rekognition.
  • Intelligent Object Identification: Beyond simple labeling, the app uses Google Gemini to refine multiple detected tags into a single, most-likely physical object.
  • Voice Synthesis: Converts the detected results into natural-sounding speech using Google Cloud Text-to-Speech (Chirp HD voices).
  • Concurrent Analysis: Processes facial expressions and object detection in parallel for a comprehensive understanding of the scene (see the sketch after this list).
  • Dynamic Responses: Features a variety of personality-filled responses based on the detected emotion.
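
As a rough illustration of the concurrent step, the sketch below runs both Rekognition calls in parallel with the AWS SDK for JavaScript v3 (`@aws-sdk/client-rekognition`). The function name, the config import, and the label thresholds are assumptions for this example, not the repo's actual rekognition.ts:

```typescript
import {
  RekognitionClient,
  DetectFacesCommand,
  DetectLabelsCommand,
} from "@aws-sdk/client-rekognition";
// Hypothetical config import; the real app keeps credentials in config.ts.
import { AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION } from "../config";

const client = new RekognitionClient({
  region: AWS_REGION,
  credentials: {
    accessKeyId: AWS_ACCESS_KEY_ID,
    secretAccessKey: AWS_SECRET_ACCESS_KEY,
  },
});

// Run emotion detection and object labeling in parallel on one captured photo.
export async function analyzePhoto(imageBytes: Uint8Array) {
  const [faces, labels] = await Promise.all([
    client.send(
      new DetectFacesCommand({
        Image: { Bytes: imageBytes },
        Attributes: ["ALL"], // "ALL" includes the per-face Emotions array
      }),
    ),
    client.send(
      new DetectLabelsCommand({
        Image: { Bytes: imageBytes },
        MaxLabels: 10,
        MinConfidence: 70,
      }),
    ),
  ]);

  // Highest-confidence emotion from the first detected face, if there is one.
  const topEmotion = faces.FaceDetails?.[0]?.Emotions?.sort(
    (a, b) => (b.Confidence ?? 0) - (a.Confidence ?? 0),
  )[0]?.Type;
  const labelNames = labels.Labels?.map((l) => l.Name ?? "") ?? [];

  return { topEmotion, labelNames };
}
```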

🛠️ Tech Stack

  • React Native + Expo (TypeScript)
  • AWS Rekognition: facial expression and object label analysis
  • Google Gemini: label refinement
  • Google Cloud Text-to-Speech: voice output (Chirp HD voices)

📂 Project Structure

  • /app: Contains the main application screens including the Home (index.tsx) and Camera (capture.tsx) interfaces.
  • /lib: Core logic for API integrations:
    • rekognition.ts: Handles AWS facial and label analysis.
    • gemini.ts: Refines object labels using generative AI (sketched after this list).
    • googleTTS.ts: Manages text-to-speech synthesis.
  • /assets: Stores project images and custom fonts.
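
To make the gemini.ts step concrete, here is a hedged sketch of refining Rekognition's labels through a single Gemini REST call; the model name, prompt wording, and function name are assumptions for illustration:

```typescript
import { GEMINI_API_KEY } from "../config"; // hypothetical config import

// Collapse a list of Rekognition labels into the single most likely object.
export async function refineLabels(labels: string[]): Promise<string> {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    `gemini-1.5-flash:generateContent?key=${GEMINI_API_KEY}`;

  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            {
              text:
                `These image labels describe one physical object: ${labels.join(", ")}. ` +
                "Reply with only the single most likely object name.",
            },
          ],
        },
      ],
    }),
  });

  const data = await response.json();
  // The answer lives in the first candidate's first text part.
  return data.candidates?.[0]?.content?.parts?.[0]?.text?.trim() ?? labels[0];
}
```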

⚙️ Setup & Installation

  1. Clone the repository:
    git clone https://github.com/Ahmed-Yusuf-1/Vision.git
  2. Install dependencies:
    npm install
  3. Configure Environment: Create a config.ts file (the source imports it for credentials) and provide your API keys for the services below; an example shape follows this list:
    • AWS Rekognition (Access Key, Secret Key, Region)
    • Google Cloud API Key (for TTS)
    • Google Gemini API Key
  4. Start the app:
    npx expo start
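
The repo does not publish config.ts, so the exact field names are unknown; a plausible shape with placeholder values might look like this (match whatever the source actually imports):

```typescript
// config.ts: example shape only, with placeholder values.
// Keep this file out of version control (add it to .gitignore).
export const AWS_ACCESS_KEY_ID = "YOUR_AWS_ACCESS_KEY";
export const AWS_SECRET_ACCESS_KEY = "YOUR_AWS_SECRET_KEY";
export const AWS_REGION = "us-east-1";
export const GOOGLE_CLOUD_API_KEY = "YOUR_GOOGLE_CLOUD_API_KEY"; // Text-to-Speech
export const GEMINI_API_KEY = "YOUR_GEMINI_API_KEY";
```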

🖥️ Usage

  1. Open the app to the Welcome Screen.
  2. Tap the Vision Logo to open the camera.
  3. Point the camera at yourself or an object and press the Capture button.
  4. The app will display "Analyzing..." while it communicates with AWS and Google Cloud.
  5. Listen to the AI describe your emotion or the object it sees! (The speech step is sketched below.)
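
For a sense of the final speech step (googleTTS.ts), here is a sketch using the Google Cloud Text-to-Speech REST API with expo-av for playback; the voice name and the data-URI playback approach are assumptions, not the repo's confirmed implementation:

```typescript
import { Audio } from "expo-av";
import { GOOGLE_CLOUD_API_KEY } from "../config"; // hypothetical config import

// Synthesize speech with the Google Cloud Text-to-Speech REST API and play it.
export async function speak(text: string): Promise<void> {
  const response = await fetch(
    `https://texttospeech.googleapis.com/v1/text:synthesize?key=${GOOGLE_CLOUD_API_KEY}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        input: { text },
        // Voice name is an assumption; pick any Chirp HD voice for your locale.
        voice: { languageCode: "en-US", name: "en-US-Chirp3-HD-Aoede" },
        audioConfig: { audioEncoding: "MP3" },
      }),
    },
  );

  // The API returns base64-encoded MP3 audio; play it from a data URI.
  const { audioContent } = await response.json();
  const { sound } = await Audio.Sound.createAsync({
    uri: `data:audio/mp3;base64,${audioContent}`,
  });
  await sound.playAsync();
}
```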

Developed with assistance from Gemini 2.5 Pro.
