This project assists visually impaired individuals by providing real-time audio descriptions of their surroundings. The system uses computer vision models, including YOLOv11 for object detection and MediaPipe for pose estimation, to interpret the environment and convey that information through speech.
Features
Object Detection: Identifies and locates objects in real time using the YOLOv11 model.
Pose Estimation: Detects human poses to interpret actions such as standing, sitting, walking, or running.
Distance Estimation: Estimates the distance of detected objects from the user.
Audio Feedback: Provides real-time audio descriptions of detected objects and their positions.
Gradio Interface: Offers a web-based interface for visualization and interaction.
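
To make the distance-estimation and audio-feedback features more concrete, here is a minimal sketch of how a bounding box could be turned into a spoken phrase. It is not the project's actual implementation: the pinhole-camera approximation, the `Detection` class, the `KNOWN_HEIGHTS_M` table, the `FOCAL_LENGTH_PX` value, and the three-way left/ahead/right split are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical reference heights in meters (assumed, not from the project).
KNOWN_HEIGHTS_M = {"person": 1.7, "chair": 0.9, "car": 1.5}

# Hypothetical camera focal length in pixels; a real setup would calibrate this.
FOCAL_LENGTH_PX = 700.0


@dataclass
class Detection:
    """A detected object with its bounding box in pixel coordinates."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def estimate_distance_m(det: Detection,
                        focal_length_px: float = FOCAL_LENGTH_PX) -> Optional[float]:
    """Pinhole approximation: distance = real_height * focal_length / pixel_height."""
    real_height = KNOWN_HEIGHTS_M.get(det.label)
    pixel_height = det.y2 - det.y1
    if real_height is None or pixel_height <= 0:
        return None  # unknown object class or degenerate box
    return real_height * focal_length_px / pixel_height


def horizontal_position(det: Detection, frame_width: float) -> str:
    """Place the box center in the left, middle, or right third of the frame."""
    center_x = (det.x1 + det.x2) / 2
    if center_x < frame_width / 3:
        return "on your left"
    if center_x > 2 * frame_width / 3:
        return "on your right"
    return "ahead"


def describe(det: Detection, frame_width: float) -> str:
    """Build the sentence that would be handed to a text-to-speech engine."""
    position = horizontal_position(det, frame_width)
    distance = estimate_distance_m(det)
    if distance is None:
        return f"{det.label} {position}"
    return f"{det.label} {position}, about {distance:.1f} meters away"
```

For example, under these assumptions `describe(Detection("person", 100, 50, 200, 390), 640)` yields "person on your left, about 3.5 meters away". The approximation only holds when the whole object is visible and roughly upright, so a practical system would likely need per-camera calibration or a depth model.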