
NaviGatr is a project that aims to aid the visually impaired using various computer vision algorithms. While prior systems have used object detection or depth sensing independently, a combined real-time approach has not been widely implemented.
Our system uses three machine learning models to extract spatial and contextual data from camera frames captured at eye level via a wearable headset: object detection, facial emotion recognition, and monocular depth estimation.
The output of these models is aggregated, interpreted, and passed to an output module (currently implemented as a voice assistant), which guides the user with auditory cues about their surroundings.
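At a high level, the per-frame flow looks like the sketch below. Every name in it (camera, detect_objects, classify_emotions, estimate_depth, describe, speak) is a hypothetical placeholder standing in for the real model wrappers and the voice assistant, not code from the NaviGatr repository.

```python
# Minimal sketch of the per-frame loop; all names are hypothetical placeholders
# for the real model wrappers and voice assistant.

def run_pipeline(camera, detect_objects, classify_emotions, estimate_depth, describe, speak):
    for frame in camera:                       # eye-level frames from the headset camera
        boxes = detect_objects(frame)          # bounding boxes + class labels
        emotions = classify_emotions(frame)    # emotion label per detected face
        depth_map = estimate_depth(frame)      # per-pixel distance in meters
        for sentence in describe(boxes, emotions, depth_map):
            speak(sentence)                    # e.g. "Chair, 11 o'clock, 2 meters away"
```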

We use two different object detection models, chosen based on platform constraints.

Both models return bounding boxes and class labels, which are combined with depth data to localize and describe objects to the user in real-world terms.
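As a sketch of what this looks like in practice, one way to keep the rest of the pipeline platform-agnostic is to normalize both detection backends into a common format. The Detection fields and the normalize adapter below are illustrative assumptions, not taken from the NaviGatr codebase.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical common output format shared by both detection backends.
@dataclass
class Detection:
    label: str                                 # class name, e.g. "chair" or "person"
    confidence: float                          # detector score in [0, 1]
    box: Tuple[float, float, float, float]     # (x_min, y_min, x_max, y_max), normalized to [0, 1]

def normalize(raw_results) -> List[Detection]:
    """Adapter stub: convert a backend-specific result into Detection objects."""
    ...
```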
We trained a custom facial emotion recognition model on the FER2013 dataset using the EfficientNetB0 architecture. The model runs on the Coral TPU and classifies detected faces into the seven FER2013 emotion categories (angry, disgust, fear, happy, sad, surprise, neutral).
This allows NaviGatr to add social context to the user's environment.
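For reference, a minimal sketch of such a classifier is shown below using tf.keras.applications.EfficientNetB0. The input size, head layers, and training settings are illustrative assumptions rather than the exact NaviGatr training configuration.

```python
import tensorflow as tf

# Sketch of an EfficientNetB0 classifier for the 7 FER2013 emotion classes.
# FER2013 images are 48x48 grayscale; here we assume they are resized to
# 224x224 and replicated to 3 channels before training.
NUM_CLASSES = 7  # angry, disgust, fear, happy, sad, surprise, neutral

base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(train_ds, validation_data=val_ds, epochs=...)
# For the Coral TPU, the trained model is typically converted to a fully
# integer-quantized TFLite model and compiled with the Edge TPU compiler.
```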

We use Apple's Depth Pro model, a monocular depth estimation model that returns absolute metric depth, unlike most models that return relative depth.
Depth Pro uses a transformer-based architecture and produces sharp gradients and accurate boundaries. Its results are used to determine how far away each object is, in meters.
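As an illustration, the snippet below follows the usage example published in Apple's ml-depth-pro repository; the calls shown (create_model_and_transforms, load_rgb, infer) are taken from that example and may differ in newer releases, and "frame.jpg" is a placeholder for a frame from the headset camera.

```python
import depth_pro

# Load the model and its preprocessing transform (per the ml-depth-pro example).
model, transform = depth_pro.create_model_and_transforms()
model.eval()

# "frame.jpg" is a placeholder for a captured camera frame.
image, _, f_px = depth_pro.load_rgb("frame.jpg")
prediction = model.infer(transform(image), f_px=f_px)

depth_m = prediction["depth"]  # dense depth map, absolute distance in meters
```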

Here is how output delivery works in NaviGatr:
The object detection model returns an array of bounding boxes, each containing (x, y) coordinates relative to the camera image. Meanwhile, the depth estimation model outputs a dense depth map containing the estimated distance (in meters) for each pixel.
Example: If a chair is detected with its bounding box centered near the left-middle of the image and its depth is 2 meters, the audio output will be:
"Chair, 11 o'clock, 2 meters away"
Example: If a sad person is detected at the right side:
"Person, 1 o'clock, sad"
The output is then synthesized into speech and delivered via audio for real-time navigation assistance.
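Concretely, this aggregation step might look like the sketch below. The clock bucketing, the use of the median depth inside the box, and all function names are assumptions for illustration rather than the exact NaviGatr implementation.

```python
import numpy as np

def clock_position(x_center_norm: float) -> int:
    """Map a normalized horizontal position (0 = far left, 1 = far right)
    onto clock directions 10 through 2, with 12 o'clock straight ahead."""
    hours = [10, 11, 12, 1, 2]
    index = min(int(x_center_norm * len(hours)), len(hours) - 1)
    return hours[index]

def object_distance(depth_map: np.ndarray, box: tuple) -> float:
    """Median depth inside the bounding box (x_min, y_min, x_max, y_max in pixels)."""
    x0, y0, x1, y1 = (int(v) for v in box)
    return float(np.median(depth_map[y0:y1, x0:x1]))

def describe(label: str, box: tuple, depth_map: np.ndarray, emotion: str = "") -> str:
    """Build the spoken phrase for one detection, e.g. 'Chair, 11 o'clock, 2 meters away'."""
    w = depth_map.shape[1]
    x_center = (box[0] + box[2]) / 2 / w
    clock = clock_position(x_center)
    if emotion:
        return f"{label.capitalize()}, {clock} o'clock, {emotion}"
    dist = object_distance(depth_map, box)
    return f"{label.capitalize()}, {clock} o'clock, {dist:.0f} meters away"

# Example: a chair centered near the left-middle of a 640x480 frame, 2 m away.
depth = np.full((480, 640), 2.0)
print(describe("chair", (100, 200, 220, 400), depth))  # Chair, 11 o'clock, 2 meters away
```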

Full walkthrough for flashing Ubuntu, installing dependencies, setting up camera access, and running the core script.
Wiring diagrams for GPIO display, fans, Coral TPU, and safe USB-C routing.
STL files, CAD renderings, and assembly instructions for the NaviGatr headset.
Complete engineering report including benchmarking, design decisions, and testing plans.
Codebase for the models, hardware integration, and deployment scripts.
(Coming soon) Live test run of NaviGatr navigating a test environment.
NaviGatr pushes the boundary of accessible computing, demonstrating how on-device machine learning and thoughtful hardware design can empower a community often overlooked by mainstream tech.