WVAB is a cost-effective, open-source navigation framework for visually impaired individuals, integrating YOLOv8n object detection with ESP32-CAM hardware and a custom UDP streaming protocol. The system achieves sub-50ms latency at 15-30 FPS with 91.5% navigation accuracy, offering multilingual audio feedback at a total hardware cost under USD 15, significantly outperforming existing commercial solutions in affordability and accessibility.

Publication Overview
Visual impairment affects over 285 million people globally, with approximately 39 million classified as completely blind according to WHO statistics. Independent navigation remains one of the most critical challenges faced by this population, significantly impacting quality of life, employment opportunities, and social integration. Traditional mobility aids like white canes and guide dogs, while fundamental, have inherent limitations: white canes fail to detect head-level or dynamic obstacles, while guide dogs cost between USD 25,000 and USD 50,000 and require extensive training, making them inaccessible in developing regions.
The WVAB (Wireless Vision-Aid for the Blind) system represents a paradigm shift in assistive technology by leveraging recent advances in deep learning and low-cost embedded computing to deliver a comprehensive navigation solution at a fraction of traditional costs. The entire hardware infrastructure costs less than USD 15, making it accessible to populations that have been systematically excluded from assistive technology markets.
The WVAB system integrates four core subsystems into a cohesive real-time navigation framework:
1. Vision Capture Module
The system employs an ESP32-CAM microcontroller as the primary vision sensor. This ultra-low-cost device (USD 5-10) captures video frames at 15-30 FPS and transmits them wirelessly via a custom UDP streaming protocol. The choice of UDP over TCP eliminates acknowledgment overhead, reducing network latency by approximately 40% compared to traditional TCP-based streaming. The protocol implements a chunked transmission strategy with resilient packet loss handling, ensuring stable operation under bandwidth-constrained wireless conditions.
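The chunking and loss-handling strategy can be sketched as follows. This is a minimal illustration, not the actual WVAB wire format: the 6-byte header layout (frame id, chunk index, chunk count) and the 1024-byte payload size are assumptions for the example.

```python
# Sketch of chunked UDP framing with packet-loss tolerance.
# Header layout and chunk size are illustrative assumptions.
import struct

CHUNK_SIZE = 1024  # payload bytes per datagram; chosen to stay MTU-safe

def chunk_frame(frame_id: int, jpeg: bytes):
    """Split one JPEG frame into UDP-sized datagrams.

    Each datagram carries a 6-byte header: frame id, chunk index,
    and total chunk count (all uint16, network byte order).
    """
    total = (len(jpeg) + CHUNK_SIZE - 1) // CHUNK_SIZE
    for i in range(total):
        payload = jpeg[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
        yield struct.pack("!HHH", frame_id, i, total) + payload

def reassemble(datagrams):
    """Rebuild a frame from datagrams received in any order.

    Returns None if any chunk is missing, so the receiver can
    simply skip the frame and wait for the next one instead of
    stalling the stream (the resilience property described above).
    """
    chunks, total = {}, None
    for dgram in datagrams:
        _fid, idx, tot = struct.unpack("!HHH", dgram[:6])
        chunks[idx] = dgram[6:]
        total = tot
    if total is None or len(chunks) != total:
        return None  # dropped packet: drop the whole frame
    return b"".join(chunks[i] for i in range(total))
```

Because each frame is independently reassembled, a lost datagram costs at most one frame at 15-30 FPS rather than triggering TCP-style retransmission delays.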
2. Object Detection Pipeline
At the core of WVAB lies a YOLOv8n (nano) object detection model, specifically chosen for its exceptional balance between computational efficiency and detection accuracy. The model achieves a mean Average Precision (mAP@0.5) of 0.472 on the COCO dataset while maintaining inference latency below 50ms on GPU-accelerated hardware.
The detection pipeline implements navigation-priority class filtering, focusing computational resources on safety-critical object categories including vehicles, pedestrians, stairs, poles, traffic signs, and architectural obstacles. This selective approach reduces false positives and ensures that audio feedback prioritizes actionable environmental information.
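The priority-filtering step can be illustrated with a small post-processing function. The class list, priority weights, and confidence threshold below are assumptions for the sketch; the paper's actual navigation-priority set may differ.

```python
# Illustrative navigation-priority filter over raw detector output.
# Class set and priority values are assumptions, not WVAB's exact list.
NAV_PRIORITY = {
    "car": 3, "bus": 3, "truck": 3,       # vehicles: highest risk
    "person": 2, "bicycle": 2, "stop sign": 2,
    "pole": 1, "stairs": 1,               # static architectural hazards
}

def filter_detections(detections, conf_threshold=0.4):
    """Keep only navigation-relevant detections above a confidence
    floor, sorted so the riskiest classes are announced first.

    `detections` is a list of dicts with "label" and "conf" keys,
    as would be extracted from the YOLOv8n output.
    """
    kept = [d for d in detections
            if d["label"] in NAV_PRIORITY and d["conf"] >= conf_threshold]
    return sorted(kept, key=lambda d: (-NAV_PRIORITY[d["label"]], -d["conf"]))
```

Discarding low-priority classes (furniture, food items, and other COCO categories irrelevant to mobility) before audio synthesis is what keeps the feedback channel focused on actionable hazards.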
3. NavigationPlanner Engine
The NavigationPlanner module, implemented in optimized C++, represents a novel contribution to assistive navigation systems. Unlike conventional systems that provide simple object lists, NavigationPlanner performs risk-aware spatial reasoning by:
Partitioning the field of view into three lateral zones (Left, Center, Right)
Discretizing distance into four proximity categories (Immediate, Near, Medium, Far)
Applying class-specific risk weights to detected objects
Computing zone-wise safety scores using a weighted accumulation algorithm
Generating threshold-based path recommendations optimized for blind pedestrian safety
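The steps above can be sketched in a few lines. The paper's implementation is in C++; this is a Python illustration of the same logic, and the specific risk weights, proximity factors, and clearance threshold are assumptions for the example.

```python
# Python sketch of zone-wise risk scoring and path recommendation.
# Weights, factors, and thresholds are illustrative assumptions.
RISK_WEIGHT = {"car": 1.0, "person": 0.6, "pole": 0.4, "stairs": 0.8}
PROXIMITY_FACTOR = {"immediate": 4.0, "near": 2.0, "medium": 1.0, "far": 0.5}

def zone_of(cx: float, frame_width: int) -> str:
    """Map a detection's horizontal centre to Left/Center/Right."""
    third = frame_width / 3
    return "Left" if cx < third else ("Center" if cx < 2 * third else "Right")

def recommend_path(detections, frame_width=640, clear_threshold=1.0):
    """Accumulate class-weighted, proximity-scaled risk per zone,
    then recommend the safest zone if it is clear enough.

    Each detection dict carries "label", "proximity" (one of the
    four discretized bins), and "cx" (bbox centre x in pixels).
    """
    scores = {"Left": 0.0, "Center": 0.0, "Right": 0.0}
    for d in detections:
        weight = RISK_WEIGHT.get(d["label"], 0.2)  # default for minor classes
        scores[zone_of(d["cx"], frame_width)] += weight * PROXIMITY_FACTOR[d["proximity"]]
    safest = min(scores, key=scores.get)
    if scores[safest] <= clear_threshold:
        return f"Clear path on the {safest.lower()}", scores
    return "Stop: no safe path", scores
```

The weighted-accumulation step is what lets a distant parked car matter less than a nearby pedestrian, and the final threshold check is what turns raw scores into the "Clear path on the left" style of directive guidance.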
This spatial decision framework enables the system to provide directional guidance ("Clear path on the left") rather than merely listing detected objects, significantly reducing the cognitive load on users.
4. Multilingual Audio Feedback System
A critical innovation in WVAB is its JSON-configurable label architecture, enabling dynamic language switching without model retraining or code modification. The system currently supports English and Bengali, with extensibility to any language through simple JSON label file updates. This design democratizes access to assistive technology in non-English speaking regions, particularly in South Asia where language barriers have historically excluded populations from technology adoption.
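A label file of this kind might look like the following. The exact schema is not given in the text, so the key names and structure here are hypothetical; only the principle (localized strings keyed by detector class, loaded at runtime) is from the source.

```python
# Hypothetical shape of a WVAB language label file and its loader.
# Schema and key names are assumptions for illustration.
import json

LABELS_BN = json.loads("""
{
  "language": "bn",
  "labels":  { "person": "মানুষ", "car": "গাড়ি", "stairs": "সিঁড়ি" },
  "phrases": { "clear_left": "বাম দিকে পথ পরিষ্কার" }
}
""")

def announce(class_name: str, label_set: dict) -> str:
    """Return the localized spoken label for a detector class,
    falling back to the raw class id if no translation exists."""
    return label_set["labels"].get(class_name, class_name)
```

Because the detector always emits the same class identifiers, switching languages is purely a matter of loading a different JSON file; neither the model weights nor the navigation logic is touched.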
Experimental validation demonstrates WVAB's superiority across multiple performance dimensions:
Latency: End-to-end system latency remains consistently below 50ms, well within the 200ms threshold required for safe pedestrian navigation
Throughput: Sustained 15-30 FPS operation under realistic wireless conditions
Accuracy: 91.5% navigation decision accuracy in complex indoor environments with multiple dynamic obstacles
Reliability: Stable operation under packet loss rates up to 5%, demonstrating robustness to real-world wireless conditions
Comparative analysis against commercial systems like OrCam MyEye (USD 3,500+) and Microsoft Seeing AI reveals that WVAB achieves comparable functional capabilities at approximately 0.4% of the cost, with the additional advantage of offline operation—no cloud dependency means no recurring costs, no privacy concerns, and no connectivity requirements.
The system's modular architecture supports multiple deployment configurations:
Wearable Configuration: ESP32-CAM mounted on eyeglasses frame with bone-conduction audio output
Smartphone Configuration: Utilizing the phone's rear camera with the WVAB app running navigation logic locally
Hybrid Configuration: Multiple ESP32-CAM units providing 180-degree field of view coverage
From a global accessibility perspective, WVAB's sub-USD 15 cost point makes it feasible for NGO-scale deployment in developing nations. In Bangladesh alone, with an estimated 1.5 million visually impaired individuals, a nationwide deployment would cost approximately USD 22.5 million—less than the cost of training 900 guide dogs.
WVAB advances the state-of-the-art in assistive navigation through several key innovations:
Chunked UDP Protocol: First demonstrated application of custom UDP chunking for resilient ESP32-CAM streaming in assistive contexts
Risk-Weighted Spatial Reasoning: Novel navigation scoring algorithm incorporating object-class risk weights and zone-based path planning
Edge-First Architecture: Complete computational pipeline executable on consumer-grade hardware without cloud dependency
Linguistic Accessibility: JSON-based multilingual framework enabling rapid localization without technical expertise
Ongoing work focuses on integrating additional sensor modalities (ultrasonic rangefinders, IMU-based orientation tracking), implementing advanced path prediction using recurrent neural networks, and conducting large-scale human factors studies with blind user populations. The research team is also exploring federated learning approaches to enable collaborative model improvement across distributed WVAB deployments while preserving user privacy.
The WVAB framework represents a significant step toward democratizing assistive technology, proving that academic rigor, open-source principles, and cost-conscious engineering can converge to create solutions that genuinely serve underrepresented populations.