Artificial intelligence is touching nearly every aspect of life, including assistive technology for people who are blind or have low vision (BLV). These AI-powered tools mark a real step forward in accessibility, enabling greater independence and interaction with the environment, but as in other arenas, the AI behind them is good, not perfect. Cornell Tech researchers have studied these tools, identifying both their strengths and the gaps that must close before they can offer truly dependable assistance to the BLV community.
Researchers at Cornell Tech, led by Associate Professor Shiri Azenkot and doctoral candidate Ricardo Gonzalez, studied how well AI interprets visual information for BLV users. They built VisionPal, a smartphone app powered by a multimodal large language model (MLLM) that processes images, audio, and video to answer questions about a user's surroundings. In a two-week diary study, described in a paper presented at CHI ’26 that received an honorable mention, 20 vision-impaired participants used the app in their daily lives, giving the team a record of real-world interactions and feedback rather than lab-only results.
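The core interaction described is simple: the user sends a photo and a question, the MLLM answers, and follow-up questions carry the earlier context. A minimal sketch of that pattern in Python follows; the `mllm_interpret` placeholder is hypothetical, since the paper's summary does not specify the underlying model or API.

```python
import base64
from dataclasses import dataclass, field

@dataclass
class Turn:
    role: str                    # "user" or "assistant"
    text: str
    image_b64: str | None = None

def mllm_interpret(turns: list["Turn"]) -> str:
    """Placeholder for the multimodal model call; the study does not name
    the backing model, so wire this to an MLLM of your choice."""
    raise NotImplementedError

@dataclass
class Session:
    """Conversation state for one visual-interpretation exchange,
    so follow-up questions are answered with the earlier context."""
    turns: list[Turn] = field(default_factory=list)

    def ask(self, question: str, image_bytes: bytes | None = None) -> str:
        img = base64.b64encode(image_bytes).decode() if image_bytes else None
        self.turns.append(Turn("user", question, img))
        answer = mllm_interpret(self.turns)  # model sees the full history
        self.turns.append(Turn("assistant", answer))
        return answer
```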
The study found that MLLM-powered apps handle general "What is this?" questions well: participants reported high satisfaction and trust in the app for basic visual interpretations. Performance dropped sharply, however, on conversational follow-ups about detailed tasks such as reading precise cooking instructions or medication dosages. There, VisionPal answered only 56.6% of queries correctly, and 22.2% of its responses contained false information, a serious risk in applications where accuracy directly affects user safety and independence.
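To make those figures concrete: if each follow-up response in the diary logs is labeled by annotators, the reported rates fall out of simple counts. The tally below is illustrative, not the study's actual log schema; only the two percentages are taken from the paper, and the remaining share is not broken down in the summary.

```python
from collections import Counter

# Hypothetical labels for 1,000 follow-up responses; "other" stands in
# for whatever categories the study used beyond the two reported rates.
labels = ["correct"] * 566 + ["false_info"] * 222 + ["other"] * 212

counts = Counter(labels)
total = sum(counts.values())
print(f"accuracy: {counts['correct'] / total:.1%}")              # 56.6%
print(f"false information: {counts['false_info'] / total:.1%}")  # 22.2%
```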
In response, the Cornell Tech team proposed a framework of nine "visual assistant" skills that MLLMs need in order to provide goal-relevant, dependable help. Among them: communicating facts neutrally and objectively; adapting communication style to individual user preferences; collaborating toward the user's goal by delivering only pertinent information; handling uncertainty transparently by explicitly acknowledging the AI's limits; and handing off gracefully, either by directing users to an appropriate human resource or by stating plainly when the AI lacks the expertise to answer accurately. The framework is meant to steer future development toward more responsible, user-centered assistants.
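Two of those skills, transparent uncertainty handling and graceful handoffs, translate naturally into a response policy layered on top of the model. The sketch below assumes the model exposes a confidence score and that a human assistance service exists as a fallback; both are assumptions, since the paper prescribes behaviors rather than an implementation.

```python
from dataclasses import dataclass

@dataclass
class ModelAnswer:
    text: str
    confidence: float  # assumed 0..1 self-estimate; not all MLLMs expose one

CONFIDENCE_FLOOR = 0.8  # illustrative threshold, not from the paper

def respond(answer: ModelAnswer, safety_critical: bool) -> str:
    """Apply the 'transparent uncertainty' and 'graceful handoff' skills."""
    if safety_critical and answer.confidence < CONFIDENCE_FLOOR:
        # Graceful handoff: state the limitation and point to a human.
        return ("I can't read this reliably enough to be safe. Please "
                "verify with a sighted person or a human assistance service.")
    if answer.confidence < CONFIDENCE_FLOOR:
        # Transparent uncertainty: answer, but flag low confidence.
        return f"I'm not certain, but it looks like: {answer.text}"
    return answer.text

# Example: a medication-dosage question is safety-critical.
print(respond(ModelAnswer("Take 2 tablets daily", 0.55), safety_critical=True))
```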
Both Shiri Azenkot and Ricardo Gonzalez stressed that human needs must stay at the center of AI development, especially for assistive technology. Despite the field's rapid progress, they argue, real improvement comes from understanding what people actually need and how technology can best meet those needs, so that AI tools are not merely technically advanced but genuinely supportive and empowering for blind and low-vision users. The research was supported by the National Science Foundation.