
From Pixels to Text: The Power of Vision LLMs in Understanding Visual Data
Date:
23 January 2025
Venue:
SRM Madurai College for Engineering and Technology
Organised by:
Dept of Computer Science and Engineering
Event summary:
Vision LLMs (vision-language models) can significantly enhance the understanding of visual data by combining computer vision with natural language processing. Rather than only identifying objects, they allow machines to interpret images and videos by grasping the context and relationships within a scene, much as humans do. The result is a richer, more nuanced interpretation than simple object recognition provides. This opens up a wide range of applications, such as generating detailed image captions, answering complex questions about visual content, and even creating visual content from textual descriptions. The resource person delivered a clear and engaging talk covering how LLMs work, visual question answering, image retrieval with text queries, and accessibility enhancements for visual data.