From Pixels to Text: The Power of Vision LLMs in Understanding Visual Data

Date:

23 January 2025

Venue:

SRM Madurai College for Engineering and Technology

Organised by:

Department of Computer Science and Engineering

Event summary:

Vision LLMs (vision language models) combine computer vision with natural language processing to improve machine understanding of visual data. Rather than only identifying objects, they let machines interpret images and videos by grasping the context and relationships within a scene, much as humans do, which yields more comprehensive and nuanced interpretations than simple object recognition. This opens up a wide range of applications, such as generating detailed image captions, answering complex questions about visual content, and even creating visual content from textual descriptions. The resource person gave a clear presentation on how LLMs work, visual question answering, image retrieval with text queries, and accessibility enhancements for visual data.
