#visual-comprehension

LLaVA-Phi: The Training We Put It Through | HackerNoon

LLaVA-Phi utilizes a structured training pipeline to improve visual and language model capabilities through fine-tuning.

Introducing LLaVA-Phi: A Compact Vision-Language Assistant Powered By a Small Language Model | HackerNoon

LLaVA-Phi showcases the capabilities of smaller language models in multi-modal tasks with only 2.7B parameters.

GPT-4o lets you have real-time audio-video conversations with an "emotional" chatbot

OpenAI debuts GPT-4o for text, vision, and audio, offering real-time capabilities and multilingual translations.