We rigorously evaluated LLaVA-Phi on an extensive array of academic benchmarks designed for multimodal models, where it achieved superior performance in visual question answering.
LLaVA-Phi outperformed numerous existing large multimodal models, with particularly notable results on ScienceQA, which we attribute to its training on code generation and mathematical corpora.