#vision-language-models tag

Conntour raises $7M from General Catalyst, YC to build an AI search engine for security video systems | TechCrunch

Surveillance technology faces ethical scrutiny amid privacy concerns, yet companies like Conntour thrive by selectively choosing clients and raising significant investment.

fromTNW | Health-Tech

1 month ago

Cedars-Sinai's AI beats specialist models at reading heart scans

EchoPrime, a video-based vision-language model, analyses echocardiogram footage and generates a written report of cardiac form and function. Its findings were published in Nature (volume 650, pages 970-977) in February 2026, under the title 'Comprehensive echocardiogram evaluation with view primed vision language AI.'

Medicine

Software development

fromTechzine Global

1 month ago

Microsoft introduces open-source multimodal Phi-4 reasoning model

Microsoft's Phi-4-reasoning-vision-15B combines vision and reasoning capabilities using mid-fusion architecture, outperforming larger models on mathematical and scientific benchmarks while maintaining efficiency through selective multimodal layer processing.

fromNature

2 months ago

Merlin: a computed tomography vision-language foundation model and dataset - Nature

The large volume of abdominal computed tomography (CT) scans coupled with the shortage of radiologists have intensified the need for automated medical image analysis tools. Previous state-of-the-art approaches for automated analysis leverage vision-language models (VLMs) that jointly model images and radiology reports.

Medicine

Privacy technologies

fromPrivacy International

2 months ago

Nowhere to Hide? Privacy Risks and Policy Implications of AI Geolocation

Vision-Language Models can accurately determine photo locations without GPS data, creating serious privacy and human rights risks including surveillance, doxxing, and discriminatory policing.

fromTechCrunch

2 months ago

Ex-Googlers are building infrastructure to help companies understand their video data | TechCrunch

Businesses are generating more video than ever. From years of broadcast archives to thousands of store cameras and countless hours of production footage, most of it just sits unused on servers, unwatched and unanalyzed. This is dark data: a massive, untapped resource that companies collect automatically but almost never use in a meaningful way. To tackle the problem, Aza Kai (CEO) and Hiraku Yanagita (COO), two former Googlers who spent nearly a decade working together at Google Japan, decided to build their own solution.

Artificial intelligence

fromNature

3 months ago

Multimodal learning with next-token prediction for large multimodal models - Nature

Since AlexNet (ref. 5), deep learning has replaced heuristic hand-crafted features by unifying feature learning with deep neural networks. Later, Transformers (ref. 6) and GPT-3 (ref. 1) further advanced sequence learning at scale, unifying structured tasks such as natural language processing. However, multimodal learning, spanning modalities such as images, video and text, has remained fragmented, relying on separate diffusion-based generation or compositional vision-language pipelines with many hand-crafted designs.

Artificial intelligence

fromZDNET

3 months ago

Nvidia's physical AI models clear the way for next-gen robots - here's what's new

Nvidia released open Cosmos and GR00T physical-AI models to accelerate robot development, enabling realistic world understanding, simulation, reasoning, and reduced pretraining effort.

Artificial intelligence

fromTechCrunch

5 months ago

Nvidia announces new open AI models and tools for autonomous driving research | TechCrunch

Nvidia released Alpamayo-R1, an open vision-language reasoning model plus Cosmos Cookbook resources to accelerate level-4 autonomous driving and physical AI development.

Wearables

fromZDNET

7 months ago

These Halo smart glasses just got a major memory boost, thanks to Liquid AI

Brilliant Labs will integrate Liquid AI's vision–language foundation models into Halo AI smart glasses to improve real-time scene understanding and agentic memory.

Artificial intelligence

fromComputerworld

8 months ago

Microsoft researchers develop new tech for video AI agents

Microsoft is developing MindJourney, a video-AI framework that explores 3D spaces using world models, VLMs, video generation, and reasoning to predict surroundings and movement.

Philosophy

fromTheregister

8 months ago

Vision AI models see optical illusions when none exist

Vision language models, like GPT-5, misinterpret simple images as complex illusions, reflecting a form of cognitive bias similar to humans.

Artificial intelligence

fromHackernoon

2 years ago

Researchers Push Vision-Language Models to Grapple with Metaphors, Idioms, and Sarcasm | HackerNoon

The V-FLUTE dataset enhances understanding of figurative language in AI, assessing the performance of vision-language models.

Artificial intelligence

fromHackernoon

2 years ago

Can AI Understand a Joke? New Dataset Tests Bots on Metaphors, Sarcasm, and Humor | HackerNoon

Large AI models struggle with figurative language, which presents challenges due to its implicit meanings.

#idefics2

fromHackernoon

Scala

How an 8B Open Model Sets New Standards for Safe and Efficient Vision-Language AI | HackerNoon

fromHackernoon

10 months ago

Artificial intelligence

The Small AI Model Making Big Waves in Vision-Language Intelligence | HackerNoon

The Artistry Behind Efficient AI Conversations | HackerNoon

The cross-attention architecture outperforms fully autoregressive models on vision-language tasks, despite its higher computational cost.

#machine-learning

fromHackernoon

10 months ago

Artificial intelligence

Why The Right AI Backbones Trump Raw Size Every Time | HackerNoon

fromHackernoon

10 months ago

Artificial intelligence

Can Smaller AI Outperform the Giants? | HackerNoon

fromScienceDaily

11 months ago

Artificial intelligence

Study shows vision-language models can't handle queries with negation words

Artificial intelligence

fromPyImageSearch

11 months ago

Content Moderation via Zero Shot Learning with Qwen 2.5 - PyImageSearch

Digital platforms face complex challenges in content moderation due to user-generated content growth.

Qwen 2.5 models can enhance content moderation through advanced multimodal understanding.
