#vision-language-model-vlm--ai

#ai
Digital life
fromdiacritical
1 week ago

From Messages to Conversations: AI Agents are Changing how we Find Culture

Automated web traffic has surged, with AI bots now significantly outnumbering human visitors, impacting arts organizations and cultural discovery.
Data science
fromInfoQ
2 weeks ago

Context Engineering with Adi Polak

Context engineering moves beyond prompt engineering to enhance AI systems by adapting language and practices for better model interaction.
Artificial intelligence
fromNature
1 day ago

No humans allowed: scientific AI agents get their own social network

Agent4Science is a social network for AI agents to discuss research papers without human participation.
Artificial intelligence
fromInfoQ
2 days ago

Google's Aletheia Advances the State of the Art of Fully Autonomous Agentic Math Research

Aletheia, an AI by Google, autonomously solved 6 out of 10 novel math problems, marking a significant advancement in automated proof discovery.
Typography
fromMedium
2 weeks ago

AI is rewriting the rules. Language is following.

The word 'delve' has surged in usage due to AI's influence on language and communication patterns.
#llms
UX design
fromMedium
3 hours ago

The web trained AI to deceive. Now designers have to untrain it.

LLMs replicate UX dark patterns from the web, leading to deceptive design practices in generated content.
Node JS
fromRaymondcamden
4 days ago

Summarizing Docs with Built-in AI

On-device summarization of various document types, including Office formats, is achievable using libraries like officeParser and Chrome's built-in Summarizer API.
fromFast Company
3 days ago

How AI and education are shaping the future of aesthetics

Aesthetic inspiration is social and collective, but aesthetic results are deeply personal. What works for one face, skin type, or bone structure won't always work for another.
Healthcare
DevOps
fromTechzine Global
3 days ago

Claude Opus 4.7 is no Mythos, and that's a good thing

Claude Opus 4.7 improves software engineering, vision, and agentic tasks, but is not the risky Mythos model Anthropic refrains from fully releasing.
#openai
Software development
fromEngadget
4 days ago

OpenAI's latest Codex update builds the groundwork for its upcoming super app

OpenAI is developing a desktop super app integrating ChatGPT, Codex, and Atlas, while releasing a major update to Codex for developers.
#google
European startups
fromFast Company
4 days ago

AI isn't built for all languages and cultures. There's a push to fix that

Assem Sabry created Horus, an AI model focused on Egyptian culture, to address the lack of representation in the AI industry.
fromwww.npr.org
6 days ago

In the brain, objects seen and imagined follow the same neural path

"I can look at an object in the world around me, but I can also close my eyes and imagine the object," says Varun Wadia, highlighting the dual capability of visual perception and imagination.
Science
Psychology
fromInfoQ
1 week ago

Anthropic Paper Examines Behavioral Impact of Emotion-Like Mechanisms in LLMs

Large language models exhibit internal representations of emotions that influence their behavior, though they do not actually experience these emotions.
Games
fromThe Atlantic
6 days ago

The Strange Origin of AI's 'Reasoning' Abilities

Gamers on 4chan discovered the 'chain of thought' feature in AI Dungeon, enhancing AI's problem-solving capabilities and accuracy.
Artificial intelligence
fromInfoQ
1 day ago

Designing Memory for AI Agents: Inside LinkedIn's Cognitive Memory Agent

LinkedIn's Cognitive Memory Agent enables context-aware AI systems that retain knowledge across interactions, enhancing personalization and continuity.
UX design
fromMedium
16 hours ago

The deceptive nature of today's AI conversation design and how to fix it

Conversation design for non-human participants may be outdated and inefficient, raising questions about its effectiveness in user interactions.
#agentic-ai
Software development
fromTechCrunch
5 days ago

OpenAI updates its Agents SDK to help enterprises build safer, more capable agents | TechCrunch

OpenAI's updated SDK enhances agent development with sandboxing and in-distribution harness features for safer, more complex automated tasks.
Philosophy
fromJames Bennett
1 week ago

Let's talk about LLMs

The current technological landscape may represent a significant shift driven by large language models, but its ultimate impact remains uncertain.
Data science
fromAol
2 weeks ago

Demystifying structured data: How to speak an LLM's native language

Structured data is essential for LLMs to accurately interpret and rank online content, enhancing search visibility and user engagement.
Python
fromEfficientcoder
1 week ago

Build Your Own AI Meme Matcher: A Beginner's Guide to Computer Vision with Python

Computer Vision enables real-time facial recognition and meme matching using Object-Oriented Programming for clean and organized code.
Software development
fromInfoWorld
5 days ago

Mastering the dull reality of sexy AI

The gap in enterprise AI lies in building effective systems for retrieval, evaluation, memory, and governance, not just access to models.
Artificial intelligence
fromTheregister
5 days ago

LLMs fail in 8 out of 10 early differential diagnosis cases

AI models fail at early differential diagnosis in over 80% of cases, highlighting significant limitations for patient self-diagnosis.
Artificial intelligence
fromFortune
4 days ago

Forget the chatbot wars. Demis Hassabis is thinking about something far bigger | Fortune

AI leadership should be global and diverse to ensure ethical development and deployment.
Data science
fromInfoWorld
2 weeks ago

Why 'curate first, annotate smarter' is reshaping computer vision development

Strategic data selection and curation reduce annotation costs and enhance development productivity in computer vision teams.
Python
fromPyImageSearch
3 weeks ago

Autoregressive Model Limits and Multi-Token Prediction in DeepSeek-V3 - PyImageSearch

Multi-Token Prediction (MTP) in DeepSeek-V3 allows simultaneous token forecasting, enhancing training speed and contextual understanding.
#artificial-intelligence
Artificial intelligence
fromTechCrunch
1 week ago

From LLMs to hallucinations, here's a simple guide to common AI terms | TechCrunch

A glossary of key artificial intelligence terms is essential for understanding the complex language used in the industry.
Python
fromBusiness Matters
3 weeks ago

Building AI-powered visual solutions: How Python forms the foundation for advanced Computer Vision use cases

Python is the preferred programming language for developing computer vision technologies due to its simplicity, flexibility, and extensive libraries.
Artificial intelligence
fromNature
1 week ago

AI agents replicate human social dynamics in days

Moltbook, a social-media platform for AI agents, quickly attracted self-declared rulers and cryptocurrency initiatives after its launch.
Science
fromThe Cipher Brief
1 month ago

Why the U.S. Must Build the Ultimate Multi-Modal Foundation Model

Advanced AI models like AlphaEarth demonstrate pixel-level geospatial intelligence capabilities that must be integrated into U.S. national security frameworks to maintain technological leadership.
Data science
fromTechzine Global
3 weeks ago

As AI hits scaling limits, Google smashes the context barrier

TurboQuant significantly reduces KV cache size, enhancing AI model performance and expanding context windows for complex workloads.
fromGreaterwrong
1 week ago
Artificial intelligence

My picture of the present in AI

AI companies are experiencing significant productivity increases through the integration of advanced AI tools, achieving a speed-up of around 1.6x.
Deliverability
fromFast Company
1 month ago

How to communicate like a human in the age of AI

AI-generated communication lacks personal distinctiveness and authenticity, reducing trustworthiness despite appearing professional, while minimal AI editing preserves human voice and credibility.
Artificial intelligence
fromTheregister
2 weeks ago

Microsoft shivs OpenAI with new AI models for speech, images

Microsoft launched public preview versions of machine learning models for speech recognition, speech synthesis, and image generation, competing directly with OpenAI.
Artificial intelligence
fromFortune
2 weeks ago

Is AI's visual understanding mostly a 'mirage'? New research suggests so. | Fortune

Anthropic faces significant cybersecurity risks following multiple sensitive data leaks related to its new AI model, Mythos.
Roam Research
fromThe Verge
1 month ago

NotebookLM can now summarize research in 'cinematic' video overviews

Google's NotebookLM now generates fully animated cinematic videos from user notes using AI models including Gemini 3, Nano Banana Pro, and Veo 3, advancing beyond previous narrated slideshow capabilities.
Data science
fromInfoQ
1 month ago

Google Researchers Propose Bayesian Teaching Method for Large Language Models

Google researchers developed a training method enabling large language models to approximate Bayesian reasoning by learning from optimal Bayesian system predictions, improving belief updates during multi-step interactions.
fromNature
1 month ago

Merlin: a computed tomography vision-language foundation model and dataset - Nature

The large volume of abdominal computed tomography (CT) scans coupled with the shortage of radiologists have intensified the need for automated medical image analysis tools. Previous state-of-the-art approaches for automated analysis leverage vision-language models (VLMs) that jointly model images and radiology reports.
Medicine
Data science
fromNature
1 month ago

AI can 'same-ify' human expression - can some brains resist its pull?

Large language models are homogenizing human writing styles, reasoning methods, and perspectives, potentially creating widespread sameness in discourse even among non-direct AI users.
#sam-3
Python
fromPyImageSearch
2 months ago

TF-IDF vs. Embeddings: From Keywords to Semantic Search - PyImageSearch

Vector databases and embeddings enable semantic search and retrieval-augmented generation by mapping text meaning into geometric vectors for similarity-based retrieval.
Artificial intelligence
fromFortune
1 month ago

AI mastered language. The physical world is next | Fortune

Embodied AI advancement requires world modeling and physical understanding, constrained by scarcity of specific training data rather than compute or architecture limitations.
fromNature
2 months ago

Multimodal learning with next-token prediction for large multimodal models - Nature

Since AlexNet (ref. 5), deep learning has replaced heuristic hand-crafted features by unifying feature learning with deep neural networks. Later, Transformers (ref. 6) and GPT-3 (ref. 1) further advanced sequence learning at scale, unifying structured tasks such as natural language processing. However, multimodal learning, spanning modalities such as images, video and text, has remained fragmented, relying on separate diffusion-based generation or compositional vision-language pipelines with many hand-crafted designs.
Artificial intelligence
fromFast Company
2 months ago

Are LTMs the next LLMs? This new type of AI can do what large-language models can't

A major difference between LLMs and LTMs is the type of data they're able to synthesize and use. LLMs use unstructured data (think text, social media posts, emails, etc.). LTMs, on the other hand, can extract information or insights from structured data, which could be contained in tables, for instance. Since many enterprises rely on structured data, often contained in spreadsheets, to run their operations, LTMs could have an immediate use case for many organizations.
Artificial intelligence
fromFortune
1 month ago

We studied chatbots and language and saw a huge problem: They mean 80% when they say 'likely' but humans hear 65% | Fortune

By comparing how AI models and humans map these words to numerical percentages, we uncovered significant gaps between humans and large language models. While the models do tend to agree with humans on extremes like 'impossible,' they diverge sharply on hedge words like 'maybe.' For example, a model might use the word 'likely' to represent an 80% probability, while a human reader assumes it means closer to 65%.
Artificial intelligence
#ai-image-generation
fromInfoQ
2 months ago

Building Embedding Models for Large-Scale Real-World Applications

What happens under the hood? How is the search engine able to take that simple query and search the billions, even trillions, of images available online? How is it able to find that one photo, or similar ones, among them all? Usually, there is an embedding model doing this work behind the scenes.
Artificial intelligence
Artificial intelligence
fromInfoWorld
2 months ago

What is context engineering? And why it's the new AI architecture

Context engineering designs and manages the information, tools, and constraints an LLM receives, enabling scalable, high-signal inputs and improved model outcomes.
Artificial intelligence
fromPsychology Today
1 month ago

An AI Voice Is Not a Mind

AI systems select and perform contextually appropriate personas rather than expressing unified selves with genuine beliefs, creating fluency that mimics mind without possessing interiority or conviction.
Artificial intelligence
fromTechCrunch
2 months ago

Cohere launches a family of open multilingual models | TechCrunch

Cohere launched Tiny Aya open-weight multilingual models supporting 70+ languages, runnable offline on everyday devices with a 3.35B-parameter base and regional variants.
Artificial intelligence
fromInfoQ
2 months ago

Building LLMs in Resource-Constrained Environments: A Hands-On Perspective

Prioritize small, resource-efficient models and iterative, human-in-the-loop data creation to build practical, improvable AI under infrastructure and data constraints.
fromenglish.elpais.com
2 months ago

How does artificial intelligence think? The big surprise is that it 'intuits'

Each of these achievements would have been a remarkable breakthrough on its own. Solving them all with a single technique is like discovering a master key that unlocks every door at once. Why now? Three pieces converged: algorithms, computing power, and massive amounts of data. We can even put faces to them, because behind each element is a person who took a gamble.
Artificial intelligence
Artificial intelligence
fromMail Online
1 month ago

Can you tell the difference between real and AI-generated people?

People are overconfident in their ability to distinguish AI-generated faces from real ones and perform only slightly better than chance.
Artificial intelligence
fromComputerworld
2 months ago

Researchers propose a self-distillation fix for 'catastrophic forgetting' in LLMs

Continual learning is essential for foundation models; SDFT uses in-context learning to generate on-policy signals, avoiding explicit reward functions and reducing forgetting.
Artificial intelligence
fromInfoQ
2 months ago

MIT's Recursive Language Models Improve Performance on Long-Context Tasks

Recursive Language Models enable LLMs to handle inputs up to 100x longer by using a programming environment and recursive code to decompose and preprocess prompts.