#semantic-segmentation

from PyImageSearch
1 week ago

Agentic AI Vision System: Object Segmentation with SAM 3 and Qwen - PyImageSearch

Agentic AI systems are designed to interpret user requests, select the appropriate models or tools, evaluate intermediate outputs, and refine their decisions over multiple steps. This iterative reasoning loop enhances the segmentation process significantly.
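The loop the blurb describes can be sketched in a few lines. Everything below is illustrative: the tool functions, quality scores, and threshold are hypothetical stand-ins, not the SAM 3 or Qwen APIs.

```python
# Minimal sketch of an agentic loop: interpret a request, pick a tool,
# evaluate the intermediate output, and refine over multiple steps.
# Tool names and scores are illustrative stand-ins for real models.

def segment_coarse(request):
    # stand-in for a fast, lower-accuracy segmentation tool
    return {"mask_quality": 0.6, "tool": "coarse"}

def segment_refined(request):
    # stand-in for a slower, higher-accuracy segmentation tool
    return {"mask_quality": 0.9, "tool": "refined"}

TOOLS = [segment_coarse, segment_refined]

def agent(request, quality_threshold=0.8, max_steps=3):
    """Try tools in order, re-evaluating until the output is good enough."""
    result = None
    for _, tool in zip(range(max_steps), TOOLS):
        result = tool(request)
        if result["mask_quality"] >= quality_threshold:
            break  # intermediate output judged acceptable; stop refining
    return result

print(agent("segment the red car"))  # the refined tool passes the check
```

The point of the loop is that the agent inspects its own intermediate result and only escalates to a costlier tool when the cheap one falls short.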
Python

Python
from PyCon
1 week ago

Python and the Future of AI: Agents, Inference, and Edge AI

AI tools are increasingly integrated into development, with a dedicated track at PyCon US focusing on their future and practical applications.
from Psychology Today
1 month ago
Artificial intelligence

AI Spots Brain Disorders in Seconds From Scans

Prima diagnoses over 50 brain disorders from MRI scans in seconds with up to 97.5% accuracy and serves as a foundation model for neuroimaging.
Artificial intelligence
from Futurism
6 days ago

Frontier AI Models Are Doing Something Absolutely Bizarre When Asked to Diagnose Medical X-Rays

Hallucinations and 'mirage reasoning' in AI models pose significant risks, especially in healthcare applications, leading to potentially dangerous misinformation.
Data science
from InfoWorld
1 week ago

Why 'curate first, annotate smarter' is reshaping computer vision development

Strategic data selection and curation reduce annotation costs and enhance development productivity in computer vision teams.
Artificial intelligence
from The Register
1 week ago

Microsoft shivs OpenAI with new AI models for speech, images

Microsoft launched public preview versions of machine learning models for speech recognition, speech synthesis, and image generation, competing directly with OpenAI.
Artificial intelligence
from Fortune
1 week ago

Is AI's visual understanding mostly a 'mirage'? New research suggests so. | Fortune

New research suggests that much of AI models' apparent visual understanding may be a 'mirage' rather than genuine perception.
Photography
from InfoQ
4 weeks ago

Image Processing for Automated Tests

Image-based test automation with AI algorithms makes it possible to test applications without access to internal state such as the DOM or component trees, relying on visual representations to distinguish intended from faulty states.
Python
from Business Matters
2 weeks ago

Building AI-powered visual solutions: How Python forms the foundation for advanced Computer Vision use cases

Python is the preferred programming language for developing computer vision technologies due to its simplicity, flexibility, and extensive libraries.
Mobile UX
from Engadget
1 month ago

Nothing updates its AI app with semantic search and a new way to track events

Nothing's updated Essential Space app now recognizes events from images and supports semantic search, making it easier to organize and find screenshots, voice recordings, and other digital content on 2025 and 2026 Nothing phones.
from Nature
1 month ago

Merlin: a computed tomography vision-language foundation model and dataset - Nature

The large volume of abdominal computed tomography (CT) scans coupled with the shortage of radiologists have intensified the need for automated medical image analysis tools. Previous state-of-the-art approaches for automated analysis leverage vision-language models (VLMs) that jointly model images and radiology reports.
Medicine
Artificial intelligence
from Fortune
1 month ago

AI mastered language. The physical world is next | Fortune

Embodied AI advancement requires world modeling and physical understanding, constrained by scarcity of specific training data rather than compute or architecture limitations.
Python
from PyImageSearch
1 month ago

DeepSeek-V3 Model: Theory, Config, and Rotary Positional Embeddings - PyImageSearch

DeepSeek-V3 introduces architectural innovations, including Multi-head Latent Attention, which reduces KV-cache memory by 75% while maintaining model quality, addressing key challenges in inference efficiency, training cost, and long-range dependency capture.
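The rotary positional embeddings (RoPE) the article covers can be sketched directly: each pair of feature dimensions is rotated by a position-dependent angle, so query-key dot products depend only on relative position. This is a generic RoPE sketch, not DeepSeek-V3's exact implementation.

```python
import numpy as np

# Generic RoPE sketch: rotate each (x1_i, x2_i) feature pair by an angle
# proportional to the token's position, with one frequency per pair.

def rope(x, base=10000.0):
    """x: (seq_len, dim) with even dim; returns position-rotated features."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)            # one freq per pair
    angles = np.arange(seq_len)[:, None] * freqs[None]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation of each pair by its position-dependent angle
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(8, 16)
# rotations preserve norms; only relative angles change dot products
assert np.allclose(np.linalg.norm(rope(q), axis=1), np.linalg.norm(q, axis=1))
```

Because rotations cancel up to the position difference, the dot product between a rotated query at position i and a rotated key at position j depends only on i − j, which is the property attention exploits.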
from Yanko Design
1 month ago

Nvidia wants robots to learn before executing tasks by watching 44,000 hours of human video - Yanko Design

The robotics industry, for now, faces the biggest challenge in teaching robots to operate in the messy real world. The unstructured environment means robots need massive amounts of data to learn. Gathering and structuring that data is the costliest thing in robotics and perhaps the biggest impediment, slowing the entire development process.
Artificial intelligence
from Big Think
2 months ago

Computational model discovers new types of neurons hidden in decade-old dataset

There was a group of neurons that predicted the wrong answer, yet they kept getting stronger as the model learned. So we went back to the original macaque data, and the same signal was there, hiding in plain sight. It wasn't a quirk of the model - the monkeys' brains were doing it too. Even as their performance improved, both the real and simulated brains maintained a reserve of neurons that continued to predict the incorrect answer.
Science
from TechCrunch
2 months ago

Elon Musk teases a new image-labeling system for X...we think? | TechCrunch

So far, the only details on the new feature come from a cryptic X post from Elon Musk saying, "Edited visuals warning," as he reshares an announcement of a new X feature made by the anonymous X account DogeDesigner. That account is often used as a proxy for introducing new X features, as Musk will repost from it to share news.
World news
Python
from PyImageSearch
1 month ago

SAM 3 for Video: Concept-Aware Segmentation and Object Tracking - PyImageSearch

SAM 3 extends beyond static image segmentation to video by maintaining streaming memory and tracking state, enabling unified detection, segmentation, and tracking across frames while preserving object identity over time.
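The streaming-memory idea can be illustrated with a toy tracker: per-frame detections are matched against remembered objects so identity persists across frames. Real SAM 3 uses learned memory attention; the nearest-centroid matching below is purely illustrative.

```python
# Toy streaming-memory tracker: match each frame's detections to the
# closest remembered object, or mint a new id when nothing is near.

def track(frames, max_dist=2.0):
    """frames: list of frames, each a list of (x, y) detection centroids."""
    memory = {}       # object id -> last known centroid (tracking state)
    next_id = 0
    history = []
    for detections in frames:
        assigned = {}
        for x, y in detections:
            # find the closest remembered object, if any
            best = min(
                memory.items(),
                key=lambda kv: (kv[1][0] - x) ** 2 + (kv[1][1] - y) ** 2,
                default=None,
            )
            if best is not None and \
               (best[1][0] - x) ** 2 + (best[1][1] - y) ** 2 <= max_dist ** 2:
                oid = best[0]                          # same object, new frame
            else:
                oid, next_id = next_id, next_id + 1    # new object enters
            memory[oid] = (x, y)                       # update streaming memory
            assigned[oid] = (x, y)
        history.append(assigned)
    return history

# an object drifting right keeps id 0; a far-away detection gets id 1
print(track([[(0, 0)], [(1, 0)], [(2, 0), (10, 10)]]))
```

The dictionary carried across frames plays the role of SAM 3's memory: it is what lets frame t's masks inherit identities from frame t − 1.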
from Futurism
2 months ago

Scientists Preparing to Simulate Human Brain on Supercomputer

The team, which is being led by Jülich neurophysics professor Markus Diesmann, will leverage the Joint Undertaking Pioneer for Innovative and Transformative Exascale Research (JUPITER) supercomputer for their simulation. JUPITER is currently the fourth most powerful supercomputer in the world according to the TOP500 list, and features thousands of graphical processing units. The team demonstrated last month that a "spiking neural network" could be scaled up and run on JUPITER, effectively matching the cerebral cortex's 20 billion neurons and 100 trillion connections.
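The "spiking neural network" building block can be illustrated with a single leaky integrate-and-fire neuron: the membrane voltage leaks toward rest, integrates input current, and emits a spike on crossing a threshold. Parameters are textbook-style toys, not the JUPITER simulation's settings.

```python
# One leaky integrate-and-fire (LIF) neuron, the classic unit of
# spiking neural networks, integrated with a simple Euler step.

def lif(current, steps=50, dt=1.0, tau=10.0, v_rest=0.0, v_thresh=1.0):
    v, spikes = v_rest, []
    for t in range(steps):
        # leak toward resting potential plus injected current
        v += dt * ((v_rest - v) / tau + current)
        if v >= v_thresh:
            spikes.append(t)   # record spike time
            v = v_rest         # reset membrane after firing
    return spikes

print(lif(current=0.15))   # strong input: regular spiking
print(lif(current=0.05))   # weak input: voltage saturates below threshold
```

The weak-input case shows the key nonlinearity: below a critical current the voltage settles at a subthreshold fixed point and the neuron stays silent, which is what makes spiking networks event-driven rather than continuously active.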
Science
from Fortune
1 month ago

We studied chatbots and language and saw a huge problem: They mean 80% when they say 'likely' but humans hear 65% | Fortune

By comparing how AI models and humans map these words to numerical percentages, we uncovered significant gaps between humans and large language models. While the models do tend to agree with humans on extremes like 'impossible,' they diverge sharply on hedge words like 'maybe.' For example, a model might use the word 'likely' to represent an 80% probability, while a human reader assumes it means closer to 65%.
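The calibration gap is easy to make concrete. The numbers below are illustrative stand-ins, except for 'likely', where the article cites roughly 80% (model) versus 65% (human).

```python
# Toy comparison of how a model and a human reader might map hedge
# words to probabilities; only the 'likely' gap comes from the article.

model_reading = {"impossible": 0.01, "maybe": 0.55, "likely": 0.80}
human_reading = {"impossible": 0.01, "maybe": 0.40, "likely": 0.65}

for word in model_reading:
    gap = model_reading[word] - human_reading[word]
    print(f"{word:10s} model={model_reading[word]:.0%} "
          f"human={human_reading[word]:.0%} gap={gap:+.0%}")
```

As the article notes, the two readings agree at the extremes and diverge on mid-scale hedges, which is exactly where the miscommunication risk concentrates.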
Artificial intelligence
from HackerNoon
2 months ago

Segment Anything in Motion: A Hands-On Guide to sam3-video | HackerNoon

sam3-video is a unified foundation model from Meta Research that performs prompt-based segmentation in both images and videos.
Python
from PyImageSearch
2 months ago

Grounded SAM 2: From Open-Set Detection to Segmentation and Tracking - PyImageSearch

Grounded SAM 2 extends Grounding DINO by adding pixel-level segmentation and video-aware tracking to convert language-driven detections into precise, persistent object masks.
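The language-to-mask pipeline has a simple shape: an open-set detector turns a text prompt into boxes, a segmenter turns boxes into masks, and per-frame results are collected into tracks. All stages below are stubs standing in for Grounding DINO and SAM 2, and the index-based identity is a naive substitute for real cross-frame matching.

```python
# Stubbed sketch of the detect -> segment -> track pipeline.

def detect(prompt, frame):
    # stand-in for Grounding DINO: text prompt -> candidate boxes
    return [(10, 10, 50, 50)] if "cat" in prompt else []

def segment(frame, boxes):
    # stand-in for SAM 2: each box is refined into a pixel mask
    return [{"box": b, "mask": f"mask-for-{b}"} for b in boxes]

def grounded_pipeline(prompt, frames):
    tracks = {}   # detection index -> list of (frame, mask); toy identity
    for t, frame in enumerate(frames):
        for i, det in enumerate(segment(frame, detect(prompt, frame))):
            tracks.setdefault(i, []).append((t, det["mask"]))
    return tracks

tracks = grounded_pipeline("a cat", frames=[None, None, None])
# object 0 accumulates one mask per frame, i.e. a persistent identity
```

The design point is the hand-off: language drives detection, boxes constrain segmentation, and the tracker is what converts per-frame masks into persistent objects.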
Python
from PyImageSearch
2 months ago

TF-IDF vs. Embeddings: From Keywords to Semantic Search - PyImageSearch

Vector databases and embeddings enable semantic search and retrieval-augmented generation by mapping text meaning into geometric vectors for similarity-based retrieval.
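The keyword-versus-meaning contrast can be shown in a few lines. The 3-d "embeddings" below are hand-made toy vectors that put synonyms near each other (real systems learn them from data), and plain word overlap stands in for full TF-IDF weighting.

```python
import math
from collections import Counter

# Keyword overlap misses synonyms; cosine similarity over vectors
# that encode meaning does not.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def keyword_overlap(query, doc):
    q, d = Counter(query.split()), Counter(doc.split())
    return sum((q & d).values())

query, doc = "car repair", "automobile maintenance"
print(keyword_overlap(query, doc))   # 0: no shared keywords at all

toy_vec = {
    "car repair": [0.9, 0.1, 0.3],
    "automobile maintenance": [0.7, 0.3, 0.2],
}
print(cosine(toy_vec[query], toy_vec[doc]))  # close to 1: meanings align
```

This is the core of semantic search: once text is mapped to vectors, "nearest neighbor" replaces "shares a keyword" as the retrieval criterion.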
Artificial intelligence
from Mail Online
1 month ago

Can you tell the difference between real and AI-generated people?

People are overconfident in their ability to distinguish AI-generated faces from real ones and perform only slightly better than chance.
Artificial intelligence
from InfoWorld
2 months ago

What is context engineering? And why it's the new AI architecture

Context engineering designs and manages the information, tools, and constraints an LLM receives, enabling scalable, high-signal inputs and improved model outcomes.
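A minimal sketch of the idea: instead of one big prompt, assemble the highest-signal pieces (instructions, retrieved notes, tool specs) under an explicit budget. The scoring, note texts, and character budget here are illustrative, not any particular framework's API.

```python
# Pack the best-scoring context pieces into a bounded prompt, best-first.

def build_context(instructions, candidates, budget_chars=200):
    """candidates: list of (relevance_score, text); pack best-first."""
    parts = [instructions]
    used = len(instructions)
    for score, text in sorted(candidates, reverse=True):
        if used + len(text) <= budget_chars:
            parts.append(text)     # high-signal piece fits the budget
            used += len(text)
    return "\n".join(parts)

ctx = build_context(
    "Answer using only the provided notes.",
    [(0.9, "Note A: refund window is 30 days."),
     (0.2, "Note B: unrelated shipping trivia that wastes tokens."),
     (0.8, "Note C: refunds require a receipt.")],
    budget_chars=120,
)
print(ctx)   # A and C make the cut; low-signal B is dropped
```

The design choice is the explicit budget: it forces a ranking decision, so low-relevance filler never crowds out the inputs the model actually needs.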
from Nature
2 months ago

Multimodal learning with next-token prediction for large multimodal models - Nature

Since AlexNet, deep learning has replaced heuristic hand-crafted features by unifying feature learning with deep neural networks. Later, Transformers and GPT-3 further advanced sequence learning at scale, unifying structured tasks such as natural language processing. However, multimodal learning, spanning modalities such as images, video and text, has remained fragmented, relying on separate diffusion-based generation or compositional vision-language pipelines with many hand-crafted designs.
Artificial intelligence
from TechCrunch
2 months ago

Neo humanoid maker 1X releases world model to help bots learn what they see | TechCrunch

1X, the robotics company behind the Neo humanoid robot, has unveiled a new AI model that it says understands the dynamics of the real world and can help bots learn new information on their own. This physics-based model, called 1X World Model, uses a combination of video and prompts to give Neo robots new abilities. The video allows Neo robots to learn new tasks they weren't previously trained on, according to 1X.
Artificial intelligence
from Fast Company
1 month ago

This AI-powered machine turns photos into smells

One scientist at MIT, Cyrus Clarke, is working to do just that. Alongside a team of fellow researchers, Clarke has developed a physical machine called the Anemoia Device, which uses a generative AI model to analyze an archival photograph, describe it in a short sentence, and, following the user's own inputs, convert that description into a unique fragrance. The word "anemoia" was coined by author John Koenig and included in his 2021 book, The Dictionary of Obscure Sorrows.
Artificial intelligence