#reinforcement-learning

Episode #291: Reassessing the LLM Landscape & Summoning Ghosts - The Real Python Podcast

Current techniques for LLMs focus on context engineering and multi-agent orchestration, moving away from traditional post-training methods.

Artificial intelligence

fromTNW | Anthropic

1 week ago

Workday's CTO traded his C-suite title for a technical staff role at Anthropic

Peter Bailis transitioned from CTO at Workday to a technical role at Anthropic, focusing on reinforcement learning engineering.

#meta

fromArs Technica

1 week ago

Artificial intelligence

Meta's Superintelligence Lab unveils its first public model, Muse Spark

Meta's Muse Spark introduces Contemplating mode, enhancing performance with multiple agents and improved reinforcement learning for better accuracy and efficiency.

fromTechCrunch

9 months ago

Artificial intelligence

Meta hires key OpenAI researcher to work on AI reasoning models | TechCrunch

Meta hires influential OpenAI researcher Trapit Bansal to boost its AI superintelligence unit.

fromTechCrunch

1 month ago

Cursor admits its new coding model was built on top of Moonshot AI's Kimi | TechCrunch

Cursor's Composer 2 is promoted as offering 'frontier-level coding intelligence,' but an X user claimed it is merely Kimi 2.5 with added reinforcement learning.

European startups

Toronto startup

fromTESLARATI

1 month ago

Elon Musk reveals date of Tesla Full Self-Driving's next massive release

Tesla's Full Self-Driving v14.3 will add reasoning and reinforcement learning to improve decision-making, particularly for Navigation functionality.

#ai-agents

fromZDNET

3 months ago

Artificial intelligence

True agentic AI is years away - here's why and how we get there

fromTechCrunch

6 months ago

Artificial intelligence

Silicon Valley bets big on 'environments' to train AI agents | TechCrunch

fromTechCrunch

7 months ago

Artificial intelligence

Silicon Valley bets big on 'environments' to train AI agents | TechCrunch

Development of robust autonomous AI agents increasingly depends on large-scale, high-quality reinforcement learning environments and simulation-based training.

fromArs Technica

9 months ago

Artificial intelligence

How a big shift in training LLMs led to a capability explosion

Initial excitement around AI models like BabyAGI and AutoGPT faded as limitations in multi-step reasoning became apparent.

Venture

fromFortune

1 month ago

Exclusive: Andreessen Horowitz backs Deeptune's $43M Series A to build 'training gyms' for AI agents | Fortune

Deeptune raised $43 million Series A to build reinforcement learning environments simulating workplace workflows for AI agent training across business software platforms.

I met Olaf - the Frozen robot who might be the future of Disney Parks

Disney's Olaf robot uses reinforcement learning trained on 100,000 simulations to achieve lifelike animated character movements, enabling rapid deployment of interactive characters to theme parks.

#ai-agent-evaluation

fromTechzine Global

1 month ago

Artificial intelligence

Databricks acquires Quotient AI in push for agent reliability

Databricks acquired Quotient AI to embed agent evaluation and reinforcement learning capabilities into its platform, addressing the critical challenge of maintaining reliable AI agents in production environments.

fromInfoWorld

1 month ago

Business intelligence

Databricks buys Quotient AI to boost enterprise-grade AI agent performance

Databricks acquired Quotient AI to enable enterprises to deploy AI agents reliably in production with continuous evaluation, monitoring, and performance improvement capabilities.

Science

fromTheregister

1 month ago

Human brain cells on a chip learn to play Doom

Living human brain cells grown on a microelectrode array successfully control the video game Doom through electrical signal interpretation and reinforcement learning.

Artificial intelligence

fromFortune

1 month ago

AI mastered language. The physical world is next | Fortune

Embodied AI advancement requires world modeling and physical understanding, constrained by scarcity of specific training data rather than compute or architecture limitations.

fromPsychology Today

1 month ago

Maybe We Just Need to Get Out More

That someone "should get out more" is usually said as a joke, a light comment aimed at someone who seems stuck or overly absorbed in a narrow concern. It can sound dismissive or even sarcastic. Yet what if it contains serious psychological truth? We often praise people for being open-minded, creative, or flexible, as if these are stable personality traits that some individuals simply possess. We admire those who seem to think differently and assume they have access to something rare.

Psychology

fromFuturism

1 month ago

Video Shows Man Bleeding After Flailing Robot Kicks Him in Nose

In footage circulating online, a Unitree G1 robot loses balance while performing in front of a crowd in China. As it hits the ground, it uncontrollably thrashes its limbs in all directions, hitting a man in the nose. The man, who appeared to be the robot's operator, had tried to grab the humanoid machine to stop it from tipping over. Later in the video, he can be seen squatting on the ground nursing a bleeding nose.

Gadgets

Artificial intelligence

fromForbes

2 months ago

An Invisible Cartel? Algorithmic Collusion And Agentic AI

Algorithmic dynamic pricing using reinforcement learning can unintentionally enable collusion and raise antitrust concerns requiring regulatory vigilance.

#continual-learning

fromInfoWorld

2 months ago

Artificial intelligence

Researchers propose a self-distillation fix for 'catastrophic forgetting' in LLMs

fromComputerworld

2 months ago

Artificial intelligence

Researchers propose a self-distillation fix for 'catastrophic forgetting' in LLMs

Artificial intelligence

fromHackernoon

2 months ago

The Map-Augmented Agent That Finally Makes AI Good at Finding Places | HackerNoon

Geolocation models fail without explicit map-based reasoning; a reinforced parallel map-augmented agent enables map-thinking and improves localization accuracy.

Artificial intelligence

fromInfoQ

2 months ago

Google Introduces TranslateGemma Open Models for Multilingual Translation

TranslateGemma is an open suite of 4B, 12B, and 27B translation models delivering efficient machine translation across 55 languages for diverse hardware.

Tech industry

fromTechCrunch

2 months ago

Exclusive: Uber launches an 'AV Labs' division to gather driving data for robotaxi partners | TechCrunch

Uber will provide real-world driving data via Uber AV Labs to autonomous-vehicle partners to help train reinforcement-learning–based self-driving systems.

fromTechCrunch

2 months ago

AI chip startup Ricursive hits $4B valuation two months after launch | TechCrunch

Ricursive Intelligence, a startup building an AI system to design and automatically improve AI chips, has raised $300 million at a $4 billion valuation. The company said Monday the round was led by Lightspeed. Ricursive says the system will be able to create its own silicon substrate layer and speed up AI chip improvements. Rinse and repeat to get to AGI, the founders say.

Artificial intelligence

#anthropic

fromFast Company

2 months ago

Artificial intelligence

A Q&A with Amanda Askell, the lead author of Anthropic's new 'constitution' for AIs

fromThe Verge

5 months ago

Artificial intelligence

Anthropic details how it measures Claude's wokeness

fromFortune

2 months ago

AI drug startup Insilico Medicine launches an AI 'gym' to help models like GPT and Qwen be good at science | Fortune

Generalist models "fail miserably" at the benchmarks used to measure how AI performs scientific tasks, Alex Zhavoronkov, Insilico's founder and CEO, told Fortune. "You test it five times at the same task, and you can see that it's so far from state of the art... It's basically worse than random. It's complete garbage." Far better are specialist AI models that are trained directly on chemistry or biology data.

Science

Artificial intelligence

fromBusiness Insider

3 months ago

This startup is helping companies train AI with an old but buzzy technique. Read the pitch deck it used to raise $7.5 million.

AgileRL raised $7.5M to expand Arena, a reinforcement-learning platform that accelerates AI model training, simulation, fine-tuning, and monitoring.

fromPsychology Today

3 months ago

The Dopamine Loop: Why Arguments Are Hard to Let Go

Ever had a song stuck in your head long after the music stopped? Or found yourself replaying an argument: what you said, what you wish you had said, or how it might unfold next time? These mental loops aren't random; they're driven by a powerful feedback system in your brain. That's why catchy tunes stick and arguments replay in your head: Your brain isn't just being stubborn or "obsessed." It's looping with a purpose, like running practice drills.

Psychology

Information security

fromFortune

3 months ago

OpenAI says AI browsers like ChatGPT Atlas may never be fully secure from hackers - and experts say the risks are 'a feature not a bug' | Fortune

Prompt injection enables hidden malicious instructions that can coerce AI browsers into leaking data or performing harmful actions, posing persistent security risks for web-connected agents.

Artificial intelligence

fromTESLARATI

4 months ago

Tesla FSD's newest model is coming, and it sounds like 'the last big piece of the puzzle'

Tesla will deploy an order-of-magnitude larger Full Self-Driving model with enhanced reasoning and reinforcement learning in January or February 2026.

fromThe Verge

4 months ago

The AI industry's biggest week: Google's rise, RL mania, and a party boat

Reinforcement learning (RL) is the next frontier, Google is surging, and the party scene has gotten completely out of hand. Those were the through lines from this year's NeurIPS in San Diego. NeurIPS, or the "Conference on Neural Information Processing Systems," started in 1987 as a purely academic affair. It has since ballooned alongside the hype around AI into a massive industry event where labs come to recruit and investors come to find the next wave of AI startups.

Artificial intelligence

fromBusiness Insider

4 months ago

'The era of data-labeling companies is over,' says the CEO of a $2.2 billion AI training firm

Basic data-labeling is becoming obsolete; AI training requires complex, real-world data, reinforcement-learning environments, and domain experts forming proactive research partnerships.

Artificial intelligence

fromFortune

4 months ago

Two Gen Zers turned down millions from Elon Musk to build an AI based on the human brain - and it's outperformed models from OpenAI and Anthropic | Fortune

Two young researchers built and open-sourced an LLM trained on high-quality data with reinforcement learning, declined a multimillion-dollar xAI offer, and pursued a brain-inspired architecture.

#artificial-intelligence

fromFast Company

4 months ago

Artificial intelligence

AI is transforming spacecraft propulsion - and may lead to nuclear-powered rockets

fromBusiness Insider

9 months ago

Artificial intelligence

This AI startup wants to use technology to automate every job

Artificial intelligence

fromMedium

1 year ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 enhances AI reasoning and adaptability using Reinforcement Learning and long chains of thought.

Artificial intelligence

fromMedium

1 year ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 model uses Reinforcement Learning for advanced reasoning and problem-solving, moving beyond traditional supervised learning methods.

Artificial intelligence

fromMedium

1 year ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 enhances AI reasoning and problem-solving using Reinforcement Learning, surpassing limitations of traditional supervised learning methods.

Artificial intelligence

fromInfoQ

11 months ago

Prime Intellect Releases INTELLECT-2: A 32B Parameter Model Trained via Decentralized Reinforcement

PRIME Intellect's INTELLECT-2 leverages decentralized asynchronous reinforcement learning for enhanced efficiency and flexibility in model training.

Asynchronous training facilitates a significant improvement in performance across various tasks compared to previous models.

Artificial intelligence

fromMail Online

4 months ago

Disney brings Olaf from Frozen to life with AI-powered robot

Disney built a three-foot robotic Olaf that walks, talks, and adapts to surroundings using remote operation and reinforcement-learning AI for authentic character performance.

fromKotaku

4 months ago

Robot Olaf From Frozen To Haunt Disney Parks Next Year

"Our latest Olaf is a fantastic example of representing an animated character as authentically as possible in the physical world, a challenging task because animated characters most often move in non-physical ways," Kyle Laughlin, senior vice president of Walt Disney Imagineering Research & Development, said in a news release. "For example, to make Olaf's snowball feet move along his body, we paired state-of-the-art deep reinforcement learning with an artistic interface and advances in mechanical design."

Artificial intelligence

fromTheregister

4 months ago

Anthropic reduces model misbehavior by endorsing cheating

Granting limited permission to misbehave reduces AI models' tendency to exploit reward functions and helps mitigate emergent reward hacking.

fromInfoQ

4 months ago

Olmo 3 Release Provides Full Transparency Into Model Development and Training

The Allen Institute for Artificial Intelligence has launched Olmo 3, an open-source language model family that offers researchers and developers comprehensive access to the entire model development process. Unlike earlier releases that provided only final weights, Olmo 3 includes checkpoints, training datasets, and tools for every stage of development, encompassing pretraining and post-training for reasoning, instruction following, and reinforcement learning.

Artificial intelligence

fromArmin Ronacher's Thoughts and Writings

5 months ago

Agent Design Is Still Hard

Building production-grade agents requires custom abstractions, manual caching, task-specific model choice, strict isolation for failures, and shared file-system-like state management.

fromwww.nature.com

5 months ago

Olympiad-level formal mathematical reasoning with reinforcement learning

A long-standing goal of artificial intelligence is to build systems capable of complex reasoning in vast domains, a task epitomized by mathematics with its boundless concepts and demand for rigorous proof. Recent AI systems, often reliant on human data, typically lack the formal verification necessary to guarantee correctness. By contrast, formal languages such as Lean1 offer an interactive environment that grounds reasoning, and reinforcement learning (RL) provides a mechanism for learning in such environments.

Artificial intelligence

fromComputerworld

5 months ago

Meta's SPICE framework pushes AI toward self-learning without human supervision

SPICE trains a single LLM to both generate and solve document-grounded problems, reducing hallucinations and improving reasoning by nearly 10%.

Artificial intelligence

fromInfoWorld

5 months ago

Meta's SPICE framework pushes AI toward self-learning without human supervision

SPICE enables LLMs to self-improve by self-play using real-world corpora, reducing hallucination and boosting reasoning performance by nearly 10%.

#robotics

fromWIRED

5 months ago

Artificial intelligence

Meet the Chinese Startup Using AI - and a Small Army of Workers - to Train Robots

fromTechCrunch

6 months ago

Artificial intelligence

Coco Robotics taps UCLA professor to lead new physical AI research lab | TechCrunch

fromHackernoon

10 months ago

Artificial intelligence

AI Tutor Is Real, And It's Already Here | HackerNoon

fromInfoQ

5 months ago

Meta and Hugging Face Launch OpenEnv, a Shared Hub for Agentic Environments

Meta's PyTorch team and Hugging Face have unveiled OpenEnv, an open-source initiative designed to standardize how developers create and share environments for AI agents. At its core is the OpenEnv Hub, a collaborative platform for building, testing, and deploying "agentic environments," secure sandboxes that specify the exact tools, APIs, and conditions an agent needs to perform a task safely, consistently, and at scale.

Artificial intelligence

Startup companies

fromTechCrunch

5 months ago

Mercor quintuples valuation to $10B with $350M Series C | TechCrunch

Mercor raised $350 million at a $10 billion valuation to scale its domain-expert model-training marketplace, expand reinforcement-learning infrastructure, and pursue an AI recruiting marketplace.

fromFortune

5 months ago

The next 'golden age' of AI investment | Fortune

But reasoning models have changed the game, Midha said, referring to the new generation of AI systems designed to "reason" through problems step by step, mimicking logic and reflection rather than predicting the next word in a sequence. These models can evaluate their own outputs better, break complex tasks into sub-tasks, and learn from feedback, potentially bringing AI closer to complex, real-world problem-solving.

Venture

Artificial intelligence

fromwww.nature.com

5 months ago

Discovering state-of-the-art reinforcement learning algorithms

Machines can autonomously discover state-of-the-art reinforcement learning rules via meta-learning across many agents and environments, outperforming hand-designed algorithms on Atari and other benchmarks.

Gadgets

fromYanko Design - Modern Industrial Design News

6 months ago

Yamaha's AI Motorcycle Picks Itself Up Off the Ground After It Falls - Yanko Design

MOTOROiD:Λ is an AI-driven electric motorcycle that learns in simulation, autonomously balances, self-rights, and adapts through reinforcement learning and Sim2Real technology.

Artificial intelligence

fromTechCrunch

6 months ago

Datacurve raises $15 million to take on ScaleAI | TechCrunch

Companies that combine paid, user-focused data collection platforms with targeted strategies can gain advantage as AI increasingly requires complex, high-quality training datasets.

#serverless

fromTechzine Global

6 months ago

Artificial intelligence

CoreWeave launches serverless platform for reinforcement learning

fromTheregister

6 months ago

Artificial intelligence

CoreWeave woos enterprises with serverless RL suite

Artificial intelligence

fromWIRED

6 months ago

This Startup Wants to Spark a US DeepSeek Moment

Distributed reinforcement learning enables decentralized training of competitive open-source LLMs across diverse global hardware without reliance on major tech companies.

Artificial intelligence

fromTechCrunch

6 months ago

The Reinforcement Gap - or why some AI skills improve faster than others | TechCrunch

Reinforcement learning boosts AI coding capabilities rapidly, creating a reinforcement gap as non-RL tasks like writing progress much more slowly.

#humanoid-robotics

fromFuturism

6 months ago

Artificial intelligence

Disturbing Video Shows Man Jerking Robot Around by Chain Around Its Neck

fromFuturism

7 months ago

Artificial intelligence

Unstoppable Martial Arts Robot Can Take a Direct Dropkick Without Falling Down

Tech industry

fromTESLARATI

7 months ago

Tesla's Lead of Optimus AI departs and people are confused about it

Ashish Kumar, Tesla's Lead of Optimus AI, left Tesla after just over two years to join Meta as a Research Scientist.

Artificial intelligence

fromNature

7 months ago

Daily briefing: AI model can predict your risk of diseases years before you might get them

Delphi-2M forecasts individual risk for over 1,000 diseases up to 20 years ahead using health records and lifestyle, matching or surpassing single-disease models.

Artificial intelligence

fromIT Pro

7 months ago

DeepSeek's R1 model training costs pour cold water on big tech's massive AI spending

DeepSeek trained its R1 reasoning model for about $294,000 using 512 Nvidia H800 chips, plus ~$6M for its base LLM.

Artificial intelligence

fromTheregister

7 months ago

DeepSeek bolsters AI 'reasoning' using trial-and-error

Reinforcement learning via trial-and-error can train DeepSeek-R1 to reason and produce explanations for math and coding while reducing human supervision.

fromPsychology Today

7 months ago

Why AI Cheats: The Deep Psychology Behind Deep Learning

A few months ago, I asked ChatGPT to recommend books by and about Hermann Joseph Muller, the Nobel Prize-winning geneticist who showed how X-rays can cause mutations. It dutifully gave me three titles. None existed. I asked again. Three more. Still wrong. By the third attempt, I had an epiphany: the system wasn't just mistaken, it was making things up.

Artificial intelligence

fromTechCrunch

7 months ago

Thinking Machines Lab wants to make AI models more consistent | TechCrunch

Controlling GPU kernel orchestration during inference can eliminate nondeterminism and produce reproducible LLM outputs, improving reliability and reinforcement learning.

Gadgets

fromYanko Design - Modern Industrial Design News

7 months ago

This Robot Vacuum Watches You Clean, Then Learns to Copy You: xLean TR1 Hands On at IFA 2025 - Yanko Design

xLean's TR1 is a dual-form robot that transforms into a handheld cleaner and learns user cleaning behaviors via RGB-D sensors and RLHF, improving autonomous cleaning.

Artificial intelligence

fromTechCrunch

7 months ago

CoreWeave acquires agent-training startup OpenPipe | TechCrunch

CoreWeave acquired OpenPipe to combine reinforcement-learning agent tooling with high-performance AI cloud to help enterprises train customized, scalable AI agents.

#language-models

fromPsychology Today

7 months ago

Artificial intelligence

The Greatest Illusion on Earth

fromHackernoon

1 year ago

Online learning

Exploring Cutting-Edge Approaches to Iterative LLM Fine Tuning | HackerNoon

Artificial intelligence

fromArs Technica

7 months ago

With AI chatbots, Big Tech is moving fast and breaking people

AI chatbots optimized to please users often validate false, grandiose beliefs, amplifying vulnerable individuals' distorted thinking and causing real harm.

Software development

fromInfoQ

8 months ago

Qwen Team Releases Qwen3-Coder, a Large Agentic Coding Model with Open Tooling

Qwen3-Coder is a new AI code model family focusing on long-context programming tasks, enhancing execution and decision-making capabilities.

Artificial intelligence

fromWIRED

9 months ago

Another High-Profile OpenAI Researcher Departs for Meta

Jason Wei and Hyung Won Chung will join Meta's superintelligence lab after working at OpenAI.

Meta is intensifying efforts to recruit top AI talent, offering significant salaries.

#ai

fromInfoQ

9 months ago

Artificial intelligence

MiniMax Releases M1: A 456B Hybrid-Attention Model for Long-Context Reasoning and Software Tasks

fromInfoQ

10 months ago

Artificial intelligence

Agentica Project's Open Source DeepCoder Model Outperforms OpenAI's O1 on Coding Benchmarks

fromDeveloper Tech News

1 year ago

Artificial intelligence

Open-source AI matches coding abilities of proprietary models

Artificial intelligence

fromTechzine Global

11 months ago

OpenAI opens the door to reinforcement fine-tuning for o4-mini

OpenAI's new reinforcement fine-tuning allows simpler customization of the o4-mini AI model for businesses, enhancing adaptability and performance.

Artificial intelligence

fromBusiness Insider

1 year ago

Google just fired the first shot of the next battle in the AI war

The paper by Silver and Sutton signals a new AI era focused on experiential learning and innovation beyond previous technological advancements.

Business intelligence

fromHackernoon

1 year ago

The Next Evolution in Business Process Improvement | HackerNoon

Business processes are standardized activities organizations use to achieve results.

A/B testing and reinforcement learning provide dynamic strategies to assess and improve business processes.

DevOps

fromHackernoon

1 year ago

What BPM Pros Really Think About AI and A/B Testing Process Change | HackerNoon

AB-BPM methodology integrates A/B testing and reinforcement learning for effective business process improvement.

Women in technology

fromHackernoon

2 years ago

The HackerNoon Newsletter: The Double Life of a TensorFlow Function (6/4/2025) | HackerNoon

AI companions are a multi-billion dollar industry, transforming from fantasy to reality.

Reinforcement Learning shapes technology and innovation through its simple yet impactful concept.

Artificial intelligence

fromHackernoon

10 months ago

When Robot Shows Human-Like Recovery and Safety Behaviors | HackerNoon

TRANSIC demonstrates improved human data scalability in robotic learning, achieving better performance through effective online corrections.

Artificial intelligence

fromTechCrunch

11 months ago

Improvements in 'reasoning' AI models may slow down soon, analysis finds | TechCrunch

The AI industry's performance gains from reasoning models may plateau soon.

Online learning

fromHackernoon

1 year ago

Decoding the Magic: How Machines Master Human Language | HackerNoon

Large language models learn language similarly to children: through reading, guidance, and feedback.

OMG science

fromwww.nature.com

11 months ago

Whole-body physics simulation of fruit fly locomotion

The study presents a whole-body model of fruit flies that accurately simulates their locomotion and neural control.

Artificial intelligence

fromInsideHook

1 year ago

Do OpenAI's New Models Have a Hallucination Problem?

OpenAI's new models are smart but have increased hallucinations compared to past versions.

#nash-optimization

fromHackernoon

1 year ago

Artificial intelligence

Batched Prompting for Efficient GPT-4 Annotation | HackerNoon

fromHackernoon

1 year ago

Roam Research

Understanding Concentrability in Direct Nash Optimization | HackerNoon

Artificial intelligence

fromwww.nytimes.com

1 year ago

OpenAI Unveils New 'Reasoning' Models o3 and o4-mini

OpenAI has introduced advanced A.I. technologies capable of reasoning through tasks involving both text and images.

Artificial intelligence

fromHackernoon

1 year ago

AI That Trains Itself? Here's How it Works | HackerNoon

The iterative contrastive self-improvement method significantly enhances policy training efficiency and output quality.

Artificial intelligence

fromHackernoon

1 year ago

The Art of Arguing With Yourself-And Why It's Making AI Smarter | HackerNoon

The paper presents Direct Nash Optimization, enhancing large language model training by utilizing pair-wise preferences instead of traditional reward maximization.

Artificial intelligence

fromHarvard Gazette

1 year ago

Like having a personal healthcare coach in your pocket - Harvard Gazette

Advanced algorithms offer personalized support for cancer patients and cannabis users, enhancing medication adherence and behavioral change.

#reinforcement-learning

Episode #291: Reassessing the LLM Landscape & Summoning Ghosts - The Real Python Podcast

Workday's CTO traded his C-suite title for a technical staff role at Anthropic

Meta's Superintelligence Lab unveils its first public model, Muse Spark

Meta hires key OpenAI researcher to work on AI reasoning models | TechCrunch

Cursor admits its new coding model was built on top of Moonshot AI's Kimi | TechCrunch

Elon Musk reveals date of Tesla Full Self-Driving's next massive release

Exclusive: Andreessen Horowitz backs Deeptune's $43M Series A to build 'training gyms' for AI agents | Fortune

True agentic AI is years away - here's why and how we get there

Silicon Valley bets big on 'environments' to train AI agents | TechCrunch

How a big shift in training LLMs led to a capability explosion

I met Olaf - the Frozen robot who might be the future of Disney Parks

Databricks acquires Quotient AI in push for agent reliability

Databricks buys Quotient AI to boost enterprise-grade AI agent performance

Human brain cells on a chip learn to play Doom

AI mastered language. The physical world is next | Fortune

Maybe We Just Need to Get Out More

Video Shows Man Bleeding After Flailing Robot Kicks Him in Nose

An Invisible Cartel? Algorithmic Collusion And Agentic AI

Researchers propose a self-distillation fix for 'catastrophic forgetting' in LLMs

The Map-Augmented Agent That Finally Makes AI Good at Finding Places | HackerNoon

Google Introduces TranslateGemma Open Models for Multilingual Translation

Exclusive: Uber launches an 'AV Labs' division to gather driving data for robotaxi partners | TechCrunch

AI chip startup Ricursive hits $4B valuation two months after launch | TechCrunch

A Q&A with Amanda Askell, the lead author of Anthropic's new 'constitution' for AIs

Anthropic details how it measures Claude's wokeness

AI drug startup Insilico Medicine launches an AI 'gym' to help models like GPT and Qwen be good at science | Fortune

This startup is helping companies train AI with an old but buzzy technique. Read the pitch deck it used to raise $7.5 million.

The Dopamine Loop: Why Arguments Are Hard to Let Go

OpenAI says AI browsers like ChatGPT Atlas may never be fully secure from hackers-and experts say the risks are 'a feature not a bug' | Fortune

Tesla FSD's newest model is coming, and it sounds like 'the last big piece of the puzzle'

The AI industry's biggest week: Google's rise, RL mania, and a party boat

'The era of data-labeling companies is over,' says the CEO of a $2.2 billion AI training firm

Two Gen Zers turned down millions from Elon Musk to build an AI based on the human brain-and it's outperformed models from OpenAI and Anthropic | Fortune

AI is transforming spacecraft propulsion-and may lead to nuclear-powered rockets

This AI startup wants to use technology to automate every job

Prime Intellect Releases INTELLECT-2: A 32B Parameter Model Trained via Decentralized Reinforcement

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

Disney brings Olaf from Frozen to life with AI-powered robot

Robot Olaf From Frozen To Haunt Disney Parks Next Year

Anthropic reduces model misbehavior by endorsing cheating

Olmo 3 Release Provides Full Transparency Into Model Development and Training

Agent Design Is Still Hard

Olympiad-level formal mathematical reasoning with reinforcement learning

Meta's SPICE framework pushes AI toward self-learning without human supervision

Meet the Chinese Startup Using AI-and a Small Army of Workers-to Train Robots

Coco Robotics taps UCLA professor to lead new physical AI research lab | TechCrunch

AI Tutor Is Real, And It's Already Here | HackerNoon

Meta and Hugging Face Launch OpenEnv, a Shared Hub for Agentic Environments

Mercor quintuples valuation to $10B with $350M Series C | TechCrunch

The next 'golden age' of AI investment | Fortune

Discovering state-of-the-art reinforcement learning algorithms

Yamaha's AI Motorcycle Picks Itself Up Off the Ground After It Falls - Yanko Design
