#gpu-inference

Data science
from InfoQ
2 days ago

Google's TurboQuant Compression May Support Faster Inference, Same Accuracy on Less Capable Hardware

TurboQuant compresses language models' Key-Value caches by up to 6x with near-zero accuracy loss, enabling efficient use of modest hardware.
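As a rough illustration of the idea behind KV-cache compression (a generic symmetric int8 scheme, not TurboQuant's actual algorithm), each cached tensor can be stored as small integers plus a per-tensor scale:

```python
# Minimal sketch of post-training KV-cache quantization (illustrative only;
# TurboQuant's actual method differs in detail). Each cached key/value tensor
# is mapped to int8 with a per-tensor scale, cutting memory roughly 4x versus
# float32; finer granularity and lower bit widths push ratios higher.

def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: returns (codes, scale)."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    return [c * scale for c in codes]

kv_slice = [0.12, -0.57, 0.91, -0.03, 0.44]   # one row of a KV cache, as floats
codes, scale = quantize_int8(kv_slice)
restored = dequantize_int8(codes, scale)
max_err = max(abs(a - b) for a, b in zip(kv_slice, restored))
assert max_err <= scale / 2 + 1e-9            # error bounded by half a quantization step
```

Per-tensor int8 alone gives about 4x over float32; reaching ratios like 6x with near-zero accuracy loss is where the interesting algorithmic work lives.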
#nvidia
Vue
from Gadgets 360
3 hours ago

GeForce Now Explained: What Is It, Features, Subscription Plans and More

Nvidia GeForce Now launches in India, enabling cloud gaming without high-end hardware through streaming from powerful remote servers.
Artificial intelligence
from Computerworld
3 days ago

Nvidia's Stephen Jones on the toolkit powering GPUs: 'A wild ride'

Nvidia's CUDA toolkit is foundational for AI advancements and is driving innovations in quantum computing, robotics, and autonomous vehicles.
Tech industry
from 24/7 Wall St.
1 day ago

Why I Can't Stop Buying Nvidia Stock

NVIDIA's growth trajectory continues to accelerate, with significant revenue and net income increases, indicating strong market positioning and demand.
Tech industry
from 24/7 Wall St.
2 days ago

NVIDIA Rises Even as Quantum Computing Threat Looms and Insider Selling Sparks Debate

NVIDIA shares rose 3% today, driven by the launch of quantum AI software and an expanded partnership with IBM.
Video games
from Gadgets 360
2 weeks ago

Nvidia Brings New AI Features With a New DLSS 4.5 Update

Nvidia's DLSS 4.5 update introduces 6X multi-frame generation and dynamic multi-frame generation for enhanced gaming performance.
Vue
from The Verge
2 weeks ago

Nvidia rolls out DLSS 4.5 update with new frame generation features

Nvidia's DLSS 4.5 update introduces AI-powered frame generation for RTX GPUs, enhancing performance and image quality in over 20 games.
Business
from 24/7 Wall St.
1 day ago

AMD Gains 6% Ahead of May Earnings: Is the AI Chip Challenger Finally Ready to Rival NVIDIA?

AMD stock rises 6% due to catalysts in AI chip development and partnerships, signaling growing investor confidence.
#ai-agents
Software development
from Techzine Global
1 day ago

OpenAI's new Agents SDK focuses on safety and scalability

OpenAI's updated Agents SDK enables developers to create autonomous AI agents for complex tasks with enhanced usability and a sandbox environment.
Artificial intelligence
from Engadget
1 month ago

NVIDIA is reportedly working on its own open-source AI agent platform

NVIDIA is developing NemoClaw, an enterprise-focused open-source AI agent platform designed to work across non-NVIDIA hardware with enhanced security features.
Artificial intelligence
from WIRED
1 month ago

Nvidia Is Planning to Launch an Open-Source AI Agent Platform

Nvidia is launching NemoClaw, an open-source AI agent platform enabling enterprise software companies to deploy AI agents for workforce task automation, accessible regardless of chip dependency.
Venture
from TechCrunch
6 days ago

Nvidia-backed SiFive hits $3.65 billion valuation for open AI chips

SiFive raised $400 million, valuing the company at $3.65 billion, focusing on RISC-V open chip designs for AI data centers.
#ai
Tech industry
from 24/7 Wall St.
2 days ago

Why Google's TPU Talks Just Made Marvell Technology a Must-Buy AI Stock

The custom ASIC market for AI data centers is projected to reach $118 billion by 2033, with Marvell Technology emerging as a key player.
Silicon Valley
from TechCrunch
3 weeks ago

Startup Gimlet Labs is solving the AI inference bottleneck in a surprisingly elegant way

Gimlet Labs raised $80 million to enhance AI inference efficiency across diverse hardware types.
Artificial intelligence
from 24/7 Wall St.
1 day ago

AI Compute Demand is Running Way Ahead of Supply - A Stock I'd Buy on That Signal

AI-driven power demand is outpacing supply, creating a significant energy shortfall that may impact top energy producers.
Tech industry
from 24/7 Wall St.
2 days ago

"Every Chip Is Getting Used Instantly" - Here's Why Google's AI Dominance May Be Unstoppable

Google's dominance in AI chip ownership positions it as the future leader in technology.
Software development
from Techzine Global
1 day ago

Scale sets edge platform's software ever more free from hardware constraints

Scale Computing is reducing hardware requirements for its software, allowing more flexibility for partners and customers in choosing hardware platforms.
Data science
from The Register
2 days ago

Nvidia slaps forehead: AI, that's what quantum needs!

Nvidia's AI models aim to reduce quantum processor error rates significantly, enhancing the reliability of quantum computing applications.
Python
from The JetBrains Blog
1 week ago

How to Train Your First TensorFlow Model in PyCharm

TensorFlow is an open-source framework for building and deploying machine learning models using tensors and high-level libraries like Keras.
Business
from 24/7 Wall St.
4 days ago

3 AI Semiconductor Stocks That Are Now Trading Below 20X Earnings

Three U.S.-based semiconductor stocks are trading low despite strong growth potential and market cap over $1 billion.
from Axios
1 day ago

Anthropic's AI downgrade stings power users

"Claude has regressed to the point it cannot be trusted to perform complex engineering," an AMD senior director wrote in a widely shared post on GitHub.
Artificial intelligence
Software development
from Medium
4 days ago

GAIA by AMD - Running Intelligent Systems Fully on Your Own Machine

GAIA is an open-source framework enabling local execution of intelligent agents, eliminating external dependencies and enhancing data control.
Tech industry
from news.bitcoin.com
5 days ago

AI Cloud Provider Coreweave Secures Anthropic Agreement for Claude Workloads

Coreweave signed a multi-year agreement with Anthropic to provide cloud infrastructure for AI model development and deployment.
Artificial intelligence
from Futurism
5 days ago

OpenAI's Latest Thing It's Bragging About Is Actually Kind of Sad

The AI industry faces significant delays and cancellations in data center projects, impacting ambitious computing capacity goals.
Venture
from TechCrunch
1 month ago

Thinking Machines Lab inks massive compute deal with Nvidia

Mira Murati's Thinking Machines Lab signed a multi-year strategic partnership with Nvidia involving at least one gigawatt of Vera Rubin systems deployment starting in 2027, with Nvidia also making a strategic investment in the $12 billion-valued AI research company.
#ai-efficiency
Miscellaneous
from InfoQ
1 month ago

OpenAI Codex-Spark Achieves Ultra-Fast Coding Speeds on Cerebras Hardware

OpenAI deployed GPT-5.3-Codex-Spark on Cerebras wafer-scale chips, achieving 1,000 tokens per second for real-time interactive coding with 15× faster performance than earlier versions.
Gadgets
from Ars Technica
1 month ago

AMD will bring its "Ryzen AI" processors to standard desktop PCs for the first time

AMD's Ryzen AI 400-series desktop processors are repackaged laptop chips with up to 8 CPU cores and Radeon 860M GPUs, targeting business desktops rather than gaming due to high DDR5 memory costs.
Artificial intelligence
from Medium
3 weeks ago

Less Compute, More Impact: How Model Quantization Fuels the Next Wave of Agentic AI

Model quantization and architectural optimization can outperform larger models, challenging the belief that more GPUs equal greater intelligence.
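The compute savings behind that claim are largely memory arithmetic. A back-of-envelope sketch (generic numbers, weights only, ignoring activations and KV cache):

```python
# Back-of-envelope memory for model weights at different precisions.
# Weights only; activations, KV cache, and runtime overhead are extra.

def weight_gib(params_billion, bits_per_weight):
    """GiB needed to hold the weights of a model at a given bit width."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

fp16 = weight_gib(30, 16)   # ~55.9 GiB: multi-GPU territory
int4 = weight_gib(30, 4)    # ~14.0 GiB: fits a single 16-24 GB consumer GPU
```

This is why a well-quantized mid-size model can be deployed where a larger full-precision model simply cannot fit, before any accuracy comparison even starts.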
Data science
from TechRepublic
1 month ago

Inside the Gas Engine Strategy Powering AI's Next Wave

Gas reciprocating engines are emerging as a critical power solution for AI data centers, with manufacturers like Caterpillar securing multi-gigawatt orders to meet demand that exceeds grid and turbine capacity.
Tech industry
from The Register
1 month ago

Nvidia slaps Groq into new LPX racks for faster AI response

Nvidia integrates Groq's language processing units into Vera Rubin systems to dramatically accelerate LLM inference, enabling hundreds to thousands of tokens per second per user.
Silicon Valley
from The Register
1 month ago

Meta already deploying Nvidia's standalone CPUs at scale

Meta has deployed Nvidia's standalone Grace CPUs at scale and will deploy Vera CPUs and millions of Superchips to power general-purpose and agentic AI workloads.
Artificial intelligence
from TechCrunch
1 month ago

Niv-AI exits stealth to wring more power performance out of GPUs

AI data centers waste significant power due to GPU demand surges, forcing operators to throttle performance by up to 30%, prompting startups like Niv-AI to develop precision power management solutions.
Tech industry
from Computerworld
1 month ago

System-level 'coopetition': Why Nvidia's DGX Rubin NVL8 runs on Intel Xeon 6

Nvidia's flagship DGX Rubin NVL8 AI systems use Intel Xeon 6 processors as host CPUs to maintain x86 compatibility and meet enterprise deployment requirements.
Tech industry
from Axios
1 month ago

Nvidia's race to outpace physics

Nvidia CEO projects at least $1 trillion in revenue from newest chips through 2027, though market dominance has declined from 100% to 65% as energy efficiency becomes critical to AI scaling.
Artificial intelligence
from Computerworld
1 month ago

Nvidia NemoClaw promises to run OpenClaw agents securely

Nvidia introduced NemoClaw with OpenShell security features to address OpenClaw's enterprise security vulnerabilities through sandbox isolation and policy enforcement.
Artificial intelligence
from Techzine Global
1 month ago

Nvidia's Groq 3 LPU targets agentic AI inference at GTC 2026

Nvidia's acquisition of Groq technology produces the Groq 3 LPU, a specialized inference chip delivering 40 petabytes per second bandwidth, significantly outpacing GPU inference speeds.
Tech industry
from 24/7 Wall St.
1 month ago

Nvidia GPU availability near zero, AI compute demand off the charts

GPU availability is near zero, indicating demand from hyperscalers and enterprises far exceeds supply, validated by Nvidia's 73% revenue growth and 75% data center revenue increase.
#intel
Artificial intelligence
from InfoWorld
1 month ago

Nvidia launches Nemotron 3 Super to power enterprise AI agents

Nemotron 3 Super's hybrid architecture combining Mamba and Transformer technologies enables enterprises to run complex AI agents more efficiently with lower costs and faster execution on existing infrastructure.
Artificial intelligence
from TNW | Insider
1 month ago

NVIDIA is reportedly building an enterprise AI agent platform

Nvidia is developing NemoClaw, an open-source enterprise AI agent platform, and pitching it to major software companies ahead of an official launch.
Artificial intelligence
from ComputerWeekly.com
1 month ago

Edge AI: What's working and what isn't

Edge AI deployment success depends on identifying efficient, narrow use cases with manageable risks rather than pursuing sophisticated, large-scale models across all applications.
Tech industry
from The Register
2 months ago

How Nvidia is using emulation to turn AI FLOPS into FP64

Nvidia achieves higher FP64 throughput through software emulation on Rubin GPUs, trading hardware FP64 for emulated matrix performance up to 200 TFLOPS.
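A core primitive in this style of precision emulation is the error-free transformation: representing one high-precision value as an unevaluated sum of lower-precision values. A minimal sketch of the general idea, one level up in Python's binary64 (illustrative only; Nvidia's Rubin scheme is hardware-specific and differs in detail):

```python
# Error-free transformation sketch: the exact sum of two floats is recovered
# as a rounded sum plus a correction term. Schemes that emulate FP64 from
# lower-precision matrix units build on the same split-and-correct principle.

def two_sum(a, b):
    """Knuth's error-free addition: mathematically, a + b == s + e exactly."""
    s = a + b
    t = s - a
    e = (a - t) + (b - (s - t))
    return s, e

# A single binary64 add of 0.1 + 0.2 discards the rounding error;
# two_sum captures that error in e instead of losing it.
s, e = two_sum(0.1, 0.2)
```

Carrying `e` alongside `s` through a computation is what lets lower-precision hardware accumulate results to higher effective precision.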
from 24/7 Wall St.
1 month ago

Nvidia Just Made Another Pair of Brilliant AI Bets

Either way, I think the AI boom is alive and well. With much of the short-term hype fading, the big question is whether the long-term trajectory is still intact, and whether it makes sense for investors to hit the buy button now that the near term looks calmer while the long term remains as exciting as ever.
Artificial intelligence
Artificial intelligence
from Techzine Global
2 months ago

OpenAI seeks faster alternatives to Nvidia chips

OpenAI seeks alternative inference chips with larger on-chip SRAM to improve response speed for coding and AI-to-AI communication, aiming for about 10% of future inference capacity.
Artificial intelligence
from TechCrunch
1 month ago

Running AI models is turning into a memory game

Rising DRAM prices and sophisticated prompt-caching orchestration make memory management a critical cost and performance factor for large-scale AI deployments.
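A minimal sketch of the prompt-prefix caching idea behind that orchestration (a generic LRU cache with a hypothetical API, not any vendor's actual layer):

```python
# Generic prompt-prefix cache sketch (illustrative; class and method names are
# hypothetical). Reusing the KV state of a shared prefix means only the new
# suffix needs recomputation, trading DRAM capacity for compute.
from collections import OrderedDict

class PrefixCache:
    def __init__(self, capacity_entries):
        self.capacity = capacity_entries
        self._store = OrderedDict()          # prefix -> cached KV state

    def lookup(self, prompt):
        """Return (longest cached prefix of prompt, uncached suffix)."""
        best = ""
        for prefix in self._store:
            if prompt.startswith(prefix) and len(prefix) > len(best):
                best = prefix
        if best:
            self._store.move_to_end(best)    # mark as recently used
        return best, prompt[len(best):]

    def insert(self, prefix, kv_state):
        self._store[prefix] = kv_state
        self._store.move_to_end(prefix)
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

cache = PrefixCache(capacity_entries=2)
cache.insert("You are a helpful assistant.", kv_state="<kv>")
hit, suffix = cache.lookup("You are a helpful assistant. Summarize this doc.")
```

At scale, the eviction policy and the DRAM price of each retained entry are exactly the cost/performance trade-off the article describes.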
from Cointelegraph
2 months ago

What Role Is Left for Decentralized GPU Networks in AI?

Many open-source and other models are becoming compact enough, and sufficiently optimized, to run very efficiently on consumer GPUs.
Artificial intelligence
Artificial intelligence
from 24/7 Wall St.
1 month ago

NVIDIA Cements Its Role as the Backbone of AI Infrastructure

NVIDIA's networking revenue grew 162% year-over-year to $8.2 billion, nearly tripling GPU growth, signaling a shift from chip seller to integrated infrastructure provider selling complete AI data center systems.
Artificial intelligence
from InfoWorld
2 months ago

Edge AI: The future of AI inference is smarter local compute

Edge AI shifts computation from cloud to devices, enabling low-latency, cost-efficient, and privacy-preserving AI inference while facing performance and ecosystem challenges.
from InfoQ
2 months ago

NVIDIA Dynamo Planner Brings SLO-Driven Automation to Multi-Node LLM Inference

The new capabilities center on two integrated components, the Dynamo Planner Profiler and the SLO-based Dynamo Planner, which together address the "rate matching" challenge in disaggregated serving: inference workloads are split so that prefill operations (processing the input context) and decode operations (generating output tokens) run on separate GPU pools. Without the right tooling, teams spend considerable time determining the optimal GPU allocation for each phase.
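The rate-matching arithmetic can be sketched in a few lines (throughput numbers below are hypothetical, not Dynamo Planner profiler output):

```python
# Toy rate-matching calculation for disaggregated LLM serving (illustrative;
# all throughput figures are made up). Prefill GPUs ingest prompt tokens,
# decode GPUs emit output tokens; the pools are balanced when neither phase
# starves the other for a given traffic mix.

def split_gpus(total_gpus, prefill_tok_per_gpu, decode_tok_per_gpu,
               prompt_tokens, output_tokens):
    """Return (prefill_gpus, decode_gpus) balancing the two phases.

    Per request, prefill consumes prompt_tokens and decode produces
    output_tokens; equalizing requests/sec across phases gives each pool
    a share proportional to its GPU-seconds of work per request.
    """
    prefill_cost = prompt_tokens / prefill_tok_per_gpu   # GPU-seconds/request
    decode_cost = output_tokens / decode_tok_per_gpu
    prefill_share = prefill_cost / (prefill_cost + decode_cost)
    prefill_gpus = max(1, round(total_gpus * prefill_share))
    return prefill_gpus, total_gpus - prefill_gpus

# 16 GPUs; prefill is ~10x faster per token than decode; prompts 4x longer
# than outputs:
p, d = split_gpus(16, prefill_tok_per_gpu=40_000, decode_tok_per_gpu=4_000,
                  prompt_tokens=2_000, output_tokens=500)
```

A real planner additionally has to respect latency SLOs and re-split as the traffic mix shifts, which is exactly what the profiler-plus-planner pairing automates.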
Artificial intelligence
Artificial intelligence
from Hackernoon
2 months ago

This "Flash" AI Model Is Fast and Dangerous at Math - Here's What It Can Do

GLM-4.7-Flash is a 30-billion-parameter mixture-of-experts model offering strong performance for lightweight deployment.
Artificial intelligence
from Ars Technica
2 months ago

OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips

Cerebras' Wafer Scale Engine enables high token throughput while OpenAI diversifies hardware beyond Nvidia amid fast-paced coding model competition.
Artificial intelligence
from 24/7 Wall St.
1 month ago

3 NVIDIA Storylines That Matter

NVIDIA's Q1 FY2027 guidance explicitly excludes China Data Center revenue, signaling regulatory risks and balance sheet exposure from export controls totaling $95.2 billion in supply commitments.
Artificial intelligence
from Techzine Global
2 months ago

OpenAI swaps Nvidia for Cerebras with GPT-5.3-Codex-Spark

GPT-5.3-Codex-Spark is a Cerebras-optimized, low-latency encoding model generating over 1,000 tokens/sec to enable immediate, minimal, real-time developer code adjustments.
from TechCrunch
2 months ago

Quadric rides the shift from cloud AI to on-device inference - and it's paying off

The company, which is based in San Francisco and has an office in Pune, India, is targeting up to $35 million this year as it builds a royalty-driven on-device AI business. That growth has buoyed the company, which now has a post-money valuation of between $270 million and $300 million, up from around $100 million in its 2022 Series B, Kheterpal said.
Artificial intelligence