"Snowflake gives customers one place to bring their data together, connect the systems they rely on, and turn AI into something that actually helps teams get work done," says Baris Gultekin, VP of AI at Snowflake.
The Xeon 600 lineup spans 12 to 86 performance cores (no cut-down efficiency cores here), with support for four to eight channels of DDR5 and 80 to 128 lanes of PCIe 5.0 connectivity. Compared to its aging W-3500-series chips, Intel claims a 9 percent uplift in single-threaded workloads and up to 61 percent higher performance in multithreaded jobs, thanks in no small part to an additional 22 processor cores this generation.
Four generations (MTIA 300, 400, 450, and 500) have been produced in under two years, with several already in production and others scheduled for mass deployment in 2026 and 2027. The quick pace is deliberate. Rather than betting on a single chip generation and waiting years for results, Meta has adopted a roughly six-month cadence per generation, using a modular chiplet architecture to enable incremental upgrades without replacing entire rack systems.
The company, which is based in San Francisco and has an office in Pune, India, is targeting up to $35 million in revenue this year as it builds a royalty-driven on-device AI business. That growth has buoyed the company, which now has a post-money valuation of between $270 million and $300 million, up from around $100 million in its 2022 Series B, Kheterpal said.
The new capabilities center on two integrated components: the Dynamo Planner Profiler and the SLO-based Dynamo Planner. Together, these tools address the "rate matching" challenge in disaggregated serving, in which inference workloads are split across separate GPU pools: prefill operations, which process the input context, run on one pool, while decode operations, which generate output tokens, run on another. Without the right tools, teams spend considerable time determining the optimal GPU allocation for each phase.
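To make the rate-matching problem concrete, here is a minimal back-of-the-envelope sketch: given an offered request rate and assumed per-GPU throughputs for each phase, it sizes the two pools so neither bottlenecks the other. All figures and the `rate_match` helper are hypothetical illustrations, not Dynamo's actual API or measured profiles.

```python
# Illustrative sketch of "rate matching" in disaggregated serving:
# size the prefill and decode GPU pools so neither becomes the
# bottleneck. All throughput numbers below are assumptions.
import math

def rate_match(req_per_s: float,
               avg_input_tokens: int,
               avg_output_tokens: int,
               prefill_tok_per_s_per_gpu: float,
               decode_tok_per_s_per_gpu: float) -> tuple[int, int]:
    """Return (prefill_gpus, decode_gpus) sized for the offered load."""
    prefill_demand = req_per_s * avg_input_tokens   # tokens/s to prefill
    decode_demand = req_per_s * avg_output_tokens   # tokens/s to decode
    prefill_gpus = math.ceil(prefill_demand / prefill_tok_per_s_per_gpu)
    decode_gpus = math.ceil(decode_demand / decode_tok_per_s_per_gpu)
    return prefill_gpus, decode_gpus

# 50 req/s with 2,000-token prompts and 300-token outputs; per-GPU
# throughputs are assumed figures for illustration only.
print(rate_match(50, 2000, 300, 40_000, 2_500))  # -> (3, 6)
```

In practice the profiler's job is to measure those per-GPU throughputs for a given model and hardware, which is exactly the data this kind of sizing calculation depends on.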