
NVIDIA’s Vera Rubin Platform Is the Biggest Bet Yet on the Future of AI Infrastructure

Artificial intelligence infrastructure is entering a new phase, and NVIDIA is moving to define what that phase looks like. The company has announced that its Vera Rubin platform is now in full production, bringing together seven new chips designed to work as a single unified supercomputer capable of handling every stage of modern AI workloads, from the earliest rounds of model pretraining through to real-time agentic inference at massive scale.

The announcement, made at GTC, signals a meaningful shift in how AI infrastructure is being conceived and deployed. Where previous generations of AI hardware centered on individual chips or standalone servers, Vera Rubin is built around the idea of the AI factory, a coordinated system of purpose-built racks working together as one coherent computing platform.

The Platform and Its Components

The Vera Rubin platform brings together six internally developed chips alongside a newly integrated processor from Groq. Those chips are the NVIDIA Vera CPU, the NVIDIA Rubin GPU, the NVIDIA NVLink 6 Switch, the NVIDIA ConnectX-9 SuperNIC, the NVIDIA BlueField-4 DPU, and the NVIDIA Spectrum-6 Ethernet switch, now joined by the NVIDIA Groq 3 LPU. Each of these components has a distinct role, and together they are intended to cover the full computational spectrum that modern AI demands.

The platform is organized into five rack types: the Vera Rubin NVL72 GPU rack, the Vera CPU rack, the NVIDIA Groq 3 LPX inference accelerator rack, the NVIDIA BlueField-4 STX storage rack, and the NVIDIA Spectrum-6 SPX Ethernet rack. Each rack is purpose-built for a specific function, and all five are designed to operate together within a single integrated AI factory environment.

Jensen Huang, founder and CEO of NVIDIA, described the platform as a generational leap and pointed to the current moment as an inflection point for agentic AI. He characterized the buildout enabled by Vera Rubin as potentially the greatest infrastructure expansion in history.

The Vera Rubin NVL72 GPU Rack

The centerpiece of the platform is the Vera Rubin NVL72 rack, which integrates 72 Rubin GPUs and 36 Vera CPUs connected through NVLink 6, along with ConnectX-9 SuperNICs and BlueField-4 DPUs. The architecture is designed specifically for the demands of training large mixture-of-experts models, and NVIDIA says it achieves comparable results to the previous Blackwell platform using only one-fourth the number of GPUs. At inference, it delivers up to 10 times higher throughput per watt at one-tenth the cost per token.

For organizations running hyperscale AI factories, NVL72 scales across large GPU clusters using both NVIDIA Quantum-X800 InfiniBand and Spectrum-X Ethernet, enabling sustained high utilization while reducing training time and total cost of ownership. The efficiency gains are substantial enough that they fundamentally change the economics of running large-scale AI workloads, which matters as the size and complexity of frontier models continue to grow.
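
To make the headline inference claims concrete, here is a minimal back-of-envelope sketch. Only the ratios, 10 times the throughput per watt and one-tenth the cost per token, come from NVIDIA's announcement; the Blackwell baseline figures and the 1 MW deployment size are purely illustrative assumptions.

```python
# Back-of-envelope scaling of NVIDIA's stated inference ratios
# (10x throughput per watt, 1/10 cost per token vs. Blackwell).
# The baseline numbers below are illustrative assumptions, not NVIDIA data.

baseline_tokens_per_joule = 1.0    # normalized Blackwell baseline (assumed)
baseline_cost_per_m_tokens = 10.0  # assumed baseline price, USD per million tokens

rubin_tokens_per_joule = baseline_tokens_per_joule * 10    # claimed 10x per watt
rubin_cost_per_m_tokens = baseline_cost_per_m_tokens / 10  # claimed 1/10 cost

power_budget_watts = 1_000_000  # a hypothetical 1 MW inference deployment

for name, tpj, cost in [
    ("Blackwell (assumed baseline)", baseline_tokens_per_joule, baseline_cost_per_m_tokens),
    ("Vera Rubin (claimed ratios)", rubin_tokens_per_joule, rubin_cost_per_m_tokens),
]:
    throughput = tpj * power_budget_watts  # tokens/s at the full power budget
    print(f"{name}: {throughput:,.0f} tokens/s, ${cost:.2f} per million tokens")
```

Under these assumptions, the same 1 MW envelope serves ten times the token volume at a tenth of the unit price, which is the sense in which the per-token economics shift.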

The Vera CPU Rack

Not all AI workloads run on GPUs. Reinforcement learning and agentic AI in particular depend heavily on CPU-based environments to test and validate outputs generated by GPU-side models. The Vera CPU rack addresses this need directly.

Built on NVIDIA MGX infrastructure and liquid cooled throughout, the Vera CPU rack integrates 256 Vera CPUs into a dense, energy-efficient system designed for large-scale agentic and reinforcement learning workloads. NVIDIA says the Vera CPU is twice as energy efficient and 50 percent faster than traditional CPUs, which has direct implications for organizations trying to run complex multi-agent systems at scale.

The rack integrates with Spectrum-X Ethernet networking so that CPU environments remain tightly synchronized across the broader AI factory. When combined with GPU compute racks, the Vera CPU rack provides the processing foundation that makes large-scale agentic AI operationally viable.

The NVIDIA Groq 3 LPX Rack

One of the most distinctive additions to the Vera Rubin platform is the integration of Groq’s LPU technology. The NVIDIA Groq 3 LPX rack is designed specifically for the low-latency, large-context demands that agentic AI systems place on inference infrastructure. When deployed alongside Vera Rubin NVL72, Rubin GPUs and LPUs work together by jointly computing every layer of the AI model for every output token during the decode phase, which significantly improves throughput.

The numbers NVIDIA cites are striking. The LPX rack, which houses 256 LPU processors with 128GB of on-chip SRAM and 640 TB/s of scale-up bandwidth, delivers up to 35 times higher inference throughput per megawatt compared with standard approaches, and creates up to 10 times more revenue opportunity for trillion-parameter models. The architecture is optimized specifically for trillion-parameter models running on million-token context windows, a combination that represents the bleeding edge of what large language models are capable of today.

NVIDIA describes this as unlocking a new tier of ultra-premium inference, one that makes it economically viable for AI providers to serve the most demanding model configurations at scale. The LPX rack is fully liquid cooled, built on MGX infrastructure, and is expected to be available in the second half of this year.

The BlueField-4 STX Storage Rack

Memory and storage have become critical bottlenecks as AI models grow larger and agentic workflows generate increasing volumes of intermediate data. The NVIDIA BlueField-4 STX rack-scale system addresses this by creating what NVIDIA calls an AI-native storage infrastructure that extends GPU memory seamlessly across the entire POD.

Powered by BlueField-4, which itself combines the NVIDIA Vera CPU and NVIDIA ConnectX-9 SuperNIC, the STX rack delivers a high-bandwidth shared storage layer optimized specifically for the key-value cache data that large language models and agentic AI workflows generate in large quantities. This kind of data is produced constantly during multi-turn AI interactions and must be stored and retrieved with minimal latency to keep agents responsive.
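
A rough size estimate shows why a dedicated storage tier matters here. The sketch below uses the standard transformer KV cache formula; the model shape (layer count, KV head count, head dimension) is a hypothetical assumption, not a specific vendor model.

```python
# Rough estimate of KV cache size for a transformer decoder, to show why
# million-token contexts push KV data out of GPU memory into shared storage.
# The model shape below is illustrative, not any specific production model.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   context_tokens, bytes_per_value=2):
    # Each layer stores one key and one value vector per token per KV head.
    per_token = 2 * num_kv_heads * head_dim * bytes_per_value
    return num_layers * context_tokens * per_token

# Hypothetical large model: 120 layers, 8 KV heads (grouped-query attention),
# head dimension 128, FP16 values, one-million-token context.
size = kv_cache_bytes(num_layers=120, num_kv_heads=8, head_dim=128,
                      context_tokens=1_000_000)
print(f"~{size / 2**30:.0f} GiB of KV cache for one million-token session")
```

At roughly half a terabyte per million-token session under these assumptions, it is easy to see why NVIDIA treats KV cache as a first-class storage tier rather than something held in GPU memory alone.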

A new software framework called NVIDIA DOCA Memos works alongside BlueField-4 to dedicate processing capacity specifically to KV cache storage operations. The result is up to a fivefold improvement in inference throughput compared with general-purpose storage architectures, along with meaningfully better power efficiency. For AI services running at scale, the practical benefit is faster multi-turn interactions, more scalable agent deployments, and higher overall utilization of existing infrastructure.

Timothée Lacroix, cofounder and chief technology officer of Mistral AI, said the STX system represents a critical performance boost for scaling agentic AI efforts, noting that by creating a storage tier purpose-built for AI agent memory, the system is well-positioned to help models maintain coherence and speed when reasoning across massive datasets.

The Spectrum-6 SPX Ethernet Rack

Connecting all of this infrastructure at scale requires networking that can handle the intense east-west traffic patterns that AI factories generate as data flows between compute, memory, and storage systems. The Spectrum-6 SPX Ethernet rack is engineered for exactly this purpose.

Configurable with either Spectrum-X Ethernet or NVIDIA Quantum-X800 InfiniBand switches depending on the deployment requirements, the SPX rack delivers low-latency, high-throughput connectivity between racks at scale. NVIDIA has also introduced Spectrum-X Ethernet Photonics with co-packaged optics, an approach that achieves up to five times greater optical power efficiency and ten times higher resiliency compared with traditional pluggable transceivers. As AI factories grow to encompass thousands of GPUs and CPUs communicating constantly, the efficiency and reliability of that networking fabric becomes directly tied to the overall performance and operating cost of the system.

System-Level Efficiency and Resilience

NVIDIA has also introduced the DSX platform for Vera Rubin, developed alongside more than 200 data center infrastructure partners. The platform includes DSX Max-Q, a capability for dynamic power provisioning across the full AI factory that enables operators to deploy 30 percent more AI infrastructure within a fixed-power data center. A separate software layer called DSX Flex enables AI factories to function as grid-flexible assets, which NVIDIA says can unlock access to 100 gigawatts of stranded grid power.
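
The announcement does not detail how Max-Q works internally, but the arithmetic of dynamic power provisioning is straightforward to sketch: static provisioning must reserve every rack's worst-case nameplate power, while dynamic provisioning budgets against realistic concurrent draw and caps rare excursions. All figures below are illustrative assumptions, not DSX Max-Q internals.

```python
# Sketch of the power-oversubscription idea behind dynamic provisioning.
# Figures are illustrative assumptions, not DSX Max-Q specifications.

facility_budget_kw = 50_000
rack_nameplate_kw = 140     # assumed worst-case per-rack draw
rack_typical_peak_kw = 105  # assumed realistic concurrent peak (75% of nameplate)

static_racks = facility_budget_kw // rack_nameplate_kw    # reserve worst case
dynamic_racks = facility_budget_kw // rack_typical_peak_kw  # budget typical peak

gain = (dynamic_racks - static_racks) / static_racks
print(f"static: {static_racks} racks, dynamic: {dynamic_racks} racks "
      f"(+{gain:.0%} more infrastructure in the same power envelope)")
```

With these assumed numbers the gain lands around a third, which is the same order as the 30 percent figure NVIDIA cites for Max-Q.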

The company has also released the Vera Rubin DSX AI Factory reference design, a blueprint for integrated AI infrastructure that coordinates compute, networking, storage, power, and cooling to maximize tokens per watt, improve resilience, and accelerate time to first production. As data center operators face increasing pressure around power consumption and energy costs, these system-level design tools are becoming as important as the chips themselves.

Industry Backing and Availability

The scale of industry support behind Vera Rubin reflects the platform’s strategic importance. AI labs and frontier model developers including Anthropic, Meta, Mistral AI, and OpenAI are all planning to use the platform for training larger and more capable models while serving long-context, multimodal systems at lower latency and cost than prior GPU generations.

Dario Amodei, CEO and cofounder of Anthropic, noted that as enterprises and developers use Claude for increasingly complex reasoning, agentic workflows, and mission-critical decisions, the underlying infrastructure must keep pace. He said the Vera Rubin platform provides the compute, networking, and system design needed to continue advancing both performance and the safety and reliability that customers depend on.

Sam Altman, CEO of OpenAI, framed the platform as the foundation for running more powerful models and agents at massive scale while delivering faster and more reliable systems to hundreds of millions of people.

Cloud providers including Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure will make Vera Rubin available through their platforms. NVIDIA Cloud Partners CoreWeave, Crusoe, Lambda, Nebius, Nscale, and Together AI are also expected to offer access. On the hardware side, system manufacturers including Cisco, Dell Technologies, HPE, Lenovo, and Supermicro are expected to ship servers based on Vera Rubin products, alongside a broad set of additional manufacturing partners.

Vera Rubin-based products will begin reaching partners and customers in the second half of this year.

What This Means for AI at Scale

The Vera Rubin platform represents a clear statement about where NVIDIA believes AI infrastructure is heading. The era of optimizing individual chips is giving way to an era of optimizing entire systems, where compute, memory, networking, storage, power, and cooling are co-designed from the ground up to work as one. The result is not just faster AI, but more efficient, more scalable, and more economically viable AI at every level of the stack.

As models grow larger, context windows expand, and agentic systems become more central to how organizations use AI, the demands on infrastructure will only intensify. Vera Rubin is NVIDIA’s answer to those demands, and the breadth of the platform, spanning seven chips, five rack types, and an ecosystem of over 80 hardware partners, suggests the company is building for a very long runway.

