For years, the story of AI hardware has been almost entirely about GPUs. Train bigger models, run faster inference, stack more GPU racks, repeat. But something is shifting. The newest wave of AI systems is not just generating text or images on demand. These systems are planning, reasoning, calling tools, writing and running code, checking their own work, and doing all of that across extended sessions that can involve hundreds of individual steps. That kind of workload puts pressure on a part of the system that has largely been an afterthought in AI infrastructure conversations: the CPU.
NVIDIA’s answer is the Vera CPU, which the company is calling the world’s first processor purpose-built for agentic AI. It is in full production now and will be available from partners later this year. And based on the list of companies already lining up to use it, the industry seems to agree that the general-purpose CPU era for AI infrastructure is over.
The Problem With Using the Wrong CPU for AI Agents
Here is the issue. When an AI agent is working through a complex task, it is not just running a model once and handing back an answer. It is orchestrating a sequence of operations: retrieving context, deciding what tool to call, processing the result, updating its internal state, looping back, and doing that over and over until the task is done. All of that coordination and orchestration runs on the CPU, not the GPU. And if the CPU cannot keep up, the entire system slows down regardless of how powerful the GPU side is.
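The loop described above can be sketched in a few lines. This is a toy illustration of the orchestration pattern, not an NVIDIA API: the "policy" and the tools are placeholder Python functions, and every name is invented for the example.

```python
# Minimal sketch of an agentic orchestration loop. All names here are
# illustrative placeholders: the decide() policy stands in for a model
# call (often GPU-side), while the loop itself and the tool calls are
# exactly the CPU-side coordination work the article describes.

def run_agent(task, decide, tools, max_steps=100):
    """Drive one agent session: decide -> call tool -> update state, repeat."""
    state = {"task": task, "history": []}
    for _ in range(max_steps):                    # extended multi-step session
        action = decide(state)                    # pick the next tool to call
        if action["tool"] == "finish":            # agent judges the task done
            return action["args"]
        result = tools[action["tool"]](action["args"])  # CPU-side tool call
        state["history"].append((action, result))       # update internal state
    raise TimeoutError("agent did not finish within max_steps")

# Toy policy: call the "double" tool once, then finish with its result.
def toy_decide(state):
    if not state["history"]:
        return {"tool": "double", "args": state["task"]}
    _, last_result = state["history"][-1]
    return {"tool": "finish", "args": last_result}

answer = run_agent(21, toy_decide, {"double": lambda x: 2 * x})
print(answer)  # 42
```

Every iteration of that loop is CPU work: dispatching, state updates, result handling. Multiply it by hundreds of steps per session and tens of thousands of sessions, and the CPU becomes the bottleneck the article describes.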
At scale, this gets worse fast. An AI factory running tens of thousands of concurrent agent instances needs a CPU that can handle massive parallelism without degrading, sustain high single-thread performance for the parts of the workload that cannot be parallelized, and do all of this without consuming so much power that the economics fall apart. Standard server CPUs were never designed for that. They were designed for enterprise applications, databases, and web serving. Vera is designed for this.
NVIDIA says Vera delivers results twice as efficiently and 50 percent faster than traditional rack-scale CPUs. Those numbers stem from a set of architectural decisions that run deep.
What Is Actually Inside Vera
The processor is built around 88 custom NVIDIA-designed cores called Olympus cores. These are not adapted versions of existing CPU cores. They are purpose-designed for the specific mix of things agentic AI requires: running compilers, managing runtime engines, powering analytics pipelines, handling orchestration services, and keeping agentic tooling moving at speed.
Each core can run two simultaneous tasks using a capability NVIDIA calls Spatial Multithreading, which is designed to deliver consistent performance even under heavy multitenant workloads where many independent jobs are competing for resources at the same time. Consistency matters here as much as peak performance, because unpredictable latency in a multi-agent system creates cascading slowdowns that are hard to diagnose and harder to fix.
Memory is where Vera makes one of its most striking departures from conventional CPU design. It uses LPDDR5X memory with up to 1.2 TB per second of bandwidth, which is twice the bandwidth of general-purpose CPUs at half the power. In agentic workloads, the CPU is constantly moving large volumes of context data, model state, and intermediate results. Memory bandwidth directly determines how many of those operations can happen simultaneously, and cutting the power cost in half while doubling the bandwidth is a genuinely significant engineering achievement.
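To see why bandwidth is the limiting factor, consider a back-of-envelope estimate. If a workload is memory-bandwidth bound and each operation moves a fixed amount of context data, the ceiling on operations per second is simply bandwidth divided by bytes moved. The 1.2 TB/s figure is from the article; the per-step data size is an assumption chosen for illustration.

```python
# Back-of-envelope throughput ceiling for a bandwidth-bound workload.
# The 1.2 TB/s figure and the "half the bandwidth" baseline come from
# the article's comparison; the 10 MB per agent step is an assumption.

def ops_per_second(bandwidth_bytes_per_s, bytes_per_op):
    """Upper bound on operations/s when memory bandwidth is the bottleneck."""
    return bandwidth_bytes_per_s / bytes_per_op

VERA_BW = 1.2e12      # 1.2 TB/s LPDDR5X bandwidth (article figure)
BASELINE_BW = 0.6e12  # half the bandwidth, per the 2x comparison

step_bytes = 10e6     # assume each agent step shuffles ~10 MB of context

print(ops_per_second(VERA_BW, step_bytes))      # 120000.0 steps/s ceiling
print(ops_per_second(BASELINE_BW, step_bytes))  # 60000.0 steps/s ceiling
```

The exact numbers depend entirely on the assumed per-step data volume, but the proportionality is the point: doubling bandwidth doubles the ceiling on concurrent context movement, and doing so at half the power improves both sides of the performance-per-watt equation.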
Vera also uses NVIDIA’s second-generation Scalable Coherency Fabric to maintain performance under extreme utilization. This is the part that keeps the processor from falling apart when it is running at the limits of its capacity, which is where general-purpose hardware tends to show its weaknesses most clearly.
How It Fits Into the Bigger NVIDIA System
Vera does not exist in isolation. It is a core component of the NVIDIA Vera Rubin platform, and it is designed to work with NVIDIA’s GPU infrastructure in ways that go well beyond a standard CPU-GPU relationship.
When paired with Rubin GPUs inside the Vera Rubin NVL72 platform, Vera connects through NVLink-C2C interconnect technology at 1.8 TB per second of coherent bandwidth. That is seven times the bandwidth of PCIe Gen 6, the standard interface that most CPU-GPU systems use today. At that bandwidth, the distinction between CPU memory and GPU memory starts to blur, and data can flow between the two sides of the system fast enough to eliminate most of the communication overhead that has historically created latency in hybrid workloads.
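The "seven times" figure checks out arithmetically if the comparison point is a PCIe 6.0 x16 link, which carries roughly 256 GB/s of bidirectional bandwidth (about 128 GB/s per direction). That baseline is my assumption; the article does not state which PCIe configuration it compares against.

```python
# Sanity check on the "seven times PCIe Gen 6" claim. Assumption: the
# baseline is a PCIe 6.0 x16 link at roughly 256 GB/s bidirectional.

NVLINK_C2C = 1800   # GB/s, coherent CPU-GPU bandwidth (article figure)
PCIE6_X16 = 256     # GB/s, assumed PCIe 6.0 x16 bidirectional bandwidth

ratio = NVLINK_C2C / PCIE6_X16
print(f"{ratio:.1f}x")   # 7.0x, matching the stated factor
```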
NVIDIA has also introduced reference designs that use Vera as the host CPU for HGX Rubin NVL8 systems, giving it direct control over data movement and system coordination for GPU-accelerated workloads. The CPU is not just sitting alongside the GPU here. It is actively managing the flow of work across the AI factory.
Every Vera configuration also includes NVIDIA ConnectX SuperNIC cards and BlueField-4 DPUs for networking, storage acceleration, and security. For agentic AI specifically, that last piece matters. Agents that interact with external tools, APIs, and data sources need secure, low-latency connectivity. Building that in at the hardware level rather than handling it in software is the kind of design decision that compounds across a large deployment.
The Vera CPU Rack
For factory-scale deployments, NVIDIA has also announced a dedicated Vera CPU rack that packs 256 liquid-cooled Vera CPUs into a single unit. That rack can sustain more than 22,500 independent CPU environments running concurrently at full performance, which means a single rack can support the orchestration layer for a very large agentic deployment.
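One plausible reading of that figure, and it is my assumption rather than a stated NVIDIA breakdown, is one environment per physical core: the rack's core count lands almost exactly on the quoted number.

```python
# Where "more than 22,500 concurrent CPU environments" may come from
# (an assumption: one environment per physical Olympus core).

CPUS_PER_RACK = 256   # liquid-cooled Vera CPUs per rack (article figure)
CORES_PER_CPU = 88    # Olympus cores per Vera CPU (article figure)

environments = CPUS_PER_RACK * CORES_PER_CPU
print(environments)   # 22528, i.e. "more than 22,500"
```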
The rack is built on NVIDIA’s MGX modular reference architecture, which is supported by 80 ecosystem partners, and it is configurable in both dual and single-socket server arrangements depending on the workload. Partners are targeting use cases ranging from reinforcement learning and agentic inference to data processing, storage management, and high-performance computing.
Who Is Already Using It
The adoption list for Vera is long and covers a wide range of the organizations that are currently building the most demanding AI infrastructure on the planet.
Cloud service providers planning to deploy Vera include Alibaba, ByteDance, Cloudflare, CoreWeave, Crusoe, Lambda, Meta, Nebius, Nscale, Oracle Cloud Infrastructure, Together AI, and Vultr. Hardware and infrastructure partners building Vera-based systems include Dell Technologies, HPE, Lenovo, Supermicro, ASUS, Cisco, Compal, Foxconn, GIGABYTE, Hyve, Inventec, MiTAC, MSI, Pegatron, Quanta Cloud Technology, Wistron, and Wiwynn.
On the application side, Cursor, the AI coding assistant company, is using Vera to improve throughput and efficiency for its coding agents. Redpanda, which builds streaming data infrastructure, tested Vera on Apache Kafka-compatible workloads and reported up to 5.5 times lower latency than competing systems, with its CEO describing the architecture as a new direction for CPUs with more memory and less overhead per core.
National laboratories are also paying attention. The Leibniz Supercomputing Centre, Los Alamos National Laboratory, Lawrence Berkeley National Laboratory’s National Energy Research Scientific Computing Center, and the Texas Advanced Computing Center are all planning Vera deployments. TACC, which tested Vera across six scientific applications ahead of deploying it in its upcoming Horizon system, described per-core performance and memory bandwidth as a significant step forward for scientific computing.
Why This Actually Matters
The launch of Vera is a signal that the AI hardware conversation is broadening. GPUs are still the center of gravity, but the supporting cast around them is becoming more important as AI workloads grow more complex. The orchestration layer that runs on CPUs, the storage layer that keeps context accessible, and the networking layer that ties everything together all have to keep pace with what the GPU is doing, or the whole system bogs down.
NVIDIA has been building toward this kind of full-stack infrastructure story for several years, and Vera is one of the clearest expressions of that strategy. It is not a general-purpose CPU with some AI marketing attached. It is a processor designed around the specific operational demands of running AI agents at scale, and the architectural choices inside it reflect a genuine understanding of where those demands come from.
Whether Vera lives up to its performance claims in real-world deployments at scale is something that will take time to verify. But the depth of the adoption already forming around it, spanning hyperscalers, national laboratories, cloud providers, and application developers, suggests that the need it is addressing is real, and that the industry has been waiting for something like it.
Vera is in full production now and will be available from partners in the second half of this year.