NVIDIA has set a new bar for artificial intelligence performance with its Blackwell and Hopper platforms, which delivered record results in the latest MLPerf Inference V5.0 round. The new records underscore the continuous advancement of AI infrastructure, enabling more efficient and scalable AI-driven solutions. These innovations signal a new era for AI factories, which transform raw data into actionable insights with unprecedented speed and accuracy.
MLPerf Inference V5.0: A Snapshot of Industry-Leading Performance
In the recently concluded MLPerf Inference V5.0 benchmarks, NVIDIA’s Blackwell platform set new performance records. The round marked the company’s first submission using the GB200 NVL72 system, a rack-scale AI reasoning solution built for AI factories. These AI factories are designed not only to process and store data but also to manufacture intelligence at scale, transforming vast amounts of raw data into real-time insights for users worldwide.
AI factories aim to deliver highly accurate responses to user queries with minimal latency and at scale, focusing on lowering the cost per token while maximizing throughput. As AI models continue to evolve, growing to billions and trillions of parameters, the computational requirements for processing data are intensifying, making it even more challenging to keep inference throughput high and costs low.
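To make the cost-per-token framing concrete, here is a minimal Python sketch of the underlying arithmetic. The hourly price and throughput figures in it are hypothetical placeholders, not NVIDIA-published numbers.

```python
# Illustrative cost-per-token arithmetic for an AI factory.
# All dollar and throughput figures are hypothetical placeholders.

def cost_per_million_tokens(system_hour_cost_usd: float,
                            tokens_per_second: float) -> float:
    """Serving cost per million output tokens for one system."""
    tokens_per_hour = tokens_per_second * 3600
    return system_hour_cost_usd / tokens_per_hour * 1_000_000

# At a fixed hourly cost, tripling throughput cuts cost per token to a third.
baseline = cost_per_million_tokens(system_hour_cost_usd=4.0, tokens_per_second=500)
improved = cost_per_million_tokens(system_hour_cost_usd=4.0, tokens_per_second=1500)
print(f"baseline: ${baseline:.2f}/M tokens -> 3x throughput: ${improved:.2f}/M tokens")
```

The takeaway is structural rather than numeric: at a fixed infrastructure spend, every gain in throughput translates directly into a lower cost per token.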
The MLPerf Inference benchmark suite, a peer-reviewed standard for evaluating AI performance, now includes updated tests built around some of the most demanding models in the industry. The addition of Llama 3.1 405B, one of the largest and most complex open-weight models, as well as the Llama 2 70B Interactive benchmark with its stricter latency requirements, further tests the limits of AI hardware and software.
NVIDIA’s performance in these updated benchmarks is a testament to the power of its platforms, with both the Blackwell and Hopper architectures demonstrating impressive advancements in AI performance.
NVIDIA Blackwell: Unprecedented Performance in Inference Tasks
The GB200 NVL72 system, which links 72 NVIDIA Blackwell GPUs to act as one unified super-GPU, achieved a groundbreaking 30x higher throughput on the Llama 3.1 405B benchmark compared to the previous H200 NVL8 submission. This boost came from more than a 3x improvement in per-GPU performance combined with a 9x larger NVIDIA NVLink interconnect domain.
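As rough arithmetic, and assuming near-linear scaling across the larger NVLink domain (which real workloads only approximate), those two factors compound to roughly the reported figure:

```python
# Back-of-the-envelope decomposition of the ~30x Llama 3.1 405B gain.
# Assumes near-linear scaling across the NVLink domain; the per-GPU
# factor below is an illustrative reading of "more than triple".

h200_nvl8_gpus = 8        # previous H200 NVL8 submission
gb200_nvl72_gpus = 72     # GB200 NVL72 NVLink domain
per_gpu_gain = 3.33       # "more than triple" per-GPU performance

domain_expansion = gb200_nvl72_gpus / h200_nvl8_gpus   # 9x
system_gain = domain_expansion * per_gpu_gain          # ~30x
print(f"{domain_expansion:.0f}x domain * {per_gpu_gain:.2f}x per GPU ~= {system_gain:.0f}x")
```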
NVIDIA’s submission was one of the few to tackle the Llama 3.1 405B benchmark, showcasing the immense power and scalability of the Blackwell platform. At the same time, the Llama 2 70B Interactive benchmark highlighted NVIDIA’s commitment to improving user experience by delivering lower time-to-first-token (TTFT) and time per output token (TPOT), further solidifying its position as a leader in AI inference.
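For context, TTFT is the delay before the first token of a response arrives, and TPOT is the average gap between the tokens that follow. The Python sketch below shows one generic way to measure both from any streaming token interface; the client it consumes is a hypothetical stand-in, not a specific NVIDIA or MLPerf API.

```python
import time
from typing import Iterable, Tuple

def measure_ttft_tpot(token_stream: Iterable[str]) -> Tuple[float, float]:
    """Measure time-to-first-token (TTFT) and mean time per output token
    (TPOT) from a streaming response.

    `token_stream` is any iterable yielding tokens as they are generated,
    e.g. a hypothetical `client.stream("prompt")` from an inference client.
    """
    start = time.perf_counter()
    first_token_time = None
    last_token_time = start
    n_tokens = 0
    for _ in token_stream:
        last_token_time = time.perf_counter()
        if first_token_time is None:
            first_token_time = last_token_time  # first token observed
        n_tokens += 1
    if first_token_time is None:
        raise ValueError("stream produced no tokens")
    ttft = first_token_time - start
    # TPOT is averaged over the gaps after the first token.
    tpot = (last_token_time - first_token_time) / max(n_tokens - 1, 1)
    return ttft, tpot
```

Lower TTFT makes a response feel immediate; lower TPOT determines the sustained tokens per second each user sees while the answer streams in.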
On the Llama 2 70B Interactive benchmark, the use of eight Blackwell GPUs in the NVIDIA DGX B200 system tripled the performance over an equivalent eight-H200 GPU setup, setting a new standard for responsiveness and throughput in real-world AI deployments.
NVIDIA Hopper: Continuously Improving AI Factory Efficiency
NVIDIA’s Hopper architecture, which powers many of the most efficient AI inference factories today, has shown consistent performance improvements over the last year. With its ability to handle increasingly complex models, such as the Llama 3.1 405B and the Llama 2 70B benchmarks, Hopper continues to serve as a critical piece in the AI ecosystem.
The latest iteration of the Hopper GPU line, the H200, delivered a 1.5x increase in performance and a 1.6x increase in throughput compared to the preceding H100. These ongoing improvements ensure that Hopper-based AI factories maintain their edge, even as the complexity of the models they run continues to grow.
Hopper’s versatility in handling a diverse range of workloads makes it an essential tool for AI research and production environments, helping organizations tackle the most challenging AI tasks with increased speed and accuracy.
Collaboration with Partners: Strengthening the AI Ecosystem
The success of NVIDIA’s platforms in MLPerf Inference V5.0 is not just the result of NVIDIA’s innovation but also the collaborative efforts of a wide ecosystem of partners. With 15 companies contributing stellar results using the NVIDIA platform, the breadth of support for NVIDIA’s hardware and software underscores the platform’s reach and influence.
NVIDIA’s partners include major tech players like ASUS, Cisco, Dell Technologies, Google Cloud, Oracle, and VMware, all of whom leverage NVIDIA’s solutions to deliver superior AI performance. This ecosystem approach ensures that NVIDIA’s platforms remain at the forefront of AI innovation, meeting the diverse needs of industries around the world.
What Undercode Says: Analysis of the Results
The advancements highlighted in MLPerf Inference V5.0 benchmarks are indicative of a broader shift towards highly specialized AI infrastructure — AI factories. As AI models evolve and become more complex, the compute required to power these systems also grows exponentially. The results showcased by NVIDIA’s Blackwell and Hopper platforms suggest a major leap forward in handling this increased demand.
AI factories are now not only storing and processing data but actively manufacturing intelligence. This marks a significant shift in how data centers function, as the ability to generate real-time insights at scale becomes the defining feature of modern AI-driven operations. By combining cutting-edge hardware (like the Blackwell GPUs) with optimized software stacks, NVIDIA is paving the way for a future where inference tasks are executed with minimal latency and maximum throughput.
One of the key takeaways is the optimization of key AI performance metrics such as TTFT and TPOT. These metrics are crucial for delivering a smooth, responsive user experience, especially in AI applications where users expect near-instantaneous responses. The impressive improvements NVIDIA has made in these areas are a direct response to the growing demand for fast, reliable AI solutions that can handle increasingly complex models.
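In benchmark terms, an interactive scenario turns these metrics into hard latency constraints that a submission must satisfy. The sketch below checks measured TTFT and TPOT samples against SLO thresholds using a simple nearest-rank percentile; the default limits shown are illustrative placeholders, not the official MLPerf values.

```python
# Minimal SLO check for interactive serving: verify that percentile
# latencies stay under target thresholds. The default limits are
# illustrative placeholders, not the official MLPerf constraints.

def percentile(samples: list[float], p: float) -> float:
    """Crude nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    idx = min(int(p / 100 * len(ordered)), len(ordered) - 1)
    return ordered[idx]

def meets_slo(ttft_s: list[float], tpot_s: list[float],
              ttft_limit_s: float = 0.45, tpot_limit_s: float = 0.04) -> bool:
    """True if 99th-percentile TTFT and TPOT are within the limits."""
    return (percentile(ttft_s, 99) <= ttft_limit_s and
            percentile(tpot_s, 99) <= tpot_limit_s)
```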
Furthermore, the consistency of performance gains across both the Blackwell and Hopper platforms highlights NVIDIA’s commitment to continuous improvement. This level of ongoing innovation is critical in an industry where performance bottlenecks can severely impact the usability and scalability of AI applications.
In an ecosystem where both hardware and software optimization are equally important, NVIDIA’s approach of enhancing both simultaneously gives it a significant advantage over competitors. The growing number of partners supporting NVIDIA’s technology only reinforces the platform’s dominance in the AI space.
Fact Checker Results: A Quick Review
- NVIDIA’s Blackwell platform set new performance records in MLPerf Inference V5.0, particularly in the Llama 3.1 405B and Llama 2 70B Interactive benchmarks.
- Performance improvements of up to 30x on certain benchmarks showcase the effectiveness of the Blackwell architecture and its integration with NVLink.
- The ongoing advancements in NVIDIA’s Hopper architecture highlight the company’s continuous push for greater AI throughput and efficiency.
References:
Reported By: https://blogs.nvidia.com/blog/blackwell-mlperf-inference/