The Power of CPUs for Agentic AI: Benchmarking on GCP’s 5th Gen Xeon


2024-12-16

The future of AI is agentic – capable of perceiving, reasoning, and taking action. But how do we run these sophisticated systems efficiently? This article explores the potential of CPUs, specifically the latest 5th Gen Intel Xeon processors (codenamed Emerald Rapids) on Google Cloud Platform (GCP), to power agentic AI workloads.

We delve into benchmarks comparing two GCP Compute Engine instances: N2 (powered by 3rd Gen Xeon) and C4 (powered by 5th Gen Xeon with Intel AMX for AI acceleration). Our focus is on two key agentic AI components: text embedding and text generation.
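Whether a given VM actually exposes AMX can be verified from inside the guest OS before benchmarking. A minimal sketch (Linux-only; the `has_amx` helper is ours, not from the article) checks `/proc/cpuinfo` for the AMX feature flags that 5th Gen Xeon advertises:

```python
def has_amx() -> bool:
    """Return True if the CPU exposes Intel AMX flags (Linux only).

    5th Gen Xeon (Emerald Rapids) reports amx_tile, amx_bf16 and
    amx_int8 in the flags line of /proc/cpuinfo.
    """
    try:
        with open("/proc/cpuinfo") as f:
            info = f.read()
    except OSError:
        # Not Linux, or /proc unavailable
        return False
    return "amx_tile" in info

print(has_amx())
```

On a C4 instance this should print `True`; on N2 (3rd Gen Xeon, no AMX) it prints `False`.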

Why CPUs for Agentic AI?

Agentic AI has traditionally relied on accelerators such as GPUs, but these introduce challenges such as managing data transfer between the CPU and the accelerator. Meanwhile, the rise of smaller yet capable Small Language Models (SLMs) and advances in CPU AI capabilities make CPUs a compelling option.

Benchmarking the Power of 5th Gen Xeon

We used Hugging Face models to benchmark both workloads:

Text Embedding: Using the `WhereIsAI/UAE-Large-V1` model with input sequences of 128 tokens, varying batch sizes from 1 to 128.
Text Generation: Using the `meta-llama/Llama-3.2-3B` model with input sequences of 256 tokens and output sequences of 32 tokens, varying batch sizes from 1 to 64.
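The article does not publish its harness, but a throughput benchmark of this shape boils down to timing repeated forward passes over a fixed batch. A minimal sketch of such a timing loop (the `measure_throughput` helper and its parameters are our illustration, not the authors' code):

```python
import time

def measure_throughput(step_fn, batch_size: int,
                       n_iters: int = 10, warmup: int = 2) -> float:
    """Time `step_fn` (one forward pass over a batch) and
    return throughput in sequences per second."""
    # Warm-up iterations let caches, threads and JIT paths settle
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_iters):
        step_fn()
    elapsed = time.perf_counter() - start
    return batch_size * n_iters / elapsed
```

In practice `step_fn` would wrap the model call, e.g. a `transformers` embedding model run under `torch.no_grad()` on a pre-tokenized batch of 128-token sequences, repeated for each batch size from 1 to 128.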

The Results: A Significant Performance Boost

The C4 instance delivered impressive improvements:

Text Embedding: C4 delivered a staggering 10x to 24x higher throughput than N2 across all batch sizes.
Text Generation: C4 consistently delivered 2.3x to 3.6x higher throughput than N2.

This translates to real-world benefits:

Faster processing: C4 can handle significantly more tasks in a given timeframe.
Improved concurrency: C4 allows for handling multiple queries simultaneously without impacting user experience.
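Speedup figures like those above are simply the ratio of the two instances' throughputs at each batch size. A sketch of that computation, using hypothetical placeholder numbers rather than the article's actual measurements:

```python
def speedup(c4_tput: dict, n2_tput: dict) -> dict:
    """Per-batch-size speedup of C4 over N2.

    Both dicts map batch size -> throughput in sequences/sec.
    """
    return {bs: round(c4_tput[bs] / n2_tput[bs], 1) for bs in n2_tput}

# Hypothetical placeholder throughputs, NOT the article's measurements
n2 = {1: 10.0, 32: 40.0, 128: 50.0}
c4 = {1: 110.0, 32: 700.0, 128: 1200.0}
print(speedup(c4, n2))  # {1: 11.0, 32: 17.5, 128: 24.0}
```

Reporting the ratio per batch size, rather than a single average, shows where the AMX advantage is largest (typically at larger batches, where the matrix-multiply units stay saturated).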

What Undercode Says:

These results unlock exciting possibilities for deploying lightweight agentic AI solutions entirely on CPUs. This can lead to:

Reduced complexity: Eliminating the need for accelerators simplifies system architecture.
Lower costs: CPUs are typically more cost-effective than accelerators.
Reduced overhead: Less data transfer between CPU and accelerator minimizes latency.

The Future of Agentic AI on CPUs

With the recent release of Intel Xeon 6 processors (codenamed Granite Rapids) promising another 2x performance leap for Llama 3, the future of CPU-powered agentic AI looks even brighter. We are eager to explore deploying lightweight agentic AI solutions solely on these CPUs once GCP offers Granite Rapids instances.

This article demonstrates the potential of CPUs, particularly the latest generation Intel Xeon with AMX, for efficient execution of agentic AI tasks. As CPU technology continues to evolve, we can expect them to play an increasingly significant role in powering the next generation of AI applications.

References:

Reported By: Huggingface.co