Most of the AI industry has focused on GPUs, since data centres want to extract as much performance as possible from them when running LLMs. However, a major shift is under way towards Agentic AI: autonomous systems capable of performing increasingly complicated tasks. As a result, hardware priorities are moving from GPU-centric designs to rack-scale CPU density. AMD recently highlighted this change, noting the importance of CPU density in building autonomous AI systems.
Traditional AI is largely transactional: a user’s prompt produces a response from the GPU. Agentic AI goes further by acting as well as communicating. It breaks large goals into smaller, manageable parts, which lets it call external APIs, query databases, validate its results, and iterate until the task is complete.
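The loop described above (split a goal into steps, act, validate, retry) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Step`, `run_agent`, and `make_flaky_lookup` are hypothetical names, the actions are stand-ins for real API calls or database queries, and no actual agent framework is used.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One sub-task of a larger goal (all names here are hypothetical)."""
    name: str
    action: Callable[[], str]        # stand-in for an API call or database query
    is_valid: Callable[[str], bool]  # stand-in for result validation


def run_agent(steps: list[Step], max_attempts: int = 3) -> dict[str, str]:
    """Run each step in order, retrying until its output validates."""
    results: dict[str, str] = {}
    for step in steps:
        for _ in range(max_attempts):
            output = step.action()
            if step.is_valid(output):
                results[step.name] = output
                break
        else:
            # No attempt produced a valid result, so the goal cannot be completed.
            raise RuntimeError(f"step {step.name!r} failed after {max_attempts} attempts")
    return results


def make_flaky_lookup():
    """Return an action that fails (empty string) once before succeeding."""
    calls = {"n": 0}

    def lookup() -> str:
        calls["n"] += 1
        return "" if calls["n"] == 1 else "42 rows"

    return lookup


if __name__ == "__main__":
    plan = [
        Step("fetch", lambda: "raw data", bool),
        Step("query", make_flaky_lookup(), bool),
    ]
    print(run_agent(plan))
```

Note that everything in this loop except the model call itself (dispatching steps, checking results, retrying) is ordinary control flow, which is the kind of work that typically lands on the CPU.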
The infrastructure behind Agentic AI is far broader than that of standard large language model (LLM) inference, which runs mostly on GPUs. This operational complexity requires several components:
- Orchestration and Logic: Frameworks such as LangChain or AutoGen orchestrate decision-making inside the system.
- Data Retrieval and Caching: Large Key-Value (KV) caches and vector databases retain the context of conversations over time.
- API and Web Management: NGINX handles routing, while server-side Java handles processing across a variety of microservices.
This supporting infrastructure, however, depends heavily on the CPU. If the CPU cannot keep up, GPU performance suffers as well. This is the concern AMD has raised in discussing “rack-scale execution”: the effectiveness of Agentic AI systems depends on CPU and GPU efficiency together.
Data centres can supply only a limited amount of electricity and cooling to each server rack, generally 100 kW. AMD recently tested dual-processor configurations on workloads central to Agentic AI infrastructure, including Java and NGINX, and reported considerable improvements in server density.
The reported results, by CPU, are as follows:
- NVIDIA Vera: 88 cores, serving as the baseline at 1.00x throughput.
- Intel Xeon 6980P: 128 cores, delivering approximately 1.5x throughput.
- AMD EPYC 9965, also known as “Turin”: 192 cores, delivering 2.37x throughput.
- The forthcoming AMD EPYC™ architecture, dubbed “Venice”: 256 cores, projected to reach 3.30x throughput.
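To see how much of each gain comes from core count rather than per-core speed, the quoted figures can be normalised per core. The sketch below is back-of-the-envelope arithmetic only; it assumes the throughput multipliers were measured with the same number of processors per system, which the article does not spell out.

```python
# Core counts and relative throughput as quoted above (NVIDIA Vera = 1.00x baseline).
CPUS = {
    "NVIDIA Vera": (88, 1.00),
    "Intel Xeon 6980P": (128, 1.50),            # "approximately 1.5x"
    "AMD EPYC 9965 (Turin)": (192, 2.37),
    "AMD EPYC (Venice, projected)": (256, 3.30),
}


def per_core_relative(cpus, baseline="NVIDIA Vera"):
    """Return throughput per core for each CPU, relative to the baseline CPU."""
    base_cores, base_throughput = cpus[baseline]
    base_per_core = base_throughput / base_cores
    return {
        name: (throughput / cores) / base_per_core
        for name, (cores, throughput) in cpus.items()
    }


if __name__ == "__main__":
    for name, ratio in per_core_relative(CPUS).items():
        print(f"{name}: {ratio:.2f}x per core")
```

Under that assumption the per-core figures come out at roughly 1.03x for the Xeon, 1.09x for Turin, and 1.13x for Venice, which suggests that most of the quoted advantage comes from the higher core count per processor.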
AMD’s “Turin” processors allow more than 27,000 x86 cores to be installed in a liquid-cooled rack, and the next-generation “Venice” architecture is expected to raise that to more than 36,000 cores. This increase in core density is expected to directly raise throughput per square foot in data centres that host many AI agents.
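As a rough sanity check on those rack figures, the core totals can be divided by the cores per processor. The sketch below assumes every core in the rack comes from the named CPU and that servers are dual-socket; both are assumptions for illustration, not details confirmed by AMD.

```python
import math


def min_sockets(rack_cores: int, cores_per_cpu: int) -> int:
    """Smallest number of processors needed to reach rack_cores."""
    return math.ceil(rack_cores / cores_per_cpu)


def min_dual_socket_servers(rack_cores: int, cores_per_cpu: int) -> int:
    """Smallest number of two-processor servers needed (assumed layout)."""
    return math.ceil(min_sockets(rack_cores, cores_per_cpu) / 2)


if __name__ == "__main__":
    for label, rack_cores, cores in [("Turin", 27_000, 192), ("Venice", 36_000, 256)]:
        print(label, min_sockets(rack_cores, cores), min_dual_socket_servers(rack_cores, cores))
```

Both cases work out to at least 141 processors, or 71 dual-socket servers, per rack. In other words, the projected jump from 27,000 to 36,000 cores appears to come from more cores per processor rather than more processors per rack.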
Unlike rivals that are pushing proprietary designs requiring large infrastructure changes, AMD’s EPYC series builds on software continuity within the x86 ecosystem. Users can run the latest software stacks directly with few rewrites, which makes the switch easier for large companies.
As the artificial intelligence (AI) landscape evolves, the industry’s emphasis is shifting away from simply building highly intelligent models and toward deploying them effectively at scale in autonomous systems. This shift brings new challenges, emphasising the importance of managing complex systems well rather than focusing solely on mathematical performance.
AMD argues that central processing units (CPUs) are central to the current AI revolution, and positions its own technology as leading the way in deploying AI in commercial settings.


