By Mike Umphreys, Sr. Technical Marketing Engineer, and Mike Barmonde, Product Marketing Manager – Enterprise AI
As AI adoption accelerates across industries, organizations continue to face challenges. Customers need a solution that is simple to deploy and manage and offers a secure environment for their AI endpoints. Intel Advanced Matrix Extensions (AMX) and the Nutanix Enterprise AI (NAI) 2.4 release combine to meet these demands, offering a powerful yet streamlined infrastructure for AI workloads. Intel AMX unlocks processing efficiency by accelerating matrix-heavy operations directly in hardware on 4th Generation Intel® Xeon® Processors and newer, while NAI 2.4 is the next step in simplifying AI deployment and lifecycle management through a secure, centralized platform. Together, they enable enterprises to build, deploy, and scale AI applications with confidence—without compromising on ease of use or data protection. And the best part: no dedicated GPUs required.
Intel Advanced Matrix Extensions (Intel AMX) is an integrated accelerator included in 4th Generation Intel Xeon Processors and later, designed to accelerate AI model tuning and inference while reducing the need for specialized hardware. For a recap of Intel AMX on the Nutanix AOS software stack and AHV hypervisor, please check out Mike Barmonde’s blog post here. The key point to highlight for this blog is that AMX is a built-in accelerator and instruction set: because the capability lives directly in the CPU, customers can quickly test and adopt lighter AI use cases.
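If you want to confirm that a host or guest VM actually exposes AMX, the Linux kernel advertises it through CPU feature flags. Here is a minimal sketch in Python, assuming a Linux system with the standard /proc filesystem:

# AMX appears in /proc/cpuinfo as the amx_tile, amx_bf16, and amx_int8 flags.
with open("/proc/cpuinfo") as f:
    flags = next(line for line in f if line.startswith("flags")).split()

amx = [name for name in ("amx_tile", "amx_bf16", "amx_int8") if name in flags]
print("AMX support:", amx or "not detected")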
Nutanix Enterprise AI (NAI) is a platform solution designed with two of our core tenets in mind. The first is to “simplify” - but what does that really mean as it pertains to AI? One of the most prominent ways to simplify is to remove barriers to adoption. Can I deploy and manage LLMs on-premises? Check! In the cloud? Yes! At the edge? Absolutely! Can I manage and deploy privately trained models as well as tuned or stock foundational models? Yep, you guessed it!
But what about security? Security is embedded into every layer of the Nutanix stack: AOS and AHV with built-in STIGs and cluster-level encryption, the Prism Central multi-cluster manager’s RBAC capability, and the Flow Network Security solution, which ensures network traffic inside the perimeter is only sent and received by entities approved for that communication. This all sounds great, but you may be asking how we apply this security posture to AI. NAI 2.4 offers secure, private AI endpoints with RBAC, visibility, and auditability built in. It also supports safety models like Llama Guard for content filtering, ensuring safe, policy-compliant outputs.
There are four main use cases today for Intel AMX:
Prototyping AI Workloads - Creating a preliminary version of an AI solution to test and validate both its ability to solve the technical challenge and the value derived from the solution, before investing heavily in a full development cycle and hardware purchases.
Online Inferencing with up to 10B Parameter Models - The process of running a pre-trained LLM to generate responses to user prompts in real time. Examples include chatbots and virtual assistants that provide real-time customer support, personalized recommendations, and quick issue resolution.
Batch Inferencing - Jobs that are usually scheduled to run at specific times, where immediate output is not needed.
RAG Pipelines - A common use for Intel AMX in RAG pipelines is the embedding generation phase, where raw text is converted into numerical values called embeddings or vectors. Processing large volumes of text this way relies heavily on matrix multiplications - exactly the operations AMX is designed to accelerate, as the sketch below illustrates.
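To make the embedding point concrete, here is a minimal sketch of the kind of math that dominates that phase. It is illustrative only: the shapes and names are assumptions, and it uses PyTorch, whose bfloat16 matrix multiplications can be dispatched to oneDNN and, on 4th Generation Xeon and newer, to AMX:

import torch

# A batch of token vectors and an embedding projection matrix (illustrative sizes).
tokens = torch.randn(1024, 768, dtype=torch.bfloat16)
proj = torch.randn(768, 768, dtype=torch.bfloat16)

with torch.no_grad():
    embeddings = tokens @ proj  # the matrix multiply AMX accelerates in hardware

print(embeddings.shape)  # torch.Size([1024, 768])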
In order to get the best performance when using Intel AMX, there are a few things to consider when purchasing new infrastructure:
When choosing a CPU for Intel AMX accelerated use cases, CPUs should strike a balance between core count and cost. Customers should also understand when to scale up (higher core count CPU) versus when to scale out (add a node). In general, 20-24 vCPUs provide enough compute for online inference use cases while delivering lower time to first token (TTFT) latencies. If more cores are needed, you can scale out by adding another instance of the model. In general, there should be little to no CPU oversubscription. Depending on model size, multiple endpoints could run on the same Nutanix node but on different NUMA nodes or, if there are sufficient cores, multiple models on the same NUMA node. For example, in a 2-socket system with 2 x 32-core Intel Xeon Processors, NUMA 0 could host the Nutanix CVM (Controller Virtual Machine) with 16 vCPUs and a GenAI workload running on the other 16 CPU cores. On NUMA 1, there could be 2 additional GenAI workloads, each consuming 16 CPU cores.
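The core accounting for that example works out as a quick sketch, using the illustrative figures above:

# Plan endpoint placement on a 2-socket host with 2 x 32-core CPUs (illustrative).
CORES_PER_SOCKET = 32
CVM_VCPUS = 16            # Nutanix CVM reserved on NUMA 0
VCPUS_PER_ENDPOINT = 16   # one LLM endpoint per 16 physical cores, no oversubscription

numa0_free = CORES_PER_SOCKET - CVM_VCPUS   # 16 cores left on NUMA 0
numa1_free = CORES_PER_SOCKET               # all 32 cores free on NUMA 1
endpoints = numa0_free // VCPUS_PER_ENDPOINT + numa1_free // VCPUS_PER_ENDPOINT
print(f"Endpoints that fit: {endpoints}")   # 1 on NUMA 0 + 2 on NUMA 1 = 3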
With Intel AMX using system memory to cache LLMs, it's imperative to have the most memory bandwidth available. When spec’ing your new systems, be sure to purchase the fastest memory supported by the CPU’s memory controller and motherboard, and be mindful of DIMM population rules for the CPU generation you are using.
Another consideration when optimizing for memory bandwidth is to opt for 1 DIMM per channel (DPC) instead of 2 DPC when possible. For example, 4th Gen Intel Xeon Gold and Platinum CPUs have 8 memory channels per CPU socket. With 1 DPC, memory speeds are up to 4800MT/s. When using 2 DPC, memory speeds drop to 4400MT/s, which is 8.3% less bandwidth per channel. Newer 6th Gen Intel Xeon 6500P and 6700P series processors feature a maximum transfer speed of 6400MT/s with 1 DPC, and up to 5200MT/s in 2 DPC configurations.
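To see what those DIMM speeds mean in aggregate, here is a back-of-the-envelope sketch of theoretical peak bandwidth per socket (assuming 8 DDR5 channels at 8 bytes per transfer; real-world throughput will be lower):

# Theoretical peak per socket = channels x MT/s x bytes per transfer.
CHANNELS = 8
BYTES_PER_TRANSFER = 8

for label, mts in [("4th Gen, 1 DPC", 4800), ("4th Gen, 2 DPC", 4400),
                   ("6th Gen, 1 DPC", 6400), ("6th Gen, 2 DPC", 5200)]:
    gbs = CHANNELS * mts * 1_000_000 * BYTES_PER_TRANSFER / 1e9
    print(f"{label}: {gbs:.1f} GB/s")  # e.g. 4800 MT/s -> 307.2 GB/s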
Now that we have put our best foot forward in terms of hardware purchasing, let's move onto what we can do in software to ensure the best performance when using AMX.
Modern hypervisors are great at scheduling compute resources to VMs for general-purpose workloads, even when there is CPU oversubscription. But what if our use case benefits from a 1:1 CPU mapping? Even today's modern hypervisors like the Nutanix AHV hypervisor can still schedule vCPUs on a sibling logical core instead of the physical core, even without oversubscription. This isn't inherently bad, but for use cases that demand the most performance, making sure the workload is scheduled only on physical cores is a must.
In addition to pinning vCPUs to the host’s physical cores, for maximum performance you will also need to make sure that the workload is schedulable on only 1 NUMA node (CPU and RAM). Doing this ensures that memory access requests from the CPU are serviced by the memory connected to its socket. If a workload is not pinned to a NUMA node, or is configured with more than a NUMA node's worth of resources, memory reads and writes will traverse the Intel Ultra Path Interconnect between CPU0 and CPU1 to access memory resources on the other CPU. While this interconnect can facilitate high bandwidth, it is not as fast as having the VM access memory on the same CPU socket that its workload is running on.
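If you want to see how logical cores map to physical cores and sockets before pinning, the Linux sysfs topology files expose this. A minimal sketch, assuming a standard Linux host:

from pathlib import Path

# Each physical core is printed once; its entry lists the logical CPUs (SMT
# siblings) that share it and the socket (physical package) it belongs to.
seen = set()
for cpu in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
    siblings = (cpu / "topology" / "thread_siblings_list").read_text().strip()
    if siblings in seen:
        continue
    seen.add(siblings)
    socket = (cpu / "topology" / "physical_package_id").read_text().strip()
    print(f"socket {socket}: physical core shared by logical CPUs {siblings}")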
Using Nutanix ACLI commands, we can satisfy both of these requirements.
Note: For the online inference and RAG pipeline use-case, we recommend dedicated AI infrastructure if the highest performance when using AMX is required. Prototyping can be done in a mixed environment without using the settings below.
Run the following command on the VMs hosting an LLM.
acli vm.update <vm-name> num_vnuma_nodes=1 vcpu_hard_pin=true
num_vnuma_nodes=1
Setting this parameter to 1 gives the VM a single virtual NUMA (vNUMA) node, so the VM uses both CPU and memory from the same physical NUMA node.
vcpu_hard_pin=true
This flag pins the VM to a given socket and to that socket's first physical cores (as many as the VM has configured vCPUs), and it should be used very carefully. If the VM is configured with, for example, 4 vCPUs (or 1 vCPU with 4 cores), the first 4 physical cores will be pinned to this VM. The VM lands on either CPU0 or CPU1 (chosen at power-on); it stays on that CPU, but we cannot define which one. This flag should be used for only one user VM per node, as it always pins vCPUs to the same physical cores. You should also give this user VM close to the maximum of a single NUMA node's memory, so it won't start on the same NUMA node as the CVM. If this VM runs on the same CPU cores on the NUMA node as the CVM, both the CVM and the user VM can see high scheduling contention.
Leveraging Intel Advanced Matrix Extensions (AMX) with Nutanix NAI 2.4 on the Nutanix Cloud Platform solution provides an easy-to-deploy yet powerful platform for running AI workloads. Intel AMX delivers a significant performance boost to AI inference and machine learning workloads by accelerating matrix operations at the hardware level, enabling faster and more efficient data processing. Nutanix NAI 2.4 complements this by providing a unified, easy-to-manage platform leveraging built-in security features such as role-based access control and data encryption. Together, they offer a powerful, enterprise-ready solution that makes AI operations simple and secure, while providing increased performance and optimized resource utilization—making it easier for organizations to operationalize AI at scale.
Intel Xeon Processors powering the Nutanix Cloud Platform and NAI can meet the computing demands of multiple components of an AI pipeline or even multiple inference endpoints concurrently while also serving general mixed workloads.
Liked this blog? Check out our Intel® AMX with Nutanix Cloud Infrastructure video here. To see how NAI can help solve your AI challenges, visit https://www.nutanix.com/products/nutanix-enterprise-ai.
©2025 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. Intel, the Intel logo and other Intel marks are trademarks of Intel Corporation or its subsidiaries. All other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s).