AMD and Intel have published the latest specification for ACE, a new set of x86 extensions designed to close the performance gap in artificial intelligence workloads. This joint effort directly impacts developers and hardware designers building next-generation processors for neural networks and large language models. The update provides a standardized path for accelerating matrix multiplication, which is the core computational task for modern AI applications.

Joint specification adds dedicated registers and low-precision formats
The ACE initiative operates under the x86 Ecosystem Advisory Group alongside other key projects like FRED, AVX10, and ChkTag. These extensions aim to augment existing AVX and scalar code with specialized capabilities for AI tasks. By defining these primitives at the architecture level, the industry seeks to improve scalability and energy efficiency beyond what current SIMD extensions can deliver.
- Extension Name: ACE (AI Compute Extensions)
- Supported Data Formats: INT8, INT32, FP32, BF16, FP16, E8M0, FP8, MX FP8, MX FP6, MX FP4, MX INT8
- Primary Function: Matrix multiplication primitives and low-precision format conversion
- Register State: Tile and block scale registers
ACE introduces dedicated tile and block scale registers that are designed to work alongside existing AVX vectors. The specification covers a wide range of data formats used in AI processing: INT8, INT32, FP32, BF16, FP16, E8M0, and FP8, plus the microscaling (MX) variants MX FP8, MX FP6, MX FP4, and MX INT8, in which a block of low-precision elements shares a single scale factor.
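To make the idea behind the MX formats concrete: each block of low-precision elements shares one power-of-two scale, which is presumably the kind of value the block scale registers are meant to hold. The sketch below is a simplified illustration of that block scaling in plain Python, not the ACE instruction semantics and not a bit-exact MX INT8 encoding. The block size of 32 follows the OCP microscaling convention; the function names and the clamp to [-127, 127] are assumptions made for this example.

```python
import math

BLOCK = 32  # elements per shared scale (block size used by the OCP MX formats)


def quantize_block(values):
    """Return (shared_exponent, int8_elements) for one block of floats.

    Simplified: the shared scale is 2**shared_exponent, and each element
    is rounded to an integer in [-127, 127]. Hypothetical helper, not an
    ACE or MX reference implementation.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # Smallest power-of-two scale that keeps every element within +/-127.
    exp = math.ceil(math.log2(max_abs / 127.0))
    scale = 2.0 ** exp
    elems = [max(-127, min(127, round(v / scale))) for v in values]
    return exp, elems


def dequantize_block(exp, elems):
    """Rebuild approximate floats from the shared exponent and elements."""
    scale = 2.0 ** exp
    return [e * scale for e in elems]


# One block of 32 sample values between -1.6 and 1.5.
block = [i * 0.1 - 1.6 for i in range(BLOCK)]
shared_exp, q = quantize_block(block)
restored = dequantize_block(shared_exp, q)
```

Because the scale is stored once per block rather than once per element, the per-element storage stays at 8 bits (or fewer for the FP6 and FP4 variants) while the block as a whole can still cover a wide dynamic range.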
ACE's primary function is to accelerate the matrix multiplication primitives at the core of AI workloads. Concentrating on this one operation raises compute density for neural network data, and the extensions are intended to deliver significantly higher matrix multiply performance than existing x86 SIMD extensions. The shift supports growing demand for high-throughput AI inference and training on x86 hardware.
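As a rough picture of what a tile-based matrix multiplication primitive computes, the following sketch multiplies two integer matrices one tile at a time, accumulating into a wider result, in the spirit of INT8 inputs with INT32 accumulators. It is plain Python for illustration only: the tile size, the function name, and the loop order are assumptions, and nothing here reflects the actual ACE instructions or register shapes.

```python
def matmul_tiled(a, b, tile=4):
    """Multiply integer matrices a (m x k) and b (k x n) tile by tile.

    Each innermost group of loops plays the role of a single tile
    multiply-accumulate step. Hardware would hold the running sums in
    INT32 accumulators; Python integers do not overflow, so no wrapping
    is modeled here.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # tile row of the output
        for j0 in range(0, n, tile):      # tile column of the output
            for k0 in range(0, k, tile):  # walk the shared dimension
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


# Small example with values that fit in INT8.
result = matmul_tiled([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The point of tiling is data reuse: once a tile of each input is loaded, every element in it takes part in several multiply-accumulate operations before the next tile is fetched.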
We touched on "AMD and Intel Unite on AI" in our earlier AMD coverage. The publication of this specification marks a concrete step toward unified AI acceleration across x86 platforms. Developers can now align their software stacks with these defined standards for future hardware compatibility.


