AMD and Intel Define ACE x86 Extensions for AI Matrix Math

AMD and Intel publish the ACE specification for x86, defining new matrix multiplication primitives and low-precision formats to boost AI compute density.


AMD and Intel have published the latest specification for ACE, a new set of x86 extensions designed to close the performance gap in artificial intelligence workloads. The joint effort matters to developers and hardware designers building next-generation processors for neural networks and large language models. It provides a standardized path for accelerating matrix multiplication, the core computational task in modern AI applications.

Joint specification adds dedicated registers and low-precision formats

The ACE initiative operates under the x86 Ecosystem Advisory Group alongside other key projects like FRED, AVX10, and ChkTag. These extensions aim to augment existing AVX and scalar code with specialized capabilities for AI tasks. By defining these primitives at the architecture level, the industry seeks to improve scalability and energy efficiency beyond what current SIMD extensions can deliver.

  • Extension Name: ACE (AI Compute Extensions)
  • Supported Data Formats: INT8, INT32, FP32, BF16, FP16, E8M0, FP8, MX FP8, MX FP6, MX FP4, MX INT8
  • Primary Function: Matrix multiplication primitives and low-precision format conversion
  • Register State: Tile and block scale registers

ACE introduces dedicated tile and block scale registers that are meant to work alongside AVX vectors. The specification covers a wide range of data formats, most of them low-precision types that are essential for efficient AI processing: INT8, BF16, FP16, and FP8, plus the block-scaled microscaling (MX) variants MX FP8, MX FP6, MX FP4, and MX INT8. INT32, FP32, and the exponent-only E8M0 format are also supported.
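To make the role of a block scale concrete, here is a minimal Python sketch of block-scaled quantization in the spirit of the MX formats: a group of values shares one power-of-two scale, and each element is stored as a small integer. It is a software illustration only, not ACE code. The block size of 32 and the E8M0 bias of 127 follow the OCP Microscaling convention and are assumptions here, since the article does not state what ACE uses; the function names are hypothetical, and the element encoding is simplified relative to the real MX INT8 layout.

```python
import math

BLOCK = 32       # assumed block size (OCP MX convention); not stated in the article
E8M0_BIAS = 127  # assumed exponent bias (OCP MX convention)


def quantize_block(values):
    """Quantize one block of floats to (shared_exponent, signed 8-bit ints)."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0] * len(values)
    # Smallest power-of-two scale such that amax / scale fits in [-127, 127].
    exp = math.ceil(math.log2(amax / 127.0))
    scale = 2.0 ** exp
    # Clamp guards against floating-point edge cases in the log2 step.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return exp, q


def dequantize_block(exp, q):
    """Rebuild approximate floats from the shared exponent and elements."""
    scale = 2.0 ** exp
    return [x * scale for x in q]


def encode_e8m0(exp):
    """Store the shared exponent as one biased byte (8 exponent bits, no mantissa)."""
    biased = exp + E8M0_BIAS
    if not 0 <= biased <= 254:
        raise ValueError("exponent out of range for the assumed E8M0 encoding")
    return biased


def quantize(values):
    """Split a flat list into blocks of BLOCK values and quantize each one."""
    return [quantize_block(values[i:i + BLOCK])
            for i in range(0, len(values), BLOCK)]
```

Because the scale is shared, a block of 32 elements costs one extra byte rather than a full-width scale per element, which is the storage saving that block-scaled formats aim for. The rounding error of each element in this sketch is at most half the block's scale.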

The primary function of ACE is to accelerate matrix multiplication primitives for AI workloads. By raising compute density, the extensions are intended to deliver substantially higher matrix multiply throughput than earlier x86 extensions. This architectural shift supports the growing demand for high-throughput AI inference and training on x86 hardware.
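As a rough illustration of what a matrix multiplication primitive computes, the following Python sketch multiplies INT8 matrices tile by tile and accumulates the products in a wider integer, mirroring the INT8 and INT32 entries in the format list. It is a software model under stated assumptions, not the ACE instruction set: the article gives no tile dimensions or instruction names, so `TILE` and `matmul_int8` are hypothetical.

```python
INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
TILE = 4  # hypothetical tile edge; the article does not give ACE's tile shape


def matmul_int8(a, b, tile=TILE):
    """Compute C = A x B for INT8 inputs, one tile-sized sub-problem at a time."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a) or any(len(row) != n for row in b):
        raise ValueError("incompatible matrix shapes")
    for row in a + b:
        if any(not INT8_MIN <= x <= INT8_MAX for x in row):
            raise ValueError("inputs must fit in INT8")

    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # tile row of C
        for j0 in range(0, n, tile):      # tile column of C
            for k0 in range(0, k, tile):  # step along the shared dimension
                # One tile-level multiply-accumulate: C_tile += A_tile x B_tile.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc

    # Python integers are unbounded, so check the INT32 accumulator range explicitly.
    if any(not INT32_MIN <= x <= INT32_MAX for row in c for x in row):
        raise OverflowError("accumulator exceeded INT32")
    return c
```

For example, `matmul_int8([[1, -2, 3], [4, 5, -6]], [[7, 8], [9, -10], [11, 12]], tile=2)` returns `[[22, 64], [7, -90]]`. The innermost tile-level step is the part that a hardware matrix primitive would replace with a single operation on tile registers.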

We touched on "AMD and Intel Unite on AI" in our earlier AMD coverage. The publication of this specification marks a concrete step toward unified AI acceleration across x86 platforms. Developers can now align their software stacks with the defined extensions ahead of compatible hardware.
