APU (Accelerated Processing Unit)#

A compute chip that integrates an x86 CPU, a GPU and an NPU on the same chip.

access pattern (in relation to data in memory)#

How data is read from or written to memory. For example, data can be accessed sequentially, randomly, or in other ordered patterns.
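
As a rough illustration, the sketch below reads the same buffer sequentially, with a fixed stride, and in a random order. NumPy is used here only for convenience, and the names `buffer`, `sequential`, `strided` and `random_order` are illustrative assumptions, not part of Riallto.

```python
import numpy as np

buffer = np.arange(12)            # 12 elements stored contiguously

sequential = buffer[:]            # every element, in order
strided = buffer[::4]             # every 4th element: a regular, ordered pattern

rng = np.random.default_rng(seed=0)
random_order = buffer[rng.permutation(buffer.size)]  # same elements, arbitrary order
```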


application#

Refers to the complete Ryzen AI NPU application. It consists of the configuration for the parts of the NPU that are used by the application, the software kernel executable code for each AI Engine in a compute tile, the connections between tiles, and the instructions for data movement in the NPU. The application is contained in a .xclbin file, which is a container file that includes the other configuration and executable files for components in the NPU.


Riallto Python module used to build an NPU application.


Riallto Python object that is used to define the connections and the data movement between tiles.

compute tile#

One of the three tile types that make up the NPU. Contains the AI Engine processor and local data memory.

dataflow graph#

Abstract representation of an application that defines the flow of data between compute nodes. Can be represented graphically.

data movers#

Control logic that moves data between tiles over point-to-point connections. Each tile has a component of the data mover to send or receive data. In the sending tile, the data mover master initiates the data movement. On the other end of the connection, in the receiving tile, a corresponding data mover slave receives the data and stores it in memory or moves it to the processor.


The graph describes the connections between tiles in the Ryzen AI NPU and the data movement between them.

IPU (Inference Processing Unit)#

Alternative name for an NPU. Special compute unit optimized for machine learning computation. Ryzen AI is the AMD brand name and is an instance of an NPU/IPU.

interface tile#

One of the three tile types that make up the NPU. Contains data movers that move data from external system memory to tiles in the NPU, and that receive data from the NPU and send it to external system memory.

kernel (or software kernel)#

The software function that runs on an AI Engine in a compute tile. A software kernel has source code that is compiled to an executable that will run on an AI Engine. Usually one kernel is assigned to one AI Engine, but more than one kernel can be assigned to, and run on, a single AI Engine.

memory tile#

One of the three tile types that make up the NPU. Contains the data mover and local data memory.


The compilation tools used to build the Ryzen AI NPU application. They are installed as part of Riallto.


MLIR#

MLIR, or Multi-Level Intermediate Representation, is a compiler infrastructure project. It provides a common intermediate representation that allows for seamless communication and transformation across different programming languages and hardware targets. Riallto compiles software kernels written in Python and C++ into MLIR which is then used by the AIEtools to build Ryzen AI applications.



MLIR-AIE#

MLIR-AIE is an open-source research project from the Research and Advanced Development group (RAD) at AMD. MLIR-AIE is an intermediate tool used by Riallto to compile software kernels for the AI Engines in compute tiles. It takes automatically generated MLIR and compiles it into executables for the AI Engines.


Refers to memory access patterns that span more than one dimension. On the Ryzen AI NPU, the memory tile data movers support multidimensional (2D, 3D and 4D) transfers. Unlike linear data structures (such as one-dimensional arrays), multidimensional data can be organized in tables, matrices, or hypercubes. Moving this type of data involves coordinating across multiple axes or indices, which enables operations that consider relationships in multiple dimensions simultaneously. This is common in machine learning algorithms.
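
To make the idea concrete, the sketch below generates the linear memory offsets that a 2D transfer would visit, given a size and a stride per dimension. The function `transfer_offsets` and its parameters are hypothetical and purely illustrative; they do not represent the Riallto API or the hardware's descriptor format.

```python
def transfer_offsets(base, sizes, strides):
    """Return the linear offsets visited by a 2D transfer (outer dimension first)."""
    offsets = []
    for i in range(sizes[0]):
        for j in range(sizes[1]):
            offsets.append(base + i * strides[0] + j * strides[1])
    return offsets


# Hypothetical example: a 4x6 row-major buffer; read the 2x3 block at row 1, column 2.
ROW_LENGTH = 6
offsets = transfer_offsets(base=1 * ROW_LENGTH + 2, sizes=(2, 3), strides=(ROW_LENGTH, 1))
print(offsets)  # [8, 9, 10, 14, 15, 16]
```

A 3D or 4D transfer follows the same idea with one more nested loop, size and stride per extra dimension.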

NPU (Neural Processing Unit)#

Special compute unit optimized for machine learning computation. Also known as an Inference Processing Unit (IPU). Ryzen AI is the AMD brand name and is an instance of an NPU/IPU.


NumPy#

NumPy is a powerful numerical computing library for Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions to operate on these arrays. NumPy is widely used in scientific and data-related tasks for its efficiency and ease of use, allowing for fast and convenient manipulation of numerical data in Python programs.


partitioning#

In algorithms or parallel computing, partitioning refers to dividing a set of data into subsets based on specific criteria. This can be done for efficient processing, distribution across multiple processors, or organizing data in a way that suits a particular algorithm.
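
A minimal sketch of one possible partitioning, assuming the criterion is simply "split as evenly as possible across a fixed number of workers". It uses NumPy's `array_split`; the variable `num_workers` is a made-up value for illustration.

```python
import numpy as np

data = np.arange(10)
num_workers = 4  # hypothetical number of processors

# array_split allows uneven partitions when the data does not divide evenly.
parts = np.array_split(data, num_workers)
sizes = [len(p) for p in parts]
print(sizes)  # [3, 3, 2, 2]
```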

Ryzen AI#

Ryzen AI is the AMD brand name for the NPU (also known as an Inference Processing Unit, or IPU) in the Ryzen 7040 ‘Phoenix’ computer chips.


schedule#

In a dataflow architecture, a schedule refers to the plan or order in which operations or tasks are executed, based on their data dependencies and the availability of data. It determines the timing and sequence of operations in a computational graph, ensuring that each operation receives the required input data before it is executed. Scheduling is crucial for optimizing parallelism and minimizing idle time in a system, ultimately improving the overall efficiency of dataflow computations.
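
As an illustration only, the sketch below derives a simple schedule for a small, made-up dataflow graph using Python's standard `graphlib` module (Python 3.9+). The node names are hypothetical, and this is not how Riallto or the NPU schedules work; it only shows the idea of running an operation once all of its inputs are available.

```python
from graphlib import TopologicalSorter

# Hypothetical dataflow graph: each node maps to the set of nodes it depends on.
dependencies = {
    "scale": {"load"},
    "blur": {"load"},
    "add": {"scale", "blur"},
    "store": {"add"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

schedule = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # nodes whose inputs are all available
    schedule.append(ready)              # nodes in the same step could run in parallel
    sorter.done(*ready)

print(schedule)  # [['load'], ['blur', 'scale'], ['add'], ['store']]
```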

shape (in relation to data)#

The shape of data refers to the structure and dimensions of a dataset. It describes how the data is organized in terms of rows, columns, and, more generally, its size along each dimension. For example, in a two-dimensional array, the shape might be expressed as (rows, columns). Understanding the shape of data is essential for proper manipulation, analysis, and processing, as it defines the layout and organization of information within the dataset.
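
For example, in NumPy (used here as an assumption, since it is the array library referenced elsewhere in this glossary) the shape is available as the `shape` attribute of an array:

```python
import numpy as np

image = np.zeros((3, 4))   # 3 rows, 4 columns
print(image.shape)         # (3, 4)

flat = image.reshape(12)   # the same 12 elements arranged in one dimension
print(flat.shape)          # (12,)
```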


slice#

A slice refers to a subset or portion of a data structure. This concept is commonly used in arrays, lists, or other sequence-like data types. Slicing allows you to extract a specific range or segment of elements from the original data structure. In languages like Python, you can use slicing notation to specify the start and end indices to create a slice of a list or array. Slices are useful for working with parts of data without modifying the original structure.
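
A short sketch of Python slicing notation on a plain list; the variable names are illustrative.

```python
values = [10, 20, 30, 40, 50, 60]

middle = values[1:4]       # start index 1 (inclusive) to end index 4 (exclusive)
every_other = values[::2]  # every second element

print(middle)       # [20, 30, 40]
print(every_other)  # [10, 30, 50]
print(values)       # [10, 20, 30, 40, 50, 60]
```

Note that slicing a Python list produces a new list, whereas slicing a NumPy array typically produces a view onto the same underlying data.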


template#

In programming, a template generally refers to a parameterized structure that can be customized for various specific instances. Templates are commonly used in programming languages to create generic classes or functions that can work with different data types without sacrificing type safety.

In Riallto, templates are provided for commonly used image processing data movement patterns.


tile#

The Ryzen AI NPU is structured as an array of compute, memory and interface tiles. The term tile can refer to any or all of these three types.

tiling (not to be confused with tile)#

Tiling refers to data access patterns, and describes how data is broken down into smaller chunks for more efficient processing. It optimizes memory use and enhances parallelism by working on smaller subsets, improving overall computational efficiency.
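
A minimal sketch of tiling, assuming a 4x4 matrix is broken into four 2x2 chunks. It uses NumPy `reshape` and `swapaxes`; the sizes and names are illustrative and do not reflect any particular NPU configuration.

```python
import numpy as np

matrix = np.arange(16).reshape(4, 4)
TILE = 2  # hypothetical tile edge length

# Split rows and columns into (block index, index within block),
# then reorder so the two block indices come first.
tiles = matrix.reshape(4 // TILE, TILE, 4 // TILE, TILE).swapaxes(1, 2)

print(tiles.shape)          # (2, 2, 2, 2)
print(tiles[0, 0].tolist()) # [[0, 1], [4, 5]]
print(tiles[1, 1].tolist()) # [[10, 11], [14, 15]]
```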


vector#

A vector typically refers to a one-dimensional array or list of elements. Vectors are used to represent and manipulate sequences of data, such as numbers, characters, or other types. In a mathematical context, vectors often denote both magnitude and direction, but in programming, a vector is a simple and versatile data structure for storing and processing ordered collections of elements.

The AI Engines in the NPU compute tiles are vector processors and can process vectors (a series of elements) in parallel. For example, given two vectors of, say, 16 elements each, a vector addition adds the corresponding elements of the two vectors together in parallel. Other vector operations are also supported by the AI Engines.
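
The semantics of an element-wise vector addition can be sketched with NumPy. This only illustrates what the operation computes; it runs on the CPU and is not AI Engine code.

```python
import numpy as np

a = np.arange(16, dtype=np.int32)      # 0, 1, ..., 15
b = np.full(16, 100, dtype=np.int32)   # sixteen copies of 100

c = a + b        # one element-wise operation over all 16 pairs
print(c[:4])     # [100 101 102 103]
```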

vectorization factor#

The vectorization factor represents the number of elements processed simultaneously in a single vector instruction. It quantifies the level of parallelism achieved when using vector instructions to operate on data. For example, if the vectorization factor is 4, it means that four elements are processed at once, improving computational efficiency by performing multiple operations in parallel.
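
As a back-of-the-envelope sketch, the number of vector instructions needed to cover a buffer is the element count divided by the vectorization factor, rounded up. The helper `vector_iterations` is a hypothetical name used only for this illustration.

```python
import math


def vector_iterations(num_elements, vectorization_factor):
    """Number of vector instructions needed to process num_elements."""
    return math.ceil(num_elements / vectorization_factor)


print(vector_iterations(64, 16))  # 4
print(vector_iterations(70, 16))  # 5 (the last vector is only partly used)
```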

VPU (Vector Processing Unit)#

The floating-point and fixed-point datapaths used for SIMD (vector) computation in the AI Engine processor.

Copyright© 2023 AMD, Inc
SPDX-License-Identifier: MIT