npu.runtime package

Submodules#

npu.runtime.aie_host_utils module#

npu.runtime.aie_host_utils.print_dolphin()#

Original test-passing success dolphin. Kept in for nostalgia.

npu.runtime.apprunner module#

class npu.runtime.apprunner.AppRunner(xclbin_name, fw_sequence=None, handoff=None)#

Bases: object

This class abstracts the setup steps of an NPU application and provides a simple interface to the accelerator through allocate(), which exposes NPU buffers as plain NumPy arrays.

xclbin_name#

Name of xclbin file

Type:

str

fw_sequence#

Name of the firmware sequence, typically same name as the xclbin file

Type:

str

handoff#

Name of the metadata handoff file, typically a .json file with the same name as the xclbin and firmware files

Type:

str

Note

This class is primarily built on top of the python bindings to XRT (Xilinx Runtime Library). You can read more about the runtime in the documentation at https://xilinx.github.io/XRT/.

property Signature#
allocate(shape, dtype='u1', cacheable=False, param=None, **kwargs)#

Allocate a new PynqBuffer object.

This API mimics the numpy ndarray constructor.

call(*kwargs)#

This function abstracts pyxrt.run(). Arguments recognized as PynqBuffer types are passed to pyxrt.run() as their underlying pyxrt.bo objects; all other arguments are passed as is.
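The argument handling described for call() can be pictured with a minimal sketch. It assumes nothing about the real implementation: `FakeBO`, `BufferLike`, and `unwrap_args` are hypothetical stand-ins, not part of npu.runtime, and no hardware is touched.

```python
class FakeBO:
    """Hypothetical stand-in for a pyxrt.bo handle."""

    def __init__(self, nbytes):
        self.nbytes = nbytes


class BufferLike:
    """Hypothetical stand-in for PynqBuffer: an object that carries a `bo`."""

    def __init__(self, nbytes):
        self.bo = FakeBO(nbytes)


def unwrap_args(*args):
    """Swap buffer-like arguments for their `bo` handle; pass others through."""
    return [a.bo if isinstance(a, BufferLike) else a for a in args]


buf = BufferLike(256)
unwrapped = unwrap_args(buf, 3, "label")
print(unwrapped[0] is buf.bo, unwrapped[1:])
```

The point of the pattern is that callers only ever handle array-like buffers, while the lower-level run object only ever sees raw buffer handles and scalars.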

display()#

Display the graph of the loaded application.

Return type:

None

dropdownrtpupdate(rtpseq, options, val)#
property metadata#
rtpsliders(filters=[], radios={})#
rtpupdate(rtpseq, val)#
rtpwidgets(widgetmeta={})#

This function automatically generates ipywidgets if this has been enabled in the metadata.

save(filename=None)#

Saves animation to a file.

Return type:

None

exception npu.runtime.apprunner.IPUAppAlreadyLoaded#

Bases: Exception

class npu.runtime.apprunner.PynqBuffer(*args, cacheable=False, bo=0, **kwargs)#

Bases: ndarray

This is a subclass of numpy.ndarray. It is intended to be constructed through the AppRunner.allocate() method and should not be instantiated directly.

bo#

A pyxrt buffer object.

Type:

pyxrt.bo

cacheable#

Typically host buffers will not be cacheable, but instr buffers will always be.

Type:

bool

Note

It’s important to free the buffer memory after use – this can be done with the free_memory() method. The AppRunner class tracks the allocated buffers and clears the buffers automatically when the object has been deleted.
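For readers unfamiliar with subclassing numpy.ndarray, the sketch below shows the general NumPy pattern for attaching extra attributes such as bo and cacheable to an array. It is not the PynqBuffer source: `TrackedBuffer` is a hypothetical name, and the integer used for `bo` is a placeholder for a real buffer handle.

```python
import numpy as np


class TrackedBuffer(np.ndarray):
    """Hypothetical ndarray subclass carrying `bo` and `cacheable` attributes."""

    def __new__(cls, shape, dtype="u1", cacheable=False, bo=0):
        # Allocate the array storage, then attach the extra attributes.
        obj = super().__new__(cls, shape, dtype=dtype)
        obj.bo = bo
        obj.cacheable = cacheable
        return obj

    def __array_finalize__(self, obj):
        # NumPy calls this for explicit construction (obj is None) and for
        # views and copies, which inherit the attributes of their parent.
        if obj is None:
            return
        self.bo = getattr(obj, "bo", 0)
        self.cacheable = getattr(obj, "cacheable", False)


buf = TrackedBuffer((4,), dtype="u1", cacheable=True, bo=123)
buf[:] = [1, 2, 3, 4]
view = buf[1:3]  # a slice is still a TrackedBuffer and keeps the handle
print(type(view).__name__, view.bo, view.cacheable)
```

Because slices share the parent's handle in this pattern, freeing the underlying memory invalidates every view of it, which is one reason to release buffers deliberately.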

free_memory()#
sync_from_npu()#
sync_to_npu()#

npu.runtime.kernelinstance module#

class npu.runtime.kernelinstance.KernelInstance#

Bases: object

A Kernel instance object that is used when associating RTPs at runtime.

_portlist#

A list of ports associated with the kernel.

Type:

list

_tloc#

The mapped location of the kernel.

Type:

tuple[int, int]

npu.runtime.pyxrt module#

Pybind11 module for XRT

class npu.runtime.pyxrt.bo#

Bases: pybind11_object

Represents a buffer object

address(self: npu.runtime.pyxrt.bo) int#

Return the device physical address of the buffer object

cacheable = <flags.cacheable: 16777216>#
device_only = <flags.device_only: 268435456>#
class flags#

Bases: pybind11_object

Buffer object creation flags

Members:

normal

cacheable

device_only

host_only

p2p

svm

cacheable = <flags.cacheable: 16777216>#
device_only = <flags.device_only: 268435456>#
host_only = <flags.host_only: 536870912>#
property name#
normal = <flags.normal: 0>#
p2p = <flags.p2p: 1073741824>#
svm = <flags.svm: 134217728>#
property value#
host_only = <flags.host_only: 536870912>#
map(self: npu.runtime.pyxrt.bo) memoryview#

Create a byte accessible memory view of the buffer object

normal = <flags.normal: 0>#
p2p = <flags.p2p: 1073741824>#
read(self: npu.runtime.pyxrt.bo, arg0: int, arg1: int) numpy.ndarray[numpy.int8]#

Read from the buffer object requested number of bytes starting from specified offset

size(self: npu.runtime.pyxrt.bo) int#

Return the size of the buffer object

svm = <flags.svm: 134217728>#
sync(*args, **kwargs)#

Overloaded function.

  1. sync(self: npu.runtime.pyxrt.bo, arg0: npu.runtime.pyxrt.xclBOSyncDirection, arg1: int, arg2: int) -> None

Synchronize (DMA or cache flush/invalidation) the buffer in the requested direction

  2. sync(self: npu.runtime.pyxrt.bo, arg0: npu.runtime.pyxrt.xclBOSyncDirection) -> None

Sync entire buffer content in specified direction.

write(self: npu.runtime.pyxrt.bo, arg0: buffer, arg1: int) None#

Write the provided data into the buffer object starting at specified offset
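The decimal values listed for bo.flags are easier to read as bit positions. The sketch below mirrors the documented values in a local enum (`BOFlags` is a hypothetical name, not the pyxrt type) and computes which bit each one occupies.

```python
from enum import IntFlag


class BOFlags(IntFlag):
    """Local mirror of the documented bo.flags values (not the pyxrt type)."""
    NORMAL = 0
    CACHEABLE = 16777216
    SVM = 134217728
    DEVICE_ONLY = 268435456
    HOST_ONLY = 536870912
    P2P = 1073741824


# Map each non-zero flag name to the index of its set bit.
bit_positions = {
    name: member.value.bit_length() - 1
    for name, member in BOFlags.__members__.items()
    if member.value
}

for name, bit in bit_positions.items():
    print(f"{name:<12} bit {bit:2d}  0x{1 << bit:08X}")
```

Each non-zero flag turns out to be a single distinct bit, with normal being the absence of any flag.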

class npu.runtime.pyxrt.device#

Bases: pybind11_object

Abstraction of an acceleration device

get_info(self: npu.runtime.pyxrt.device, arg0: npu.runtime.pyxrt.xrt_info_device) str#

Obtain the device properties and sensor information

get_xclbin_uuid(self: npu.runtime.pyxrt.device) npu.runtime.pyxrt.uuid#

Return the UUID object representing the xclbin loaded on the device

load_xclbin(*args, **kwargs)#

Overloaded function.

  1. load_xclbin(self: npu.runtime.pyxrt.device, arg0: str) -> npu.runtime.pyxrt.uuid

Load an xclbin onto the device, given the path to the xclbin file

  2. load_xclbin(self: npu.runtime.pyxrt.device, arg0: xrt::xclbin) -> npu.runtime.pyxrt.uuid

Load the xclbin to the device

register_xclbin(self: npu.runtime.pyxrt.device, arg0: xrt::xclbin) npu.runtime.pyxrt.uuid#

Register an xclbin with the device

class npu.runtime.pyxrt.ert_cmd_state#

Bases: pybind11_object

Kernel execution status

Members:

ERT_CMD_STATE_NEW

ERT_CMD_STATE_QUEUED

ERT_CMD_STATE_COMPLETED

ERT_CMD_STATE_ERROR

ERT_CMD_STATE_ABORT

ERT_CMD_STATE_SUBMITTED

ERT_CMD_STATE_TIMEOUT

ERT_CMD_STATE_NORESPONSE

ERT_CMD_STATE_SKERROR

ERT_CMD_STATE_SKCRASHED

ERT_CMD_STATE_ABORT = <ert_cmd_state.ERT_CMD_STATE_ABORT: 6>#
ERT_CMD_STATE_COMPLETED = <ert_cmd_state.ERT_CMD_STATE_COMPLETED: 4>#
ERT_CMD_STATE_ERROR = <ert_cmd_state.ERT_CMD_STATE_ERROR: 5>#
ERT_CMD_STATE_NEW = <ert_cmd_state.ERT_CMD_STATE_NEW: 1>#
ERT_CMD_STATE_NORESPONSE = <ert_cmd_state.ERT_CMD_STATE_NORESPONSE: 9>#
ERT_CMD_STATE_QUEUED = <ert_cmd_state.ERT_CMD_STATE_QUEUED: 2>#
ERT_CMD_STATE_SKCRASHED = <ert_cmd_state.ERT_CMD_STATE_SKCRASHED: 11>#
ERT_CMD_STATE_SKERROR = <ert_cmd_state.ERT_CMD_STATE_SKERROR: 10>#
ERT_CMD_STATE_SUBMITTED = <ert_cmd_state.ERT_CMD_STATE_SUBMITTED: 7>#
ERT_CMD_STATE_TIMEOUT = <ert_cmd_state.ERT_CMD_STATE_TIMEOUT: 8>#
property name#
property value#
class npu.runtime.pyxrt.hw_context#

Bases: pybind11_object

A hardware context associates an xclbin with hardware resources.

class npu.runtime.pyxrt.kernel#

Bases: pybind11_object

Represents a set of instances matching a specified name

class cu_access_mode#

Bases: pybind11_object

Compute unit access mode

Members:

exclusive

shared

none

exclusive = <cu_access_mode.exclusive: 0>#
property name#
none = <cu_access_mode.none: 2>#
shared = <cu_access_mode.shared: 1>#
property value#
exclusive = <cu_access_mode.exclusive: 0>#
group_id(self: npu.runtime.pyxrt.kernel, arg0: int) int#

Get the memory bank group id of a kernel argument

none = <cu_access_mode.none: 2>#
shared = <cu_access_mode.shared: 1>#
class npu.runtime.pyxrt.run#

Bases: pybind11_object

Represents one execution of a kernel

add_callback(self: npu.runtime.pyxrt.run, arg0: npu.runtime.pyxrt.ert_cmd_state, arg1: std::function<void __cdecl(void const * __ptr64, ert_cmd_state, void * __ptr64)>, arg2: capsule) None#

Add a callback function for run state

set_arg(*args, **kwargs)#

Overloaded function.

  1. set_arg(self: npu.runtime.pyxrt.run, arg0: int, arg1: xrt::bo) -> None

Set a specific kernel global argument for a run

  2. set_arg(self: npu.runtime.pyxrt.run, arg0: int, arg1: int) -> None

Set a specific kernel scalar argument for this run

start(self: npu.runtime.pyxrt.run) None#

Start one execution of a run

state(self: npu.runtime.pyxrt.run) npu.runtime.pyxrt.ert_cmd_state#

Check the current state of a run object

wait(*args, **kwargs)#

Overloaded function.

  1. wait(self: npu.runtime.pyxrt.run) -> npu.runtime.pyxrt.ert_cmd_state

Wait for the run to complete

  2. wait(self: npu.runtime.pyxrt.run, arg0: int) -> npu.runtime.pyxrt.ert_cmd_state

Wait for the specified milliseconds for the run to complete
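The start(), wait(), and state() methods suggest a start-then-wait-then-check flow. The sketch below illustrates that flow against a stand-in object so it runs without hardware. `FakeRun`, `CmdState`, and `run_and_check` are hypothetical names; only the method shapes and the numeric state values come from this reference, and how a real run behaves on timeout is not covered here.

```python
from enum import IntEnum


class CmdState(IntEnum):
    """Local mirror of a few documented ert_cmd_state values."""
    NEW = 1
    QUEUED = 2
    COMPLETED = 4
    ERROR = 5
    TIMEOUT = 8


class FakeRun:
    """Stand-in with the documented start()/wait()/state() shape."""

    def __init__(self):
        self._state = CmdState.NEW

    def start(self):
        self._state = CmdState.QUEUED

    def wait(self, timeout_ms=None):
        # A real run would block here; the stand-in completes immediately.
        self._state = CmdState.COMPLETED
        return self._state

    def state(self):
        return self._state


def run_and_check(run, timeout_ms=1000):
    """Start a run, wait with a timeout, and raise unless it completed."""
    run.start()
    final = run.wait(timeout_ms)
    if final != CmdState.COMPLETED:
        raise RuntimeError(f"run ended in state {final.name}")
    return final


result = run_and_check(FakeRun())
print(result.name)
```

Checking the returned state instead of assuming success keeps error and timeout outcomes from being silently treated as valid results.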

class npu.runtime.pyxrt.uuid#

Bases: pybind11_object

XRT UUID object to identify a compiled xclbin binary

to_string(self: npu.runtime.pyxrt.uuid) str#

Convert XRT UUID object to string

class npu.runtime.pyxrt.xclBOSyncDirection#

Bases: pybind11_object

DMA flags used with DMA API

Members:

XCL_BO_SYNC_BO_TO_DEVICE

XCL_BO_SYNC_BO_FROM_DEVICE

XCL_BO_SYNC_BO_GMIO_TO_AIE

XCL_BO_SYNC_BO_AIE_TO_GMIO

XCL_BO_SYNC_BO_AIE_TO_GMIO = <xclBOSyncDirection.XCL_BO_SYNC_BO_AIE_TO_GMIO: 3>#
XCL_BO_SYNC_BO_FROM_DEVICE = <xclBOSyncDirection.XCL_BO_SYNC_BO_FROM_DEVICE: 1>#
XCL_BO_SYNC_BO_GMIO_TO_AIE = <xclBOSyncDirection.XCL_BO_SYNC_BO_GMIO_TO_AIE: 2>#
XCL_BO_SYNC_BO_TO_DEVICE = <xclBOSyncDirection.XCL_BO_SYNC_BO_TO_DEVICE: 0>#
property name#
property value#
class npu.runtime.pyxrt.xclbin#

Bases: pybind11_object

Represents an xclbin and provides APIs to access its metadata

get_axlf(self: npu.runtime.pyxrt.xclbin) axlf#

Get the axlf data of the xclbin

get_kernels(self: npu.runtime.pyxrt.xclbin) List[npu.runtime.pyxrt.xclbin.xclbinkernel]#

Get list of kernels from xclbin

get_mems(self: npu.runtime.pyxrt.xclbin) List[npu.runtime.pyxrt.xclbin.xclbinmem]#

Get list of memory objects

get_uuid(self: npu.runtime.pyxrt.xclbin) npu.runtime.pyxrt.uuid#

Get the uuid of the xclbin

get_xsa_name(self: npu.runtime.pyxrt.xclbin) str#

Get Xilinx Support Archive (XSA) name of xclbin

class xclbinip#

Bases: pybind11_object

get_name(self: npu.runtime.pyxrt.xclbin.xclbinip) str#
class xclbinkernel#

Bases: pybind11_object

Represents a kernel in an xclbin

get_name(self: npu.runtime.pyxrt.xclbin.xclbinkernel) str#

Get kernel name

get_num_args(self: npu.runtime.pyxrt.xclbin.xclbinkernel) int#

Number of arguments

class xclbinmem#

Bases: pybind11_object

Represents a physical device memory bank

get_base_address(self: npu.runtime.pyxrt.xclbin.xclbinmem) int#

Get the base address of the memory bank

get_index(self: npu.runtime.pyxrt.xclbin.xclbinmem) int#

Get the index of the memory

get_size_kb(self: npu.runtime.pyxrt.xclbin.xclbinmem) int#

Get the size of the memory in KB

get_tag(self: npu.runtime.pyxrt.xclbin.xclbinmem) str#

Get tag name

get_used(self: npu.runtime.pyxrt.xclbin.xclbinmem) bool#

Get used status of the memory

class npu.runtime.pyxrt.xclbinip_vector#

Bases: pybind11_object

append(self: npu.runtime.pyxrt.xclbinip_vector, x: npu.runtime.pyxrt.xclbin.xclbinip) None#

Add an item to the end of the list

clear(self: npu.runtime.pyxrt.xclbinip_vector) None#

Clear the contents

extend(*args, **kwargs)#

Overloaded function.

  1. extend(self: npu.runtime.pyxrt.xclbinip_vector, L: npu.runtime.pyxrt.xclbinip_vector) -> None

Extend the list by appending all the items in the given list

  2. extend(self: npu.runtime.pyxrt.xclbinip_vector, L: Iterable) -> None

Extend the list by appending all the items in the given list

insert(self: npu.runtime.pyxrt.xclbinip_vector, i: int, x: npu.runtime.pyxrt.xclbin.xclbinip) None#

Insert an item at a given position.

pop(*args, **kwargs)#

Overloaded function.

  1. pop(self: npu.runtime.pyxrt.xclbinip_vector) -> npu.runtime.pyxrt.xclbin.xclbinip

Remove and return the last item

  2. pop(self: npu.runtime.pyxrt.xclbinip_vector, i: int) -> npu.runtime.pyxrt.xclbin.xclbinip

Remove and return the item at index i

class npu.runtime.pyxrt.xclbinkernel_vector#

Bases: pybind11_object

append(self: List[npu.runtime.pyxrt.xclbin.xclbinkernel], x: npu.runtime.pyxrt.xclbin.xclbinkernel) None#

Add an item to the end of the list

clear(self: List[npu.runtime.pyxrt.xclbin.xclbinkernel]) None#

Clear the contents

extend(*args, **kwargs)#

Overloaded function.

  1. extend(self: List[npu.runtime.pyxrt.xclbin.xclbinkernel], L: List[npu.runtime.pyxrt.xclbin.xclbinkernel]) -> None

Extend the list by appending all the items in the given list

  2. extend(self: List[npu.runtime.pyxrt.xclbin.xclbinkernel], L: Iterable) -> None

Extend the list by appending all the items in the given list

insert(self: List[npu.runtime.pyxrt.xclbin.xclbinkernel], i: int, x: npu.runtime.pyxrt.xclbin.xclbinkernel) None#

Insert an item at a given position.

pop(*args, **kwargs)#

Overloaded function.

  1. pop(self: List[npu.runtime.pyxrt.xclbin.xclbinkernel]) -> npu.runtime.pyxrt.xclbin.xclbinkernel

Remove and return the last item

  2. pop(self: List[npu.runtime.pyxrt.xclbin.xclbinkernel], i: int) -> npu.runtime.pyxrt.xclbin.xclbinkernel

Remove and return the item at index i

class npu.runtime.pyxrt.xclbinmem_vector#

Bases: pybind11_object

append(self: List[npu.runtime.pyxrt.xclbin.xclbinmem], x: npu.runtime.pyxrt.xclbin.xclbinmem) None#

Add an item to the end of the list

clear(self: List[npu.runtime.pyxrt.xclbin.xclbinmem]) None#

Clear the contents

extend(*args, **kwargs)#

Overloaded function.

  1. extend(self: List[npu.runtime.pyxrt.xclbin.xclbinmem], L: List[npu.runtime.pyxrt.xclbin.xclbinmem]) -> None

Extend the list by appending all the items in the given list

  2. extend(self: List[npu.runtime.pyxrt.xclbin.xclbinmem], L: Iterable) -> None

Extend the list by appending all the items in the given list

insert(self: List[npu.runtime.pyxrt.xclbin.xclbinmem], i: int, x: npu.runtime.pyxrt.xclbin.xclbinmem) None#

Insert an item at a given position.

pop(*args, **kwargs)#

Overloaded function.

  1. pop(self: List[npu.runtime.pyxrt.xclbin.xclbinmem]) -> npu.runtime.pyxrt.xclbin.xclbinmem

Remove and return the last item

  2. pop(self: List[npu.runtime.pyxrt.xclbin.xclbinmem], i: int) -> npu.runtime.pyxrt.xclbin.xclbinmem

Remove and return the item at index i

class npu.runtime.pyxrt.xrt_info_device#

Bases: pybind11_object

Device feature and sensor information

Members:

bdf

interface_uuid

kdma

max_clock_frequency_mhz

m2m

name

nodma

offline

electrical

thermal

mechanical

memory

platform

pcie_info

host

dynamic_regions

vmr

bdf = <xrt_info_device.bdf: 0>#
dynamic_regions = <xrt_info_device.dynamic_regions: 17>#
electrical = <xrt_info_device.electrical: 8>#
host = <xrt_info_device.host: 14>#
interface_uuid = <xrt_info_device.interface_uuid: 1>#
kdma = <xrt_info_device.kdma: 2>#
m2m = <xrt_info_device.m2m: 4>#
max_clock_frequency_mhz = <xrt_info_device.max_clock_frequency_mhz: 3>#
mechanical = <xrt_info_device.mechanical: 10>#
memory = <xrt_info_device.memory: 11>#
name = <xrt_info_device.name: 5>#
nodma = <xrt_info_device.nodma: 6>#
offline = <xrt_info_device.offline: 7>#
pcie_info = <xrt_info_device.pcie_info: 13>#
platform = <xrt_info_device.platform: 12>#
thermal = <xrt_info_device.thermal: 9>#
property value#
vmr = <xrt_info_device.vmr: 18>#

npu.runtime.sequence module#

class npu.runtime.sequence.Coord(row: int, col: int)#

Bases: NamedTuple

A coordinate of a location within the array (CT/MT/IT).

col: int#

Alias for field number 1

row: int#

Alias for field number 0
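Since Coord is a NamedTuple, it can be used both by field name and as a plain tuple. The sketch below re-declares a class with the documented field order locally so that it runs without the package installed; the behaviour shown is standard typing.NamedTuple behaviour, not anything specific to npu.runtime.

```python
from typing import NamedTuple


class Coord(NamedTuple):
    """Local re-declaration matching the documented field order (row, col)."""
    row: int
    col: int


c = Coord(row=2, col=0)
row, col = c               # unpacks like a plain tuple
moved = c._replace(col=1)  # tuples are immutable, so this returns a new Coord
print(c, c[0], c.col, moved)
```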

class npu.runtime.sequence.Operation(opcode=0, words=1, bdId=0, coords=(0, 0), config=<factory>)#

Bases: object

A dataclass that defines an operation within a sequence.

opcode#

The 16-bit binary opcode for this operation.

Type:

int

words#

The number of 32-bit words this operation consumes from the sequence.

Type:

int

bdId#

The bdId (if there is one) associated with this operation.

Type:

int

coords#

The Coordinates that this operation applies to (CT/MT/IT)

Type:

Coord

config#

A list of words that make up this entire sequence.

Type:

List[ctypes.c_uint32]

bdId: int = 0#
property bin: List[c_ulong]#

Returns the binary form of this operation.

config: List[c_ubyte]#
coords: Coord = (0, 0)#
opcode: int = 0#
property str: str#

Renders a string to describe this operation.

words: int = 1#
npu.runtime.sequence.OperationFactory(words)#

Peel off the next operation in the sequence words and produce an instance of it. Currently only exposes operations relevant to RTP writes.

Return type:

(Operation, List[c_ulong])
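The "peel off the next operation and return the remaining words" contract can be sketched as follows. This is an illustration of the parsing pattern only. The word counts reuse the documented defaults (one word for a generic Operation, three for an RTPOp with opcode 2), but the position of the opcode inside the op word is an assumption made for this example, and `peel` and `parse_all` are hypothetical names.

```python
from typing import List, Tuple

# Word counts taken from the documented defaults: a generic Operation is one
# word, an RTPOp (opcode 2) is three. Everything else here is hypothetical.
WORDS_PER_OPCODE = {2: 3}
OPCODE_SHIFT = 16  # assumed position of the 16-bit opcode in the op word


def peel(words: List[int]) -> Tuple[dict, List[int]]:
    """Split one operation off the front of `words`; return it and the rest."""
    opcode = (words[0] >> OPCODE_SHIFT) & 0xFFFF
    count = WORDS_PER_OPCODE.get(opcode, 1)
    if count > len(words):
        raise ValueError("sequence ends in the middle of an operation")
    op = {"opcode": opcode, "words": words[:count]}
    return op, words[count:]


def parse_all(words: List[int]) -> List[dict]:
    """Repeatedly peel until the word list is exhausted."""
    ops = []
    while words:
        op, words = peel(words)
        ops.append(op)
    return ops


# One three-word operation (opcode 2) followed by a one-word operation.
sequence = [0x0002_0000, 0x100, 42, 0x0001_0000]
ops = parse_all(sequence)
print([(op["opcode"], len(op["words"])) for op in ops])
```

Returning the unconsumed tail alongside the parsed operation lets the caller drive the loop without tracking an index into the word list.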

npu.runtime.sequence.ParseBDId(word)#

Parses the BD ident for the op-code word.

Return type:

int

npu.runtime.sequence.ParseOpCodeString(word)#

From a word containing an opcode, extract the opcode string name.

Return type:

str

npu.runtime.sequence.ParseTileCoords(word)#

Gets the column coord from the instruction.

Return type:

Coord

class npu.runtime.sequence.RTPOp(opcode=2, words=3, bdId=0, coords=(0, 0), config=<factory>, addr=0, value=0, rtpidx=0)#

Bases: Operation

RTP operation dataclass for setting an RTP value. Inherits from Operation.

addr#

A 32-bit relative address for the location of this RTP.

Type:

ctypes.c_uint32

value#

A 32-bit value for the RTP.

Type:

ctypes.c_uint32

rtpidx#

The index for this RTP in the kernel argument (associated on first parse of sequence)

Type:

int

addr: c_ulong = 0#
property bin: List[c_ulong]#

Returns the binary form of this operation.

opcode: int = 2#
rtpidx: int = 0#
property str: str#

Renders a string to describe this operation.

value: c_ulong = 0#
words: int = 3#
class npu.runtime.sequence.Sequence(binseq_file, first_parse=True)#

Bases: object

Performs a minimal parsing of the sequence to determine RTP writes and expose them.

If this is the first time that the sequence is being parsed (set by an optional parameter), then the value of any RTP parsed will be used to set its rtpindex and build the mlir_rtps dictionary.

_in_file#

The input sequence file.

Type:

str

_str_contents#

The contents of the sequence file in string form.

Type:

str

config_words#

A list of sequence words in binary form.

Type:

list[ctypes.c_uint32]

operations#

A list of operation dataclasses that have been parsed from the binary sequence.

Type:

list[Operation]

mlir_rtps#

A dictionary of RTPs that have been parsed and can be set at runtime.

Type:

dict

property bin: List[c_ulong]#

Returns the complete binary for the sequence

property buffer: array#

Renders a new np.array for the sequence that can be passed into the device

property rtp_words: List[int]#

Packs the values of the RTP array into 32-bit words.

txt(filename, annotated=False)#

Dumps the instructions to a txt file. If annotated is set to True then print a descriptive string for each operation next to the operation.

Return type:

None

npu.runtime.sequence.createOpBin(opcode, coords, bdId)#

Takes the opcode, coordinates, and BdId and constructs a 32-bit binary op word.

npu.runtime.sequence.isCT(coord)#

Accepts coords, returns true if location is a compute tile.

Return type:

bool

npu.runtime.sequence.isIT(coord)#

Accepts coords, returns true if location is an interface tile.

Return type:

bool

npu.runtime.sequence.isMT(coord)#

Accepts coords, returns true if location is a memory tile.

Return type:

bool
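The three predicates above partition array locations into compute, memory, and interface tiles. The sketch below shows one way such a classification could look. The row layout it uses (row 0 for interface tiles, row 1 for memory tiles, rows 2 and up for compute tiles) is an assumption for illustration and is not stated in this reference; `tile_kind` is a hypothetical helper.

```python
from typing import NamedTuple


class Coord(NamedTuple):
    """Local re-declaration matching the documented field order (row, col)."""
    row: int
    col: int


def tile_kind(coord: Coord) -> str:
    """Classify a location under the ASSUMED layout described in the lead-in."""
    if coord.row == 0:
        return "IT"  # interface tile (assumed: bottom row)
    if coord.row == 1:
        return "MT"  # memory tile (assumed: second row)
    return "CT"      # compute tile (assumed: all remaining rows)


kinds = [tile_kind(Coord(row=r, col=0)) for r in range(4)]
print(kinds)
```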

npu.runtime.sequence.parse_word(s)#

Parses either an int string or a hex string, or raises an error.

Return type:

int
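A minimal sketch of the "int string or hex string" behaviour is shown below. It is not the library's implementation: `parse_word_sketch` is a hypothetical name, and trying decimal before hexadecimal is an assumption about how ambiguous inputs such as "10" should be resolved.

```python
def parse_word_sketch(s: str) -> int:
    """Parse `s` as decimal first, then as hexadecimal; raise if neither fits."""
    text = s.strip()
    for base in (10, 16):
        try:
            return int(text, base)
        except ValueError:
            continue
    raise ValueError(f"not an int or hex string: {s!r}")


print(parse_word_sketch("42"), parse_word_sketch("0x1F"), parse_word_sketch("ff"))
```

With this ordering, a string made only of decimal digits is always read as decimal, so hexadecimal values should carry a 0x prefix or contain a letter to be unambiguous.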

Module contents#

Runtime#

The NPU submodule npu.runtime contains classes and functions to run custom applications on the NPU device. It provides APIs to load custom xclbins, allocate NumPy arrays as NPU-compatible buffers, and read processed data back out.

The following example uses the AppRunner class and its allocate() method to process data:

    import numpy as np
    from npu.runtime import AppRunner

    # Register the xclbin and program the NPU
    app = AppRunner("myapp.xclbin")

    # Generate random python data and allocate input
    # and output buffers
    test_data = np.random.randint(0, 255, 256, dtype=np.uint8)
    bo_in = app.allocate(shape=(256,), dtype=np.uint8)
    bo_out = app.allocate(shape=(256,), dtype=np.uint8)

    # Copy input data into NPU memory
    bo_in[:] = test_data
    bo_in.sync_to_npu()

    # Execute the application
    app.call(bo_in, bo_out)

    # Update the output buffer with the results
    bo_out.sync_from_npu()

    # Print the output
    print(np.array(bo_out))

    # Unload the application, free resources
    del app