npu.runtime package#
Submodules#
npu.runtime.aie_host_utils module#
- npu.runtime.aie_host_utils.print_dolphin()#
Original test-passing success dolphin. Kept in for nostalgia.
npu.runtime.apprunner module#
- class npu.runtime.apprunner.AppRunner(xclbin_name, fw_sequence=None, handoff=None)#
Bases:
object
This class abstracts the necessary setup steps of an NPU application and enables a simple interface with the accelerator using allocate() methods, treating buffers to the NPU as simple NumPy arrays.
- xclbin_name#
Name of xclbin file
- Type:
str
- fw_sequence#
Name of the firmware sequence, typically same name as the xclbin file
- Type:
str
- handoff#
Name of the metadata handoff file, typically a .json file with the same name as the xclbin and firmware files
- Type:
str
Note
This class is primarily built on top of the Python bindings to XRT (Xilinx Runtime Library). You can read more about the runtime in the documentation at https://xilinx.github.io/XRT/.
- property Signature#
- allocate(shape, dtype='u1', cacheable=False, param=None, **kwargs)#
Allocate a new PynqBuffer object.
This API mimics the numpy ndarray constructor.
- call(*kwargs)#
This function abstracts pyxrt.run(). Arguments recognized as PynqBuffer types are passed to pyxrt.run() as pyxrt.bo objects; all other arguments are passed as is.
- display()#
Display the graph of the loaded application.
- Return type:
None
- dropdownrtpupdate(rtpseq, options, val)#
- property metadata#
- rtpsliders(filters=[], radios={})#
- rtpupdate(rtpseq, val)#
- rtpwidgets(widgetmeta={})#
This function automatically generates ipywidgets if this has been enabled in the metadata.
- save(filename=None)#
Saves animation to a file.
- Return type:
None
- exception npu.runtime.apprunner.IPUAppAlreadyLoaded#
Bases:
Exception
- class npu.runtime.apprunner.PynqBuffer(*args, cacheable=False, bo=0, **kwargs)#
Bases:
ndarray
This is a subclass of numpy.ndarray. This class is intended to be constructed using the AppRunner.allocate() method and should not be used standalone.
- cacheable#
Typically host buffers will not be cacheable, but instr buffers will always be.
- Type:
bool
Note
It’s important to free the buffer memory after use – this can be done with the free_memory() method. The AppRunner class tracks the allocated buffers and clears the buffers automatically when the object has been deleted.
- free_memory()#
- sync_from_npu()#
- sync_to_npu()#
npu.runtime.kernelinstance module#
npu.runtime.pyxrt module#
Pybind11 module for XRT
- class npu.runtime.pyxrt.bo#
Bases:
pybind11_object
Represents a buffer object
- address(self: npu.runtime.pyxrt.bo) int #
Return the device physical address of the buffer object
- cacheable = <flags.cacheable: 16777216>#
- device_only = <flags.device_only: 268435456>#
- class flags#
Bases:
pybind11_object
Buffer object creation flags
Members:
normal
cacheable
device_only
host_only
p2p
svm
- cacheable = <flags.cacheable: 16777216>#
- device_only = <flags.device_only: 268435456>#
- host_only = <flags.host_only: 536870912>#
- property name#
- normal = <flags.normal: 0>#
- p2p = <flags.p2p: 1073741824>#
- svm = <flags.svm: 134217728>#
- property value#
- host_only = <flags.host_only: 536870912>#
- map(self: npu.runtime.pyxrt.bo) memoryview #
Create a byte accessible memory view of the buffer object
- normal = <flags.normal: 0>#
- p2p = <flags.p2p: 1073741824>#
- read(self: npu.runtime.pyxrt.bo, arg0: int, arg1: int) numpy.ndarray[numpy.int8] #
Read from the buffer object requested number of bytes starting from specified offset
- size(self: npu.runtime.pyxrt.bo) int #
Return the size of the buffer object
- svm = <flags.svm: 134217728>#
- sync(*args, **kwargs)#
Overloaded function.
sync(self: npu.runtime.pyxrt.bo, arg0: npu.runtime.pyxrt.xclBOSyncDirection, arg1: int, arg2: int) -> None
Synchronize (DMA or cache flush/invalidation) the buffer in the requested direction
sync(self: npu.runtime.pyxrt.bo, arg0: npu.runtime.pyxrt.xclBOSyncDirection) -> None
Sync entire buffer content in specified direction.
- write(self: npu.runtime.pyxrt.bo, arg0: buffer, arg1: int) None #
Write the provided data into the buffer object starting at specified offset
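Each of the buffer flag values listed above is either zero (normal) or a single set bit. The sketch below mirrors the documented integers in a standalone enum, purely to show which bit each flag occupies. `BoFlagsSketch` and `bit_position` are hypothetical names for illustration and are not part of pyxrt.

```python
from enum import IntEnum


class BoFlagsSketch(IntEnum):
    """Hypothetical mirror of the documented pyxrt.bo.flags values."""
    normal = 0
    cacheable = 16777216
    svm = 134217728
    device_only = 268435456
    host_only = 536870912
    p2p = 1073741824


def bit_position(flag):
    """Return the index of the highest set bit, or None for a zero value."""
    value = int(flag)
    return value.bit_length() - 1 if value else None


# Map each flag name to the bit it occupies.
bit_positions = {flag.name: bit_position(flag) for flag in BoFlagsSketch}
for name, bit in bit_positions.items():
    print(name, bit)
```

On real hardware the values would come from `pyxrt.bo.flags` itself; the mirror only exists so the snippet runs without the NPU driver installed.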
- class npu.runtime.pyxrt.device#
Bases:
pybind11_object
Abstraction of an acceleration device
- get_info(self: npu.runtime.pyxrt.device, arg0: npu.runtime.pyxrt.xrt_info_device) str #
Obtain the device properties and sensor information
- get_xclbin_uuid(self: npu.runtime.pyxrt.device) npu.runtime.pyxrt.uuid #
Return the UUID object representing the xclbin loaded on the device
- load_xclbin(*args, **kwargs)#
Overloaded function.
load_xclbin(self: npu.runtime.pyxrt.device, arg0: str) -> npu.runtime.pyxrt.uuid
Load an xclbin onto the device, given its path
load_xclbin(self: npu.runtime.pyxrt.device, arg0: xrt::xclbin) -> npu.runtime.pyxrt.uuid
Load the xclbin to the device
- register_xclbin(self: npu.runtime.pyxrt.device, arg0: xrt::xclbin) npu.runtime.pyxrt.uuid #
Register an xclbin with the device
- class npu.runtime.pyxrt.ert_cmd_state#
Bases:
pybind11_object
Kernel execution status
Members:
ERT_CMD_STATE_NEW
ERT_CMD_STATE_QUEUED
ERT_CMD_STATE_COMPLETED
ERT_CMD_STATE_ERROR
ERT_CMD_STATE_ABORT
ERT_CMD_STATE_SUBMITTED
ERT_CMD_STATE_TIMEOUT
ERT_CMD_STATE_NORESPONSE
ERT_CMD_STATE_SKERROR
ERT_CMD_STATE_SKCRASHED
- ERT_CMD_STATE_ABORT = <ert_cmd_state.ERT_CMD_STATE_ABORT: 6>#
- ERT_CMD_STATE_COMPLETED = <ert_cmd_state.ERT_CMD_STATE_COMPLETED: 4>#
- ERT_CMD_STATE_ERROR = <ert_cmd_state.ERT_CMD_STATE_ERROR: 5>#
- ERT_CMD_STATE_NEW = <ert_cmd_state.ERT_CMD_STATE_NEW: 1>#
- ERT_CMD_STATE_NORESPONSE = <ert_cmd_state.ERT_CMD_STATE_NORESPONSE: 9>#
- ERT_CMD_STATE_QUEUED = <ert_cmd_state.ERT_CMD_STATE_QUEUED: 2>#
- ERT_CMD_STATE_SKCRASHED = <ert_cmd_state.ERT_CMD_STATE_SKCRASHED: 11>#
- ERT_CMD_STATE_SKERROR = <ert_cmd_state.ERT_CMD_STATE_SKERROR: 10>#
- ERT_CMD_STATE_SUBMITTED = <ert_cmd_state.ERT_CMD_STATE_SUBMITTED: 7>#
- ERT_CMD_STATE_TIMEOUT = <ert_cmd_state.ERT_CMD_STATE_TIMEOUT: 8>#
- property name#
- property value#
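Because the members carry plain integer values, a state can be looked up and compared by number. The following sketch mirrors the documented values in a standalone enum so that it runs without the NPU. `CmdStateSketch`, `IN_FLIGHT` and `describe` are hypothetical names, and the grouping of states into "pending" versus "failed" is an assumption about XRT semantics rather than something this page states.

```python
from enum import IntEnum


class CmdStateSketch(IntEnum):
    """Hypothetical mirror of the documented ert_cmd_state values."""
    NEW = 1
    QUEUED = 2
    COMPLETED = 4
    ERROR = 5
    ABORT = 6
    SUBMITTED = 7
    TIMEOUT = 8
    NORESPONSE = 9
    SKERROR = 10
    SKCRASHED = 11


# Assumption: these states mean the command has not finished yet.
IN_FLIGHT = {CmdStateSketch.NEW, CmdStateSketch.QUEUED, CmdStateSketch.SUBMITTED}


def describe(value):
    """Classify a numeric state as 'ok', 'pending' or 'failed'."""
    state = CmdStateSketch(value)  # raises ValueError for undocumented values
    if state is CmdStateSketch.COMPLETED:
        return "ok"
    if state in IN_FLIGHT:
        return "pending"
    return "failed"


print(describe(4), describe(2), describe(8))
```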
- class npu.runtime.pyxrt.hw_context#
Bases:
pybind11_object
A hardware context associates an xclbin with hardware resources.
- class npu.runtime.pyxrt.kernel#
Bases:
pybind11_object
Represents a set of instances matching a specified name
- class cu_access_mode#
Bases:
pybind11_object
Compute unit access mode
Members:
exclusive
shared
none
- exclusive = <cu_access_mode.exclusive: 0>#
- property name#
- none = <cu_access_mode.none: 2>#
- property value#
- exclusive = <cu_access_mode.exclusive: 0>#
- group_id(self: npu.runtime.pyxrt.kernel, arg0: int) int #
Get the memory bank group id of a kernel argument
- none = <cu_access_mode.none: 2>#
- class npu.runtime.pyxrt.run#
Bases:
pybind11_object
Represents one execution of a kernel
- add_callback(self: npu.runtime.pyxrt.run, arg0: npu.runtime.pyxrt.ert_cmd_state, arg1: std::function<void __cdecl(void const * __ptr64, ert_cmd_state, void * __ptr64)>, arg2: capsule) None #
Add a callback function for run state
- set_arg(*args, **kwargs)#
Overloaded function.
set_arg(self: npu.runtime.pyxrt.run, arg0: int, arg1: xrt::bo) -> None
Set a specific kernel global argument for a run
set_arg(self: npu.runtime.pyxrt.run, arg0: int, arg1: int) -> None
Set a specific kernel scalar argument for this run
- start(self: npu.runtime.pyxrt.run) None #
Start one execution of a run
- state(self: npu.runtime.pyxrt.run) npu.runtime.pyxrt.ert_cmd_state #
Check the current state of a run object
- wait(*args, **kwargs)#
Overloaded function.
wait(self: npu.runtime.pyxrt.run) -> npu.runtime.pyxrt.ert_cmd_state
Wait for the run to complete
wait(self: npu.runtime.pyxrt.run, arg0: int) -> npu.runtime.pyxrt.ert_cmd_state
Wait for the specified milliseconds for the run to complete
- class npu.runtime.pyxrt.uuid#
Bases:
pybind11_object
XRT UUID object to identify a compiled xclbin binary
- to_string(self: npu.runtime.pyxrt.uuid) str #
Convert XRT UUID object to string
- class npu.runtime.pyxrt.xclBOSyncDirection#
Bases:
pybind11_object
DMA flags used with DMA API
Members:
XCL_BO_SYNC_BO_TO_DEVICE
XCL_BO_SYNC_BO_FROM_DEVICE
XCL_BO_SYNC_BO_GMIO_TO_AIE
XCL_BO_SYNC_BO_AIE_TO_GMIO
- XCL_BO_SYNC_BO_AIE_TO_GMIO = <xclBOSyncDirection.XCL_BO_SYNC_BO_AIE_TO_GMIO: 3>#
- XCL_BO_SYNC_BO_FROM_DEVICE = <xclBOSyncDirection.XCL_BO_SYNC_BO_FROM_DEVICE: 1>#
- XCL_BO_SYNC_BO_GMIO_TO_AIE = <xclBOSyncDirection.XCL_BO_SYNC_BO_GMIO_TO_AIE: 2>#
- XCL_BO_SYNC_BO_TO_DEVICE = <xclBOSyncDirection.XCL_BO_SYNC_BO_TO_DEVICE: 0>#
- property name#
- property value#
- class npu.runtime.pyxrt.xclbin#
Bases:
pybind11_object
Represents an xclbin and provides APIs to access meta data
- get_axlf(self: npu.runtime.pyxrt.xclbin) axlf #
Get the axlf data of the xclbin
- get_kernels(self: npu.runtime.pyxrt.xclbin) List[npu.runtime.pyxrt.xclbin.xclbinkernel] #
Get list of kernels from xclbin
- get_mems(self: npu.runtime.pyxrt.xclbin) List[npu.runtime.pyxrt.xclbin.xclbinmem] #
Get list of memory objects
- get_uuid(self: npu.runtime.pyxrt.xclbin) npu.runtime.pyxrt.uuid #
Get the uuid of the xclbin
- get_xsa_name(self: npu.runtime.pyxrt.xclbin) str #
Get Xilinx Support Archive (XSA) name of xclbin
- class xclbinip#
Bases:
pybind11_object
- get_name(self: npu.runtime.pyxrt.xclbin.xclbinip) str #
- class xclbinkernel#
Bases:
pybind11_object
Represents a kernel in an xclbin
- get_name(self: npu.runtime.pyxrt.xclbin.xclbinkernel) str #
Get kernel name
- get_num_args(self: npu.runtime.pyxrt.xclbin.xclbinkernel) int #
Number of arguments
- class xclbinmem#
Bases:
pybind11_object
Represents a physical device memory bank
- get_base_address(self: npu.runtime.pyxrt.xclbin.xclbinmem) int #
Get the base address of the memory bank
- get_index(self: npu.runtime.pyxrt.xclbin.xclbinmem) int #
Get the index of the memory
- get_size_kb(self: npu.runtime.pyxrt.xclbin.xclbinmem) int #
Get the size of the memory in KB
- get_tag(self: npu.runtime.pyxrt.xclbin.xclbinmem) str #
Get tag name
- get_used(self: npu.runtime.pyxrt.xclbin.xclbinmem) bool #
Get used status of the memory
- class npu.runtime.pyxrt.xclbinip_vector#
Bases:
pybind11_object
- append(self: npu.runtime.pyxrt.xclbinip_vector, x: npu.runtime.pyxrt.xclbin.xclbinip) None #
Add an item to the end of the list
- clear(self: npu.runtime.pyxrt.xclbinip_vector) None #
Clear the contents
- extend(*args, **kwargs)#
Overloaded function.
extend(self: npu.runtime.pyxrt.xclbinip_vector, L: npu.runtime.pyxrt.xclbinip_vector) -> None
Extend the list by appending all the items in the given list
extend(self: npu.runtime.pyxrt.xclbinip_vector, L: Iterable) -> None
Extend the list by appending all the items in the given list
- insert(self: npu.runtime.pyxrt.xclbinip_vector, i: int, x: npu.runtime.pyxrt.xclbin.xclbinip) None #
Insert an item at a given position.
- pop(*args, **kwargs)#
Overloaded function.
pop(self: npu.runtime.pyxrt.xclbinip_vector) -> npu.runtime.pyxrt.xclbin.xclbinip
Remove and return the last item
pop(self: npu.runtime.pyxrt.xclbinip_vector, i: int) -> npu.runtime.pyxrt.xclbin.xclbinip
Remove and return the item at index i
- class npu.runtime.pyxrt.xclbinkernel_vector#
Bases:
pybind11_object
- append(self: List[npu.runtime.pyxrt.xclbin.xclbinkernel], x: npu.runtime.pyxrt.xclbin.xclbinkernel) None #
Add an item to the end of the list
- clear(self: List[npu.runtime.pyxrt.xclbin.xclbinkernel]) None #
Clear the contents
- extend(*args, **kwargs)#
Overloaded function.
extend(self: List[npu.runtime.pyxrt.xclbin.xclbinkernel], L: List[npu.runtime.pyxrt.xclbin.xclbinkernel]) -> None
Extend the list by appending all the items in the given list
extend(self: List[npu.runtime.pyxrt.xclbin.xclbinkernel], L: Iterable) -> None
Extend the list by appending all the items in the given list
- insert(self: List[npu.runtime.pyxrt.xclbin.xclbinkernel], i: int, x: npu.runtime.pyxrt.xclbin.xclbinkernel) None #
Insert an item at a given position.
- pop(*args, **kwargs)#
Overloaded function.
pop(self: List[npu.runtime.pyxrt.xclbin.xclbinkernel]) -> npu.runtime.pyxrt.xclbin.xclbinkernel
Remove and return the last item
pop(self: List[npu.runtime.pyxrt.xclbin.xclbinkernel], i: int) -> npu.runtime.pyxrt.xclbin.xclbinkernel
Remove and return the item at index i
- class npu.runtime.pyxrt.xclbinmem_vector#
Bases:
pybind11_object
- append(self: List[npu.runtime.pyxrt.xclbin.xclbinmem], x: npu.runtime.pyxrt.xclbin.xclbinmem) None #
Add an item to the end of the list
- clear(self: List[npu.runtime.pyxrt.xclbin.xclbinmem]) None #
Clear the contents
- extend(*args, **kwargs)#
Overloaded function.
extend(self: List[npu.runtime.pyxrt.xclbin.xclbinmem], L: List[npu.runtime.pyxrt.xclbin.xclbinmem]) -> None
Extend the list by appending all the items in the given list
extend(self: List[npu.runtime.pyxrt.xclbin.xclbinmem], L: Iterable) -> None
Extend the list by appending all the items in the given list
- insert(self: List[npu.runtime.pyxrt.xclbin.xclbinmem], i: int, x: npu.runtime.pyxrt.xclbin.xclbinmem) None #
Insert an item at a given position.
- pop(*args, **kwargs)#
Overloaded function.
pop(self: List[npu.runtime.pyxrt.xclbin.xclbinmem]) -> npu.runtime.pyxrt.xclbin.xclbinmem
Remove and return the last item
pop(self: List[npu.runtime.pyxrt.xclbin.xclbinmem], i: int) -> npu.runtime.pyxrt.xclbin.xclbinmem
Remove and return the item at index i
- class npu.runtime.pyxrt.xrt_info_device#
Bases:
pybind11_object
Device feature and sensor information
Members:
bdf
interface_uuid
kdma
max_clock_frequency_mhz
m2m
name
nodma
offline
electrical
thermal
mechanical
memory
platform
pcie_info
host
dynamic_regions
vmr
- bdf = <xrt_info_device.bdf: 0>#
- dynamic_regions = <xrt_info_device.dynamic_regions: 17>#
- electrical = <xrt_info_device.electrical: 8>#
- host = <xrt_info_device.host: 14>#
- interface_uuid = <xrt_info_device.interface_uuid: 1>#
- kdma = <xrt_info_device.kdma: 2>#
- m2m = <xrt_info_device.m2m: 4>#
- max_clock_frequency_mhz = <xrt_info_device.max_clock_frequency_mhz: 3>#
- mechanical = <xrt_info_device.mechanical: 10>#
- memory = <xrt_info_device.memory: 11>#
- name = <xrt_info_device.name: 5>#
- nodma = <xrt_info_device.nodma: 6>#
- offline = <xrt_info_device.offline: 7>#
- pcie_info = <xrt_info_device.pcie_info: 13>#
- platform = <xrt_info_device.platform: 12>#
- thermal = <xrt_info_device.thermal: 9>#
- property value#
- vmr = <xrt_info_device.vmr: 18>#
npu.runtime.sequence module#
- class npu.runtime.sequence.Coord(row: int, col: int)#
Bases:
NamedTuple
A coordinate of a location within the array (CT/MT/IT).
- col: int#
Alias for field number 1
- row: int#
Alias for field number 0
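As a NamedTuple, a coordinate supports both attribute access and positional access, with row as field 0 and col as field 1. The sketch below uses a stand-in class, `CoordSketch` (a hypothetical name that mirrors the documented signature), so that it runs without the npu package installed.

```python
from typing import NamedTuple


class CoordSketch(NamedTuple):
    """Hypothetical stand-in with the same fields as the documented Coord."""
    row: int
    col: int


tile = CoordSketch(row=2, col=0)

# Field 0 is the row and field 1 is the column.
print(tile.row == tile[0], tile.col == tile[1])

# Tuple unpacking follows the same order.
row, col = tile
print(row, col)
```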
- class npu.runtime.sequence.Operation(opcode=0, words=1, bdId=0, coords=(0, 0), config=<factory>)#
Bases:
object
A dataclass that defines an operation within a sequence.
- opcode#
The 16-bit binary opcode for this operation.
- Type:
int
- words#
The number of 32-bit words this operation consumes from the sequence.
- Type:
int
- bdId#
The bdId (if there is one) associated with this operation.
- Type:
int
- coords#
The Coordinates that this operation applies to (CT/MT/IT)
- Type:
Coord
- config#
A list of words that make up this entire sequence.
- Type:
List[ctypes.c_uint32]
- bdId: int = 0#
- property bin: List[c_ulong]#
Returns the binary form of this operation.
- config: List[c_ubyte]#
- opcode: int = 0#
- property str: str#
Renders a string to describe this operation.
- words: int = 1#
- npu.runtime.sequence.OperationFactory(words)#
Peel off the next operation in the sequence words and produce an instance of it. Currently only exposes operations relevant to RTP writes.
- Return type:
(Operation, List[c_ulong])
- npu.runtime.sequence.ParseBDId(word)#
Parses the BD ident for the op-code word.
- Return type:
int
- npu.runtime.sequence.ParseOpCodeString(word)#
From a word containing an opcode, extract the opcode string name.
- Return type:
str
- npu.runtime.sequence.ParseTileCoords(word)#
Gets the column coord from the instruction.
- Return type:
- class npu.runtime.sequence.RTPOp(opcode=2, words=3, bdId=0, coords=(0, 0), config=<factory>, addr=0, value=0, rtpidx=0)#
Bases:
Operation
RTP operation dataclass for setting an RTP value. Inherits from Operation.
- addr#
A 32-bit relative address for the location of this RTP.
- Type:
ctypes.c_uint32
- value#
A 32-bit value for the RTP.
- Type:
ctypes.c_uint32
- rtpidx#
The index for this RTP in the kernel argument (associated on first parse of sequence)
- Type:
int
- addr: c_ulong = 0#
- property bin: List[c_ulong]#
Returns the binary form of this operation.
- opcode: int = 2#
- rtpidx: int = 0#
- property str: str#
Renders a string to describe this operation.
- value: c_ulong = 0#
- words: int = 3#
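The documented signatures show RTPOp overriding the opcode and words defaults of Operation while adding three fields of its own. A minimal sketch of that dataclass pattern follows. `OperationSketch` and `RTPOpSketch` are hypothetical stand-ins, plain ints replace the ctypes field types, and the binary encoding behind the bin property is not reproduced.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class OperationSketch:
    """Hypothetical mirror of the documented Operation defaults."""
    opcode: int = 0
    words: int = 1
    bdId: int = 0
    coords: Tuple[int, int] = (0, 0)
    # A factory default gives each instance its own list.
    config: List[int] = field(default_factory=list)


@dataclass
class RTPOpSketch(OperationSketch):
    """Hypothetical mirror of RTPOp: overrides two defaults, adds three fields."""
    opcode: int = 2
    words: int = 3
    addr: int = 0
    value: int = 0
    rtpidx: int = 0


op = RTPOpSketch(addr=0x100, value=7)
print(op)
```

Redefining a field in the subclass keeps its original position, so the generated constructor matches the argument order shown in the RTPOp signature.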
- class npu.runtime.sequence.Sequence(binseq_file, first_parse=True)#
Bases:
object
Performs a minimal parsing of the sequence to determine RTP writes and expose them.
If this is the first time that the sequence is being parsed (set by the optional first_parse parameter), the value of any RTP parsed is used to set its RTP index and build the mlir_rtps dictionary.
- _in_file#
The input sequence file.
- Type:
str
- _str_contents#
The contents of the sequence file in string form.
- Type:
str
- config_words#
A list of sequence words in binary form.
- Type:
list[ctypes.c_uint32]
- operations#
A list of operation dataclasses that have been parsed from the binary sequence.
- Type:
list[Operation]
- mlir_rtps#
A dictionary of RTPs that have been parsed and can be set at runtime.
- Type:
dict
- property bin: List[c_ulong]#
Returns the complete binary for the sequence
- property buffer: array#
Renders a new np.array for the sequence that can be passed into the device
- property rtp_words: List[int]#
Packs the values in the RTP array into 32-bit words
- txt(filename, annotated=False)#
Dumps the instructions to a txt file. If annotated is set to True then print a descriptive string for each operation next to the operation.
- Return type:
None
- npu.runtime.sequence.createOpBin(opcode, coords, bdId)#
Takes the opcode/coordinates/BdId and constructs a 32-bit binary op word.
- npu.runtime.sequence.isCT(coord)#
Accepts coords, returns true if location is a compute tile.
- Return type:
bool
- npu.runtime.sequence.isIT(coord)#
Accepts coords, returns true if location is an interface tile.
- Return type:
bool
- npu.runtime.sequence.isMT(coord)#
Accepts coords, returns true if location is a memory tile.
- Return type:
bool
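The three predicates above classify a coordinate as a compute tile (CT), memory tile (MT) or interface tile (IT). This page does not say how the decision is made. The sketch below assumes a layout in which row 0 holds interface tiles, row 1 holds memory tiles and rows 2 and above hold compute tiles; that layout, and every name in the snippet, is an assumption for illustration and not the library's implementation.

```python
from typing import NamedTuple


class CoordSketch(NamedTuple):
    """Hypothetical stand-in for the documented Coord."""
    row: int
    col: int


def is_it_sketch(coord):
    """Assumption: interface tiles sit in row 0."""
    return coord.row == 0


def is_mt_sketch(coord):
    """Assumption: memory tiles sit in row 1."""
    return coord.row == 1


def is_ct_sketch(coord):
    """Assumption: compute tiles occupy rows 2 and above."""
    return coord.row >= 2


def tile_kind(coord):
    """Return 'IT', 'MT' or 'CT' under the assumed layout."""
    if is_it_sketch(coord):
        return "IT"
    if is_mt_sketch(coord):
        return "MT"
    if is_ct_sketch(coord):
        return "CT"
    raise ValueError(f"row must be non-negative: {coord}")


print([tile_kind(CoordSketch(row=r, col=0)) for r in range(4)])
```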
- npu.runtime.sequence.parse_word(s)#
Parses either an int string or a hex string, or raises an error.
- Return type:
int
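One way to get the described behavior is to try a decimal parse first and fall back to hexadecimal. The function below, `parse_word_sketch`, is a hypothetical reimplementation for illustration. The order in which the library tries the two bases, and the exception type it raises, are assumptions.

```python
def parse_word_sketch(s: str) -> int:
    """Parse a decimal or hexadecimal string, raising ValueError otherwise.

    Assumption: decimal is tried first, so "10" is read as ten, not sixteen.
    """
    text = s.strip()
    for base in (10, 16):
        try:
            # With base 16, int() accepts an optional "0x" prefix.
            return int(text, base)
        except ValueError:
            continue
    raise ValueError(f"not an int or hex string: {s!r}")


print(parse_word_sketch("42"), parse_word_sketch("0x2A"), parse_word_sketch("ff"))
```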
Module contents#
Runtime#
The NPU submodule npu.runtime contains classes and functions to run custom applications on the NPU device. It provides APIs to load custom xclbins, allocate NumPy arrays as NPU-compatible buffers, and read processed data back out.
Example usage of the AppRunner class and its allocate() method to process your data:
    import numpy as np
    from npu.runtime import AppRunner

    # Register the xclbin and program the NPU
    app = AppRunner("myapp.xclbin")

    # Generate random python data and allocate input
    # and output buffers
    test_data = np.random.randint(0, 255, 256, dtype=np.uint8)
    bo_in = app.allocate(shape=(256,), dtype=np.uint8)
    bo_out = app.allocate(shape=(256,), dtype=np.uint8)

    # Copy input data into NPU memory
    bo_in[:] = test_data
    bo_in.sync_to_npu()

    # Execute the application
    app.call(bo_in, bo_out)

    # Update the output buffer with the results
    bo_out.sync_from_npu()

    # Print the output
    print(np.array(bo_out))

    # Unload the application, free resources
    del app