@skelcompute attribute
🔌dp.kinect3
dp.oak
- signature
skelcompute ENGINE_STRING [ DEVICE_SUBSTRING | DEVICE_INDEX ]
- values
- directml 0 (default)
- examples
@skelcompute directml <- DirectML, first GPU
@skelcompute directml 1 <- DirectML, second GPU
@skelcompute directml "2070" <- DirectML, "2070" GPU
@skelcompute directml intel <- DirectML, "intel" GPU
@skelcompute directml rtx <- DirectML, "rtx" GPU
@skelcompute cpu <- CPU only
Skeleton tracking compute engine and device. Enable multiple compute devices for different tasks using attributes like @skelcompute, @transcoder, and @opencl. For example…
- Track skeletons on the discrete Nvidia GPU
@skelcompute directml nvidia
- Decode color frames on the Intel CPU hardware decoder
@transcoder intelmedia
- Flip and undistort frames on integrated Intel GPU
@opencl intel
- and the remaining features run on your CPU
Experiment to discover which settings meet your needs for hardware, latency, and throughput. You may see significant performance improvements! 🙂
📝 The second parameter of @skelcompute is the name of the device or the numeric index (starting at 0) of the device. The device name is recommended. The index may change because the default Windows Graphics Performance Preference is “Let Windows decide”: Windows may decide to reorder the GPUs, and therefore the indices change. The index is consistent once you choose a specific Graphics Performance Preference. This setting is at Windows Settings, System, Display, Graphics settings.
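The name-or-index behavior can be sketched in Python. This is a hypothetical illustration only: the resolve_device helper and the sample GPU names are not part of the plugin, which does its own internal matching.

```python
def resolve_device(devices, selector):
    """Resolve @skelcompute's second parameter against a list of GPU names.

    `selector` is either a numeric index (as a string) or a
    case-insensitive name substring. Returns the device index,
    or None when nothing matches. (Sketch; not the plugin's code.)
    """
    if selector.isdigit():
        idx = int(selector)
        return idx if idx < len(devices) else None
    needle = selector.lower()
    for i, name in enumerate(devices):
        if needle in name.lower():
            return i
    return None

# Hypothetical GPU list, in the order Windows reports them
gpus = ["NVIDIA GeForce RTX 2070 Super", "Intel(R) UHD Graphics 630"]
print(resolve_device(gpus, "rtx"))    # 0
print(resolve_device(gpus, "intel"))  # 1
print(resolve_device(gpus, "1"))      # 1
```

Note how a name substring like "rtx" keeps selecting the same device even if Windows later reorders the list, while a bare index would silently point at a different GPU.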
Known Issues
- Microsoft’s Azure Kinect bug https://github.com/microsoft/Azure-Kinect-Sensor-SDK/issues/1783 sometimes crashes Max when closing the application. It occurs with some combinations of compute choices. The crash may not be apparent because it happens as Max is closing, and it may cause no real-world harm.
- An Azure Kinect bug may freeze or crash Max in the scenario below. It may be safer to change @skelcompute by typing that attribute on the object box itself (dp.kinect3 @skelcompute directml ...).
  1. Output visual data like depth, color, ir, or playermap
  2. Stop dp.kinect3
  3. Change @skelcompute using a message to dp.kinect3 or an attrui
  4. Start dp.kinect3; the crash/freeze usually happens within a few seconds
Removed CUDA and TensorRT body tracking
In 2025, Nvidia CUDA and TensorRT compute modes were replaced with DirectML. DirectML is now distributed by Windows Update. These optional Nvidia compute modes never performed better than DirectML.
Setup
⚠️ Do not use these instructions with current versions of our plugins. They are for use with very old versions of our plugins.
- Follow setup instructions for the plugin’s optional body tracking
- Download optional-nvidia-addons.zip
- Copy these files from within the ZIP download to the plugin folder: onnxruntime_providers_shared.dll, onnxruntime_providers_cuda.dll, zlibwapi.dll
- Download Nvidia CUDA v11.4.4 for Windows x64. Typical filename: cuda_11.4.4_472.50_windows
- Copy these CUDA files from the bin folder to the plugin folder: cudart64_110.dll, cufft64_10.dll, cublas64_11.dll, cublasLt64_11.dll
- Download Nvidia cuDNN v8.4.1.50 for CUDA 11.4-11.6 for Windows x64. Typical filename: cudnn-windows-x86_64-8.4.1.50_cuda11.6
- Copy these cuDNN files from the bin folder to the plugin folder: cudnn64_8.dll, cudnn_cnn_infer64_8.dll, cudnn_ops_infer64_8.dll
For TensorRT, follow these additional steps
- Copy this additional file from within the optional-nvidia-addons.zip download to the plugin folder: onnxruntime_providers_tensorrt.dll
- Copy these additional CUDA files from the bin folder to the plugin folder: nvrtc64_112_0.dll, nvrtc-builtins64_114.dll
- Download Nvidia TensorRT v8.4.1.5 for CUDA 11.4-11.6 and cuDNN 8.4 for Windows x64. Typical filename: TensorRT-8.4.1.5.Windows10.x86_64.cuda-11.6.cudnn8.4
- Copy these TensorRT files from the lib folder to the plugin folder: nvinfer.dll, nvinfer_plugin.dll, nvinfer_builder_resource.dll, nvonnxparser.dll
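With this many files to copy, a quick check can confirm nothing was missed. A minimal Python sketch, assuming a plugin folder path you supply yourself; the missing_dlls helper is hypothetical, while the DLL names come from the steps above:

```python
from pathlib import Path

# DLL names collected from the CUDA steps above
CUDA_DLLS = [
    "onnxruntime_providers_shared.dll", "onnxruntime_providers_cuda.dll",
    "zlibwapi.dll",
    "cudart64_110.dll", "cufft64_10.dll",
    "cublas64_11.dll", "cublasLt64_11.dll",
    "cudnn64_8.dll", "cudnn_cnn_infer64_8.dll", "cudnn_ops_infer64_8.dll",
]

# TensorRT needs everything above plus these additional files
TENSORRT_DLLS = CUDA_DLLS + [
    "onnxruntime_providers_tensorrt.dll",
    "nvrtc64_112_0.dll", "nvrtc-builtins64_114.dll",
    "nvinfer.dll", "nvinfer_plugin.dll",
    "nvinfer_builder_resource.dll", "nvonnxparser.dll",
]

def missing_dlls(plugin_folder, needed):
    """Return the DLLs from `needed` that are not present in `plugin_folder`."""
    folder = Path(plugin_folder)
    return [dll for dll in needed if not (folder / dll).exists()]
```

An empty result from missing_dlls(your_folder, TENSORRT_DLLS) means all 17 files are in place.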
Usage
⚠️ Do not use these instructions with current versions of our plugins. They are for use with very old versions of our plugins.
@skelcompute cuda <- CUDA, first Nvidia GPU
@skelcompute tensor <- TensorRT, first Nvidia GPU
@skelcompute tensor_fp16 <- TensorRT, Nvidia GPU, float16
- TensorRT usually performs better than CUDA.
- @skelcompute tensor_fp16 uses the Tensor Runtime and optimizes the chosen body tracking model with 16-bit floating point calculations. It uses fewer resources and is slightly less accurate.
- TensorRT choices have a very long first-time startup. For example, it is almost 5 minutes on a laptop with an RTX 2070 Super GPU when choosing tensor_fp16. Later startups take only a few seconds due to caching.
- TensorRT caches the first-time startup optimizations at %TEMP%\PLUGIN_NAME\tensor.cache. You or Windows can delete these cache folders; TensorRT will re-create a cache on the next startup.
- Each TensorRT cache folder depends on your DLLs, body tracking model, compute device, fp16 optimization, etc. When you change configuration, a new cache folder is created and new optimizations are cached. These cache folders can grow to hundreds of megabytes.
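The cache location and its growth can be inspected with a small script. A Python sketch under stated assumptions: tensor_cache_dir and cache_size_mb are hypothetical helpers, and the plugin name is passed in because PLUGIN_NAME in the path above is a placeholder for your plugin's actual name.

```python
import os
from pathlib import Path

def tensor_cache_dir(plugin_name):
    """Build %TEMP%\\PLUGIN_NAME\\tensor.cache (falls back to /tmp
    when %TEMP% is unset, e.g. off Windows)."""
    temp = os.environ.get("TEMP", "/tmp")
    return Path(temp) / plugin_name / "tensor.cache"

def cache_size_mb(plugin_name):
    """Total cache size in MB; these folders can grow to hundreds of MB."""
    root = tensor_cache_dir(plugin_name)
    if not root.exists():
        return 0.0
    return sum(f.stat().st_size for f in root.rglob("*") if f.is_file()) / 1_000_000
```

Deleting the folder returned by tensor_cache_dir() is safe; as noted above, TensorRT simply re-creates it on the next (slow) startup.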