@skelcompute attribute
🔌dp.kinect3
dp.oak
- signature
skelcompute ENGINE_STRING [ DEVICE_SUBSTRING | DEVICE_INDEX ]
- values
- directml 0 (default)
- examples
@skelcompute directml <- DirectML, first GPU
@skelcompute directml 1 <- DirectML, second GPU
@skelcompute directml "2070" <- DirectML, "2070" GPU
@skelcompute directml intel <- DirectML, "intel" GPU
@skelcompute directml rtx <- DirectML, "rtx" GPU
@skelcompute cpu <- CPU only
Skeleton tracking compute engine and device. Enable multiple compute devices for different tasks using attributes like @skelcompute, @transcoder, and @opencl. For example…
- Track skeletons on the discrete Nvidia GPU
@skelcompute directml nvidia
- Decode color frames on the Intel CPU hardware decoder
@transcoder intelmedia
- Flip and undistort frames on integrated Intel GPU
@opencl intel
- and the remaining features run on your CPU
Experiment to discover which settings meet your needs for hardware, latency, and throughput. You may see significant performance improvements! 🙂
📝 The second parameter of @skelcompute is the name of the device or the numeric index (starting at 0) of the device. The device name is recommended. The index may change because the default Windows Graphics Performance Preference is “Let Windows decide”: Windows may decide to reorder the GPUs, and therefore the indices change. The index is consistent once you choose a specific Graphics Performance Preference. This setting is at Windows Settings, System, Display, Graphics settings.
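The name-or-index behavior can be sketched in Python. This is a hypothetical illustration only: the resolve_device helper and the sample GPU names are not part of the plugin, which does its own internal matching.

```python
def resolve_device(devices, selector):
    """Resolve @skelcompute's second parameter against a list of GPU names.

    `selector` is either a numeric index (as a string) or a
    case-insensitive name substring. Returns the device index,
    or None when nothing matches. (Sketch; not the plugin's code.)
    """
    if selector.isdigit():
        idx = int(selector)
        return idx if idx < len(devices) else None
    needle = selector.lower()
    for i, name in enumerate(devices):
        if needle in name.lower():
            return i
    return None

# Hypothetical GPU list, in the order Windows reports them
gpus = ["NVIDIA GeForce RTX 2070 Super", "Intel(R) UHD Graphics 630"]
print(resolve_device(gpus, "rtx"))    # 0
print(resolve_device(gpus, "intel"))  # 1
print(resolve_device(gpus, "1"))      # 1
```

Note how a name substring like "rtx" keeps selecting the same device even if Windows later reorders the list, while a bare index would silently point at a different GPU.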
Known Issues
- Microsoft’s Azure Kinect bug https://github.com/microsoft/Azure-Kinect-Sensor-SDK/issues/1783 sometimes crashes Max when closing the application. It occurs with some combinations of compute choices. The crash may not be apparent because it happens as Max is closing, and it may cause no real-world harm.
- An Azure Kinect bug may freeze or crash Max in the scenario below. It may be safer to change @skelcompute by typing that attribute on the object box itself (dp.kinect3 @skelcompute directml ...).
  1. Output visual data like depth, color, ir, or playermap
  2. Stop dp.kinect3
  3. Change @skelcompute using a message to dp.kinect3 or an attrui
  4. Start dp.kinect3; the crash/freeze usually happens within a few seconds
Removed CUDA and TensorRT body tracking
In 2025, Nvidia CUDA and TensorRT compute modes were replaced with DirectML. DirectML is now distributed by Windows Update. These optional Nvidia compute modes never performed better than DirectML.
Setup
⚠️ Do not use these instructions with current versions of our plugins. They are for use with very old versions of our plugins.
- Follow setup instructions for the plugin’s optional body tracking
- Download optional-nvidia-addons.zip
- Copy these files from within the ZIP download to the plugin folder: onnxruntime_providers_shared.dll, onnxruntime_providers_cuda.dll, zlibwapi.dll
- Download Nvidia CUDA v11.4.4 for Windows x64. Typical filename: cuda_11.4.4_472.50_windows
- Copy these CUDA files from the bin folder to the plugin folder: cudart64_110.dll, cufft64_10.dll, cublas64_11.dll, cublasLt64_11.dll
- Download Nvidia cuDNN v8.4.1.50 for CUDA 11.4-11.6 for Windows x64. Typical filename: cudnn-windows-x86_64-8.4.1.50_cuda11.6
- Copy these cuDNN files from the bin folder to the plugin folder: cudnn64_8.dll, cudnn_cnn_infer64_8.dll, cudnn_ops_infer64_8.dll
For TensorRT, follow these additional steps
- Copy this additional file from within the optional-nvidia-addons.zip download to the plugin folder: onnxruntime_providers_tensorrt.dll
- Copy these additional CUDA files from the bin folder to the plugin folder: nvrtc64_112_0.dll, nvrtc-builtins64_114.dll
- Download Nvidia TensorRT v8.4.1.5 for CUDA 11.4-11.6 and cuDNN 8.4 for Windows x64. Typical filename: TensorRT-8.4.1.5.Windows10.x86_64.cuda-11.6.cudnn8.4
- Copy these TensorRT files from the lib folder to the plugin folder: nvinfer.dll, nvinfer_plugin.dll, nvinfer_builder_resource.dll, nvonnxparser.dll
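With this many files to copy, a quick check can confirm nothing was missed. A minimal Python sketch, assuming a plugin folder path you supply yourself; the missing_dlls helper is hypothetical, while the DLL names come from the steps above:

```python
from pathlib import Path

# DLL names collected from the CUDA steps above
CUDA_DLLS = [
    "onnxruntime_providers_shared.dll", "onnxruntime_providers_cuda.dll",
    "zlibwapi.dll",
    "cudart64_110.dll", "cufft64_10.dll",
    "cublas64_11.dll", "cublasLt64_11.dll",
    "cudnn64_8.dll", "cudnn_cnn_infer64_8.dll", "cudnn_ops_infer64_8.dll",
]

# TensorRT needs everything above plus these additional files
TENSORRT_DLLS = CUDA_DLLS + [
    "onnxruntime_providers_tensorrt.dll",
    "nvrtc64_112_0.dll", "nvrtc-builtins64_114.dll",
    "nvinfer.dll", "nvinfer_plugin.dll",
    "nvinfer_builder_resource.dll", "nvonnxparser.dll",
]

def missing_dlls(plugin_folder, needed):
    """Return the DLLs from `needed` that are not present in `plugin_folder`."""
    folder = Path(plugin_folder)
    return [dll for dll in needed if not (folder / dll).exists()]
```

An empty result from missing_dlls(your_folder, TENSORRT_DLLS) means all 17 files are in place.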
Usage
⚠️ Do not use these instructions with current versions of our plugins. They are for use with very old versions of our plugins.
@skelcompute cuda <- CUDA, first Nvidia GPU
@skelcompute tensor <- TensorRT, first Nvidia GPU
@skelcompute tensor_fp16 <- TensorRT, Nvidia GPU, float16
- TensorRT usually performs better than CUDA.
- @skelcompute tensor_fp16 uses the Tensor Runtime and optimizes the chosen body tracking model with 16-bit floating point calculations. It uses fewer resources and is slightly less accurate.
- TensorRT choices have a very long first-time startup. For example, it is almost 5 minutes on a laptop with an RTX 2070 Super GPU when choosing tensor_fp16. Later startups take only a few seconds due to caching.
- TensorRT caches the first-time startup optimizations at %TEMP%\PLUGIN_NAME\tensor.cache. You or Windows can delete these cache folders; TensorRT will re-create a cache on the next startup.
- Each TensorRT cache folder depends on your DLLs, body tracking model, compute device, fp16 optimization, etc. When you change configuration, a new cache folder is created and new optimizations are cached. These cache folders can grow to hundreds of megabytes.
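The cache location and its growth can be inspected with a small script. A Python sketch under stated assumptions: tensor_cache_dir and cache_size_mb are hypothetical helpers, and the plugin name is passed in because PLUGIN_NAME in the path above is a placeholder for your plugin's actual name.

```python
import os
from pathlib import Path

def tensor_cache_dir(plugin_name):
    """Build %TEMP%\\PLUGIN_NAME\\tensor.cache (falls back to /tmp
    when %TEMP% is unset, e.g. off Windows)."""
    temp = os.environ.get("TEMP", "/tmp")
    return Path(temp) / plugin_name / "tensor.cache"

def cache_size_mb(plugin_name):
    """Total cache size in MB; these folders can grow to hundreds of MB."""
    root = tensor_cache_dir(plugin_name)
    if not root.exists():
        return 0.0
    return sum(f.stat().st_size for f in root.rglob("*") if f.is_file()) / 1_000_000
```

Deleting the folder returned by tensor_cache_dir() is safe; as noted above, TensorRT simply re-creates it on the next (slow) startup.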