@mlinput attribute

🔌dp.kinect3 dp.oak

signature

mlinput Source [Action]...

examples

@mlinput color rgb
@mlinput color grey
@mlinput color bgr resize 256 160
@mlinput color rgb centercrop resize 256 160 int32 nchw

Selects the machine learning input source and the preprocessing actions applied to it. For example, @mlinput color rgb provides color input in RGB format.

Input sources

color is the only input source supported at this time

Preprocessing actions

Preprocessing is achieved by a series of actions, applied to the input in the order listed. Each action is described by a series of tokens and values. Some actions are chords, meaning that several tokens and values combine into one action.

Simple example @mlinput color rgb

get color input from sensor
convert to rgb encoding

Advanced example @mlinput color rgb centercrop resize 256 160 int32 nchw

get color input from sensor
convert to rgb encoding
center crop to match the resize aspect ratio ↙️
resize the center cropped area to 256x160 pixels (1.6 aspect ratio)
convert each component (red, green, blue) to a 32-bit signed integer
convert to an NCHW layout tensor

Actions are applied in the order listed, so the same actions can often be reordered. For example, the advanced example achieves a similar result with the type conversion moved before the crop: @mlinput color rgb int32 centercrop resize 256 160 nchw

Transcode

Input encoding, fundamental type, normalization, and layout can all be changed.

Encode

Choose the encoding you need. More can be added on request.

rgb, rgba, argb, bgr, bgra: variations of channel order for BT.601 full range [0..255]
uyvy, ycbcr: interleaved YCbCr; another encoding is recommended due to Max color bugs
grey: single-channel greyscale using BT.601 primaries
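The channel-order and greyscale encodings can be illustrated with a minimal Python sketch. It assumes the standard BT.601 luma weights (0.299, 0.587, 0.114) and round-to-nearest. The plugin's exact arithmetic is not documented here, and the function names are hypothetical.

```python
def reorder(pixel, src="rgb", dst="bgr"):
    """Reorder the channels of one pixel, e.g. rgb -> bgr or rgba -> argb."""
    return tuple(pixel[src.index(channel)] for channel in dst)


def bt601_grey(r, g, b):
    """Single-channel grey from full-range 8-bit RGB using BT.601 luma weights.

    Rounding and clamping are assumptions of this sketch.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return max(0, min(255, round(y)))
```

For instance, reorder((10, 20, 30)) swaps the first and last channels, and bt601_grey weights green most heavily.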

Fundamental type

Change the fundamental type of input values with one of the following

Accept an optional norm: uint8 or char, float16, float32, float64
Do not accept norm: int8, int16, uint16, int32

Normalize

List norm after the type to enable normalization

uint8 norm, char norm: floating-point 0.0-1.0 to 8-bit unsigned integer 0-255 with (value * 255.0)
float16 norm, float32 norm, float64 norm: 8-bit unsigned integer 0-255 to floating-point 0.0-1.0 with (value / 255.0)
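The two normalization directions can be sketched in Python as follows. The formulas (value / 255.0) and (value * 255.0) come from the list above. Rounding to nearest and clamping to 0-255 are assumptions of this sketch, and the function names are hypothetical.

```python
def float_norm(value_u8):
    """8-bit unsigned integer 0-255 -> floating-point 0.0-1.0."""
    return value_u8 / 255.0


def uint8_norm(value_float):
    """Floating-point 0.0-1.0 -> 8-bit unsigned integer 0-255.

    Rounding and clamping are assumptions; the plugin may truncate instead.
    """
    return max(0, min(255, round(value_float * 255.0)))
```

Under these assumptions the two functions are inverses for every 8-bit value.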

Layout

Inference models and their underlying engines require specific layouts for each batch of input. Two layouts are supported

nchw: number of batch samples, channels, height, width; required for ONNX and often with PyTorch
nhwc: number of batch samples, height, width, channels; often with TensorFlow
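The difference between the two layouts is only the order in which the same values are indexed. A minimal Python sketch using nested lists (the function name is hypothetical, and a real pipeline would use a tensor library instead):

```python
def nhwc_to_nchw(batch):
    """Convert batch[n][h][w][c] to result[n][c][h][w]."""
    return [
        [
            [[pixel[c] for pixel in row] for row in image]
            for c in range(len(image[0][0]))
        ]
        for image in batch
    ]


# One sample, one row, two pixels, three channels (NHWC shape 1x1x2x3).
nhwc = [[[(1, 2, 3), (4, 5, 6)]]]
nchw = nhwc_to_nchw(nhwc)  # NCHW shape 1x3x1x2
```

In the NCHW result each channel becomes its own plane, so all red values are stored together, then all green, then all blue.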

Scalar Math

Scalar math operations can be applied to values. Any combination of these can be used for common operations like subtracting the mean, scaling, offset, etc.

add: add a value to each respective channel, e.g.
(100, 100, 100) add 2 = (102, 100, 100)
(100, 100, 100) add 2 2 = (102, 102, 100)
(100, 100, 100) add 2 2 2 = (102, 102, 102)
sub: subtract a value from each respective channel, e.g.
(100, 100, 100) sub 2 = (98, 100, 100)
(100, 100, 100) sub 2 2 = (98, 98, 100)
(100, 100, 100) sub 2 2 2 = (98, 98, 98)
mul: multiply a value with each respective channel, e.g.
(100, 100, 100) mul 1.5 = (150, 100, 100)
(100, 100, 100) mul 1.5 1.5 = (150, 150, 100)
(100, 100, 100) mul 1.5 1.5 1.5 = (150, 150, 150)
div: divide each respective channel by a value, e.g.
(100, 100, 100) div 2 = (50, 100, 100)
(100, 100, 100) div 2 2 = (50, 50, 100)
(100, 100, 100) div 2 2 2 = (50, 50, 50)
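The per-channel behavior in the examples above can be sketched in Python: each listed value applies to the channel in the same position, and channels without a value are left unchanged. The function name and the use of true division are assumptions of this sketch.

```python
import operator

# Map each action token to the matching arithmetic operation.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
}


def scalar_math(pixel, op, *values):
    """Apply op to each channel that has a value in the same position."""
    fn = OPS[op]
    return tuple(
        fn(channel, values[i]) if i < len(values) else channel
        for i, channel in enumerate(pixel)
    )
```

For example, scalar_math((100, 100, 100), "sub", 2, 2) changes only the first two channels.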

Transform

Resize

resize DIMENSION resizes input to the given width and height. Resizing early may be faster because later actions then operate on smaller input. The dimension can be two separate numbers or a resolution string like 256x160. All three examples below achieve the same result

resize 250 160
resize 250.0 160.0
resize 250x160
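A minimal Python sketch of how the three equivalent forms could be read into one width and height pair. The function name is hypothetical, and truncating fractional values to whole pixels is an assumption; the plugin's handling of values such as 250.5 is not documented here.

```python
def parse_dimension(tokens):
    """Parse ['250', '160'], ['250.0', '160.0'], or ['250x160'] into (width, height)."""
    if len(tokens) == 1:
        # Resolution string such as 250x160.
        tokens = tokens[0].lower().split("x")
    if len(tokens) != 2:
        raise ValueError("expected WIDTH HEIGHT or WIDTHxHEIGHT")
    width, height = (int(float(token)) for token in tokens)
    return width, height
```

All three example forms yield the same pair under these assumptions.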

Flip

flipx reverses the order of the columns and leaves the rows in place, mirroring the input left to right. flipy reverses the order of the rows and leaves the columns in place, mirroring the input top to bottom.

Original

column 1	column 2
1.0	2.0
3.0	4.0

flipx

column 1	column 2
2.0	1.0
4.0	3.0

flipy

column 1	column 2
3.0	4.0
1.0	2.0
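The flip tables can be reproduced with a short Python sketch that treats the input as a list of rows. The function names mirror the action tokens, but the code is only an illustration, not the plugin's implementation.

```python
def flipx(rows):
    """Reverse the column order within every row."""
    return [row[::-1] for row in rows]


def flipy(rows):
    """Reverse the row order; columns stay in place."""
    return rows[::-1]


original = [[1.0, 2.0], [3.0, 4.0]]
```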

Rotate

rotate NUMBER rotates input clockwise in 90-degree increments, i.e. 0, 90, 180, 270. It does not crop or pad input. If your input is 320x240, then rotate 90 produces 240x320 input.

Original

1.0	2.0
3.0	4.0

rotate 90

3.0	1.0
4.0	2.0

rotate 180

4.0	3.0
2.0	1.0

rotate 270

2.0	4.0
1.0	3.0
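The rotation tables can be reproduced with a short Python sketch that applies a clockwise quarter turn repeatedly. The function is an illustration only; rejecting angles that are not multiples of 90 is an assumption of this sketch.

```python
def rotate(rows, degrees):
    """Rotate a list of rows clockwise by a multiple of 90 degrees."""
    if degrees % 90 != 0:
        raise ValueError("degrees must be a multiple of 90")
    result = [list(row) for row in rows]
    for _ in range((degrees // 90) % 4):
        # One clockwise quarter turn: reverse the rows, then transpose.
        result = [list(column) for column in zip(*result[::-1])]
    return result


original = [[1.0, 2.0], [3.0, 4.0]]
```

Because nothing is cropped or padded, a quarter turn swaps width and height: an input with 2 rows and 3 columns becomes 3 rows and 2 columns.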

Crop

Region of interest

Each crop type supports a region of interest (roi) declared using one of these coordinate formats

xyxy XLEFT_INT YTOP_INT XRIGHT_INT YBOTTOM_INT: top-left inclusive and bottom-right exclusive corners, e.g. xyxy 200 50 400 150 rectangle has pixels x 200-399 and y 50-149
xywh XLEFT_INT YTOP_INT WIDTH_INT HEIGHT_INT: top-left inclusive and its width and height from that point, e.g. xywh 200 50 200 100 rectangle has pixels x 200-399 and y 50-149
cxywh XCENTER_INT YCENTER_INT WIDTH_INT HEIGHT_INT: center of roi and its total width and height, e.g. cxywh 300 100 200 100 rectangle has pixels x 200-399 and y 50-149
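The three coordinate formats describe the same rectangle, which a short Python sketch can confirm by converting each to xyxy. The function names are hypothetical. How the plugin places the center for odd widths or heights is not stated, so the integer division below is an assumption.

```python
def xywh_to_xyxy(x_left, y_top, width, height):
    """Top-left corner plus size -> inclusive top-left, exclusive bottom-right."""
    return (x_left, y_top, x_left + width, y_top + height)


def cxywh_to_xyxy(x_center, y_center, width, height):
    """Center plus size -> inclusive top-left, exclusive bottom-right."""
    x_left = x_center - width // 2
    y_top = y_center - height // 2
    return (x_left, y_top, x_left + width, y_top + height)
```

With the example values above, both conversions give xyxy 200 50 400 150, covering pixels x 200-399 and y 50-149.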

Crop shapes

Cropping shapes require a resize dimension, roi, or both.

crop ROI: crop/remove input outside required roi, preserves visual aspect ratio; e.g. crop xyxy 200 50 400 150
centercrop [ROI] resize DIMENSION: crop/remove input outside optional roi, then center crop with dimension’s aspect ratio and no padding (loses visual content), then resize to dimension, preserves visual aspect ratio; e.g. centercrop resize 256 160 or centercrop xyxy 200 50 400 150 resize 256 160
padcrop [ROI] resize DIMENSION: crop/remove input outside optional roi, then center crop with dimension’s aspect ratio and padding (retains visual content), then resize to dimension, preserves visual aspect ratio, also known as “letterboxing” or “pillarboxing”; e.g. padcrop resize 256 160 or padcrop xyxy 200 50 400 150 resize 256 160
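The geometric difference between centercrop and padcrop can be sketched in Python by computing the centered rectangle, in source coordinates, that has the target aspect ratio. This is a sketch under stated assumptions: the function names are hypothetical, integer division is used for rounding, and the plugin's actual rounding and padding color are not documented here. For centercrop the rectangle lies inside the source. For padcrop it encloses the source, and the part outside the source is padding.

```python
def centercrop_rect(src_w, src_h, dst_w, dst_h):
    """Largest centered xyxy rectangle inside the source with the target aspect ratio."""
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target: trim left and right.
        crop_w, crop_h = src_h * dst_w // dst_h, src_h
    else:
        # Source is taller than the target: trim top and bottom.
        crop_w, crop_h = src_w, src_w * dst_h // dst_w
    left, top = (src_w - crop_w) // 2, (src_h - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


def padcrop_rect(src_w, src_h, dst_w, dst_h):
    """Smallest centered xyxy rectangle enclosing the source with the target aspect ratio.

    Negative coordinates, or coordinates beyond the source, mark padding.
    """
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target: pad above and below (letterbox).
        crop_w, crop_h = src_w, src_w * dst_h // dst_w
    else:
        # Source is taller than the target: pad left and right (pillarbox).
        crop_w, crop_h = src_h * dst_w // dst_h, src_h
    left, top = (src_w - crop_w) // 2, (src_h - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)
```

For a 1920x1080 source and a 256x160 target, centercrop_rect keeps a 1728x1080 region and discards 96 columns on each side, while padcrop_rect keeps all 1920 columns and adds 60 rows of padding above and below. Either rectangle is then resized to 256x160.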