model.mask_rcnn

Description

🎭 Instance segmentation model for generating high-quality masks.

class Node(config=None, **kwargs)

Initializes and uses Mask R-CNN to infer from an image frame.

The Mask R-CNN node is capable of detecting objects and their respective masks from 80 categories. The table of object categories can be found here. The ResNet-50 FPN backbone ("r50-fpn") is used by default; the ResNet-101 FPN variant ("r101-fpn") can be selected instead via model_type.

Inputs

img (numpy.ndarray): A NumPy array of shape \((height, width, channels)\) containing the image data in BGR format.
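If frames come from a library that returns RGB images (for example Pillow), the channel order has to be reversed before the frame is passed as img; OpenCV's cv2.imread already returns BGR. A minimal NumPy sketch, where rgb_to_bgr is a hypothetical helper and not part of PeekingDuck:

```python
import numpy as np


def rgb_to_bgr(image: np.ndarray) -> np.ndarray:
    """Return a BGR copy of a (height, width, 3) RGB image (hypothetical helper)."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an array of shape (height, width, 3)")
    # Reversing the last axis swaps the R and B channels; the contiguous copy
    # avoids handing a negatively strided view to downstream code.
    return np.ascontiguousarray(image[:, :, ::-1])


# Usage: a 1x1 pure-red RGB image becomes (0, 0, 255) in BGR order.
red_rgb = np.array([[[255, 0, 0]]], dtype=np.uint8)
print(rgb_to_bgr(red_rgb)[0, 0].tolist())  # [0, 0, 255]
```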

Outputs

bboxes (numpy.ndarray): A NumPy array of shape \((N, 4)\) containing normalized bounding box coordinates of \(N\) detected objects. Each bounding box is represented as \((x_1, y_1, x_2, y_2)\) where \((x_1, y_1)\) is the top-left corner and \((x_2, y_2)\) is the bottom-right corner. The order corresponds to bbox_labels and bbox_scores.

bbox_labels (numpy.ndarray): A NumPy array of shape \((N)\) containing strings representing the labels of detected objects. The order corresponds to bboxes and bbox_scores.

bbox_scores (numpy.ndarray): A NumPy array of shape \((N)\) containing confidence scores \([0, 1]\) of detected objects. The order corresponds to bboxes and bbox_labels.

masks (numpy.ndarray): A NumPy array of shape \((N, H, W)\) containing \(N\) detected binarized masks where \(H\) and \(W\) are the height and width of the masks. The order corresponds to bbox_labels.
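The sketch below shows one way to consume these outputs downstream. It assumes the bounding box coordinates are normalized to [0, 1] relative to the image width (x values) and height (y values), and that masks hold 0/1 or boolean values; to_pixel_coords and summarize_detections are hypothetical helpers, not PeekingDuck APIs.

```python
from typing import List, Tuple

import numpy as np


def to_pixel_coords(bboxes: np.ndarray, height: int, width: int) -> np.ndarray:
    """Convert normalized (x1, y1, x2, y2) boxes to integer pixel coordinates."""
    # Assumption: x values are fractions of the width, y values of the height.
    scale = np.array([width, height, width, height], dtype=float)
    return np.round(bboxes * scale).astype(int)


def summarize_detections(
    bboxes: np.ndarray,
    bbox_labels: np.ndarray,
    bbox_scores: np.ndarray,
    masks: np.ndarray,
    height: int,
    width: int,
) -> List[Tuple[str, float, Tuple[int, ...], int]]:
    """Pair each detection's label, score, pixel box and mask area in pixels."""
    pixel_boxes = to_pixel_coords(bboxes, height, width)
    # Assumption: masks are boolean or 0/1, so sum() counts foreground pixels.
    return [
        (str(label), float(score), tuple(int(v) for v in box), int(mask.sum()))
        for box, label, score, mask in zip(pixel_boxes, bbox_labels, bbox_scores, masks)
    ]
```

Because the four arrays share the same ordering, zipping them by index is enough to associate each box with its label, score and mask.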

Configs

model_type (str) – {“r50-fpn”, “r101-fpn”}, default = “r50-fpn”.
Defines the type of backbone to be used.
weights_parent_dir (Optional[str]) – default = null.
Change the parent directory where weights will be stored by replacing null with an absolute path to the desired directory.
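min_size (int) – default = 800.
Target length of the shorter side of the image after rescaling, before it is fed to the backbone.
max_size (int) – default = 1333.
Upper limit on the longer side of the image after rescaling. If scaling the shorter side to min_size would make the longer side exceed max_size, the image is instead scaled so that its longer side equals max_size.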
detect (List[Union[int, str]]) – default = [0].
List of object class names or IDs to be detected. To detect all classes, refer to the tech note.
max_num_detections (int) – default = 100.
Maximum number of detections per image, across all classes.
iou_threshold (float) – [0, 1], default = 0.5.
During non-maximum suppression, a bounding box whose Intersection over Union (IoU) with a higher-scoring overlapping box exceeds the threshold is discarded.
score_threshold (float) – [0, 1], default = 0.5.
Bounding boxes with classification score below the threshold will be discarded.
mask_threshold (float) – [0, 1], default = 0.5.
The confidence threshold for binarizing the masks’ pixel values; determines whether an object is detected at a particular pixel.
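Since the inference code is adapted from torchvision, min_size and max_size presumably follow the resize rule of torchvision's GeneralizedRCNNTransform: a single scale factor is chosen so that the shorter side reaches min_size, unless that would push the longer side past max_size. The sketch below assumes the node does not override that rule; rescale_factor and rescaled_shape are hypothetical names, and the floor rounding is an approximation of the interpolation step.

```python
from typing import Tuple


def rescale_factor(height: int, width: int, min_size: int = 800, max_size: int = 1333) -> float:
    """Scale factor applied to both sides (assumed torchvision-style rule)."""
    short_side = min(height, width)
    long_side = max(height, width)
    # Scale the shorter side to min_size, unless the longer side would then
    # exceed max_size; the smaller of the two ratios wins.
    return min(min_size / short_side, max_size / long_side)


def rescaled_shape(height: int, width: int, min_size: int = 800, max_size: int = 1333) -> Tuple[int, int]:
    """Approximate (height, width) after rescaling, flooring the scaled sizes."""
    scale = rescale_factor(height, width, min_size, max_size)
    return int(height * scale), int(width * scale)


print(rescaled_shape(480, 640))  # (800, 1066)
```

For a 480 x 640 frame with the defaults, the shorter side limits the scale (800 / 480, about 1.67); for very wide frames, max_size becomes the binding constraint instead.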
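To illustrate how score_threshold, iou_threshold, max_num_detections and mask_threshold interact, here is a sketch of a post-processing pass applied in roughly the order torchvision's RoI heads use (score filter, NMS, top-k), followed by mask binarization. It is not PeekingDuck's code: box_iou and postprocess are hypothetical names, the NMS is class-agnostic greedy NMS (torchvision applies NMS per class), and the exact comparisons (>= for scores, <= for IoU, > for masks) are assumptions.

```python
from typing import List, Tuple

import numpy as np


def box_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return float(inter / union) if union > 0 else 0.0


def postprocess(
    boxes: np.ndarray,       # (N, 4) candidate boxes
    scores: np.ndarray,      # (N,) confidence scores
    mask_probs: np.ndarray,  # (N, H, W) per-pixel mask probabilities
    score_threshold: float = 0.5,
    iou_threshold: float = 0.5,
    max_num_detections: int = 100,
    mask_threshold: float = 0.5,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    # score_threshold: discard candidates scoring below the threshold.
    candidates = np.flatnonzero(scores >= score_threshold)
    # Visit the remaining candidates from highest to lowest score.
    order = candidates[np.argsort(-scores[candidates], kind="stable")]
    selected: List[int] = []
    for idx in order:
        # max_num_detections: stop once enough boxes have been kept.
        if len(selected) >= max_num_detections:
            break
        # iou_threshold: drop a box that overlaps an already kept box too much.
        if all(box_iou(boxes[idx], boxes[j]) <= iou_threshold for j in selected):
            selected.append(int(idx))
    keep = np.array(selected, dtype=int)
    # mask_threshold: binarize the probabilities of the surviving masks.
    masks = mask_probs[keep] > mask_threshold
    return boxes[keep], scores[keep], masks
```

Because candidates are visited in descending score order, stopping at max_num_detections keeps the highest-scoring boxes that survive suppression.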

References

Mask R-CNN: A conceptually simple, flexible, and general framework for object instance segmentation: https://arxiv.org/abs/1703.06870

Inference code adapted from: https://pytorch.org/vision/0.11/_modules/torchvision/models/detection/mask_rcnn.html

The weights for the Mask R-CNN model with the ResNet-50 FPN backbone were adapted from: https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth