model.mask_rcnn
Description
🎭 Instance segmentation model for generating high-quality masks.
- class Node(config=None, **kwargs)
Initializes and uses Mask R-CNN to infer from an image frame.
The Mask R-CNN node is capable of detecting objects and their respective masks from 80 categories. The table of object categories can be found here. The "r50-fpn" backbone is used by default, and the "r101-fpn" variant, which uses the ResNet 101 backbone, can also be chosen.
- Inputs
img (numpy.ndarray): A NumPy array of shape \((height, width, channels)\) containing the image data in BGR format.
- Outputs
bboxes (numpy.ndarray): A NumPy array of shape \((N, 4)\) containing normalized bounding box coordinates of \(N\) detected objects. Each bounding box is represented as \((x_1, y_1, x_2, y_2)\) where \((x_1, y_1)\) is the top-left corner and \((x_2, y_2)\) is the bottom-right corner. The order corresponds to bbox_labels and bbox_scores.
bbox_labels (numpy.ndarray): A NumPy array of shape \((N)\) containing strings representing the labels of detected objects. The order corresponds to bboxes and bbox_scores.
bbox_scores (numpy.ndarray): A NumPy array of shape \((N)\) containing confidence scores \([0, 1]\) of detected objects. The order corresponds to bboxes and bbox_labels.
masks (numpy.ndarray): A NumPy array of shape \((N, H, W)\) containing \(N\) detected binarized masks where \(H\) and \(W\) are the height and width of the masks. The order corresponds to bbox_labels.
- Configs
model_type (str) – {"r50-fpn", "r101-fpn"}, default = "r50-fpn".
Defines the type of backbone to be used.
weights_parent_dir (Optional[str]) – default = null.
Change the parent directory where weights will be stored by replacing null with an absolute path to the desired directory.
min_size (int) – default = 800.
Minimum size to which the image is rescaled before it is fed to the backbone.
max_size (int) – default = 1333.
Maximum size to which the image is rescaled before it is fed to the backbone.
detect (List[Union[int, str]]) – default = [0].
List of object class names or IDs to be detected. To detect all classes, refer to the tech note.
max_num_detections (int) – default = 100.
Maximum number of detections per image, for all classes.
iou_threshold (float) – [0, 1], default = 0.5.
Overlapping bounding boxes with Intersection over Union (IoU) above the threshold will be discarded.
score_threshold (float) – [0, 1], default = 0.5.
Bounding boxes with a classification score below the threshold will be discarded.
mask_threshold (float) – [0, 1], default = 0.5.
The confidence threshold for binarizing the masks' pixel values; determines whether an object is detected at a particular pixel.
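The outputs and thresholds described above can be illustrated with a minimal NumPy sketch. It is not the node's implementation: the helper names (to_pixel_coords, box_iou, keep_confident, binarize) and the synthetic arrays are hypothetical, and whether the node compares with a strict or non-strict inequality at exactly the threshold value is an assumption.

```python
import numpy as np


def to_pixel_coords(bboxes, height, width):
    """Scale normalized (x1, y1, x2, y2) boxes to integer pixel coordinates."""
    scale = np.array([width, height, width, height], dtype=np.float64)
    return np.rint(bboxes * scale).astype(int)


def box_iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def keep_confident(scores, score_threshold=0.5):
    """Boolean index of detections whose score is not below the threshold."""
    return scores >= score_threshold


def binarize(soft_masks, mask_threshold=0.5):
    """Turn per-pixel confidences into 0/1 masks.

    Assumption: a pixel counts as "detected" when its confidence is strictly
    greater than the threshold; the node's exact comparison is not stated.
    """
    return (soft_masks > mask_threshold).astype(np.uint8)


# Synthetic stand-ins for the node's outputs (not real detections).
bboxes = np.array([[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]])
bbox_scores = np.array([0.9, 0.3])
soft_masks = np.array([[[0.2, 0.7], [0.9, 0.4]]])

# Normalized boxes scaled to a 640x480 (width x height) frame.
pixel_boxes = to_pixel_coords(bboxes, height=480, width=640)

# Only the first detection survives score_threshold = 0.5.
keep = keep_confident(bbox_scores)

# Per-pixel confidences above mask_threshold = 0.5 become 1, the rest 0.
binary_masks = binarize(soft_masks)

# Two equally sized boxes offset by half their side length.
iou = box_iou([0.0, 0.0, 0.5, 0.5], [0.25, 0.25, 0.75, 0.75])
```

In this sketch the two boxes passed to box_iou overlap with an IoU of 1/7, which is below the default iou_threshold of 0.5, so neither would be discarded as a duplicate. Because bboxes are normalized, they must be scaled by the frame's width and height, as to_pixel_coords does, before they can be used to index into img.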
References
Mask R-CNN: A conceptually simple, flexible, and general framework for object instance segmentation: https://arxiv.org/abs/1703.06870
Inference code adapted from: https://pytorch.org/vision/0.11/_modules/torchvision/models/detection/mask_rcnn.html
The weights for the Mask R-CNN model with the ResNet50 FPN backbone were adapted from: https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth