Instance Segmentation Models

List of Instance Segmentation Models

The table below shows the instance segmentation models available.

| Model      | Documentation     |
|------------|-------------------|
| Mask R-CNN | model.mask_rcnn   |
| YolactEdge | model.yolact_edge |
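
For orientation, below is a minimal sketch of a pipeline_config.yml that runs one of these models on a video. The source path is a placeholder, and the draw.instance_mask and output.screen nodes are assumed to be available in your PeekingDuck installation.

```yaml
nodes:
  - input.visual:
      source: /path/to/video.mp4   # placeholder; replace with your own video or webcam index
  - model.mask_rcnn                # or model.yolact_edge
  - draw.instance_mask             # overlays the predicted instance masks on each frame
  - output.screen                  # displays the annotated frames in a window
```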

Benchmarks

Inference Speed

The table below shows the frames per second (FPS) achieved by each model variant, measured separately on the single-human and multiple-human test videos (see Test Conditions below).

| Model      | Type        | Size     | CPU (single) | CPU (multiple) | GPU (single) | GPU (multiple) |
|------------|-------------|----------|--------------|----------------|--------------|----------------|
| Mask R-CNN | r50-fpn     | 800-1333 | 0.76         | 0.72           | 22.30        | 18.58          |
| Mask R-CNN | r101-fpn    | 800-1333 | 0.61         | 0.57           | 17.14        | 14.83          |
| YolactEdge | r50-fpn     | 550      | 2.99         | 2.93           | 40.84        | 33.94          |
| YolactEdge | r101-fpn    | 550      | 2.32         | 2.27           | 29.55        | 25.89          |
| YolactEdge | mobilenetv2 | 550      | 4.93         | 4.64           | 48.59        | 36.66          |

Hardware

The following hardware was used to conduct the FPS benchmarks:
- CPU: 2.8 GHz 4-core Intel Xeon (2020, Cascade Lake) with 16 GB of RAM
- GPU: NVIDIA A100, paired with a 2.2 GHz 6-core Intel Xeon CPU and 85 GB of RAM

Test Conditions

The following test conditions were followed:
- The input.visual node, the model of interest, and the dabble.fps node were used to perform inference on videos (see the pipeline sketch after this list)
- Two videos were used to benchmark each model: one with only 1 human (single), and the other with multiple humans (multiple)
- Each video is about 1 minute long, recorded at ~30 FPS, which translates to about 1,800 frames to process per video
- 1280×720 (HD Ready) resolution was used, as a midpoint between the 640×480 (VGA) of poorer-quality webcams and the 1920×1080 (Full HD) of CCTVs
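
A minimal sketch of the benchmark pipeline's pipeline_config.yml is shown below. The source path is a placeholder, and model_type is an assumed configuration key for selecting the model variant.

```yaml
nodes:
  - input.visual:
      source: /path/to/single_human.mp4   # placeholder; rerun with the multiple-human video
  - model.mask_rcnn:                      # swap in model.yolact_edge for the YolactEdge runs
      model_type: r50-fpn                 # assumed config key; set to the variant under test
  - dabble.fps                            # measures the running FPS of the pipeline
```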

Model Accuracy

The tables below show the performance of our instance segmentation models, evaluated with the detection evaluation metrics from COCO. A full description of these metrics can be found on the COCO detection evaluation page.
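
For convenience, a brief summary of the standard COCO definitions: the headline AP averages the per-threshold AP over ten IoU thresholds,

$$\mathrm{AP} = \frac{1}{10}\sum_{t\,\in\,\{0.50,\,0.55,\,\ldots,\,0.95\}} \mathrm{AP}_{\mathrm{IoU}=t}$$

while AP small/medium/large bucket objects by pixel area (roughly below 32², between 32² and 96², and above 96²), and AR max=k is the average recall given at most k detections per image.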

Evaluation on masks

| Model      | Type        | Size     | AP   | AP IoU=.50 | AP IoU=.75 | AP small | AP medium | AP large | AR max=1 | AR max=10 | AR max=100 | AR small | AR medium | AR large |
|------------|-------------|----------|------|------------|------------|----------|-----------|----------|----------|-----------|------------|----------|-----------|----------|
| Mask R-CNN | r50-fpn     | 800-1333 | 34.5 | 56.0       | 36.7       | 17.8     | 37.9      | 47.1     | 29.7     | 45.6      | 47.6       | 27.4     | 51.4      | 63.8     |
| Mask R-CNN | r101-fpn    | 800-1333 | 37.1 | 59.0       | 39.6       | 20.4     | 41.1      | 49.8     | 31.4     | 49.1      | 51.4       | 31.9     | 55.6      | 67.3     |
| YolactEdge | r50-fpn     | 550      | 27.8 | 45.6       | 28.9       | 10.4     | 30.0      | 43.9     | 26.3     | 37.5      | 38.2       | 16.3     | 41.9      | 57.2     |
| YolactEdge | r101-fpn    | 550      | 29.6 | 47.8       | 31.1       | 11.3     | 32.3      | 46.3     | 27.4     | 38.9      | 39.7       | 17.4     | 43.6      | 59.6     |
| YolactEdge | mobilenetv2 | 550      | 21.9 | 37.2       | 22.6       | 7.0      | 22.9      | 34.7     | 22.5     | 31.7      | 32.3       | 12.0     | 34.8      | 48.3     |

Evaluation on bounding boxes

| Model      | Type        | Size     | AP   | AP IoU=.50 | AP IoU=.75 | AP small | AP medium | AP large | AR max=1 | AR max=10 | AR max=100 | AR small | AR medium | AR large |
|------------|-------------|----------|------|------------|------------|----------|-----------|----------|----------|-----------|------------|----------|-----------|----------|
| Mask R-CNN | r50-fpn     | 800-1333 | 37.8 | 59.2       | 41.1       | 21.6     | 41.2      | 49.3     | 31.4     | 49.5      | 51.9       | 32.6     | 55.7      | 66.6     |
| Mask R-CNN | r101-fpn    | 800-1333 | 41.8 | 62.2       | 45.4       | 24.9     | 45.8      | 54.3     | 34.4     | 54.6      | 57.3       | 38.2     | 61.4      | 72.4     |
| YolactEdge | r50-fpn     | 550      | 30.3 | 49.8       | 32.2       | 14.4     | 32.1      | 44.6     | 27.4     | 40.1      | 41.2       | 21.6     | 43.7      | 57.5     |
| YolactEdge | r101-fpn    | 550      | 32.6 | 52.5       | 34.9       | 15.2     | 35.0      | 47.6     | 28.6     | 41.8      | 42.9       | 22.6     | 45.9      | 59.9     |
| YolactEdge | mobilenetv2 | 550      | 23.2 | 40.8       | 23.8       | 9.3      | 23.4      | 35.1     | 22.9     | 33.5      | 34.5       | 15.8     | 35.2      | 49.1     |

Dataset

The MS COCO (val 2017) dataset is used. We integrated the COCO API into the PeekingDuck pipeline for loading the annotations and evaluating the outputs from the models. All values are reported in percentages.

All images from the 80 object categories in the MS COCO (val 2017) dataset were processed.

Test Conditions

The following test conditions were followed:
- The tests were performed with pycocotools on the MS COCO dataset
- The resulting evaluation metrics were compared against those reported in the original repositories of the respective instance segmentation models for consistency

Instance Segmentation IDs

General Instance Segmentation

The tables below provide the associated indices (IDs) for each class. Note that the two models use different ID schemes: Mask R-CNN's IDs mirror the original COCO category IDs (shifted down by one) and therefore contain gaps, while YolactEdge's IDs run contiguously from 0 to 79. To detect all classes, specify detect: ["*"] under the instance segmentation node configuration in pipeline_config.yml; an example follows.
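
For example, the following sketch (with a placeholder video source) restricts YolactEdge to persons and cars using the IDs from the tables below; replacing the list with ["*"] detects all 80 classes.

```yaml
nodes:
  - input.visual:
      source: /path/to/video.mp4   # placeholder source
  - model.yolact_edge:
      detect: [0, 2]               # person (0) and car (2) in the YolactEdge ID scheme
  - draw.instance_mask             # overlays the predicted instance masks
  - output.screen                  # displays the annotated frames
```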

| Class name    | ID (Mask R-CNN) | ID (YolactEdge) | Class name     | ID (Mask R-CNN) | ID (YolactEdge) |
|---------------|-----------------|-----------------|----------------|-----------------|-----------------|
| person        | 0               | 0               | elephant       | 21              | 20              |
| bicycle       | 1               | 1               | bear           | 22              | 21              |
| car           | 2               | 2               | zebra          | 23              | 22              |
| motorcycle    | 3               | 3               | giraffe        | 24              | 23              |
| aeroplane     | 4               | 4               | backpack       | 26              | 24              |
| bus           | 5               | 5               | umbrella       | 27              | 25              |
| train         | 6               | 6               | handbag        | 30              | 26              |
| truck         | 7               | 7               | tie            | 31              | 27              |
| boat          | 8               | 8               | suitcase       | 32              | 28              |
| traffic light | 9               | 9               | frisbee        | 33              | 29              |
| fire hydrant  | 10              | 10              | skis           | 34              | 30              |
| stop sign     | 12              | 11              | snowboard      | 35              | 31              |
| parking meter | 13              | 12              | sports ball    | 36              | 32              |
| bench         | 14              | 13              | kite           | 37              | 33              |
| bird          | 15              | 14              | baseball bat   | 38              | 34              |
| cat           | 16              | 15              | baseball glove | 39              | 35              |
| dog           | 17              | 16              | skateboard     | 40              | 36              |
| horse         | 18              | 17              | surfboard      | 41              | 37              |
| sheep         | 19              | 18              | tennis racket  | 42              | 38              |
| cow           | 20              | 19              | bottle         | 43              | 39              |

| Class name   | ID (Mask R-CNN) | ID (YolactEdge) | Class name   | ID (Mask R-CNN) | ID (YolactEdge) |
|--------------|-----------------|-----------------|--------------|-----------------|-----------------|
| wine glass   | 45              | 40              | dining table | 66              | 60              |
| cup          | 46              | 41              | toilet       | 69              | 61              |
| fork         | 47              | 42              | tv           | 71              | 62              |
| knife        | 48              | 43              | laptop       | 72              | 63              |
| spoon        | 49              | 44              | mouse        | 73              | 64              |
| bowl         | 50              | 45              | remote       | 74              | 65              |
| banana       | 51              | 46              | keyboard     | 75              | 66              |
| apple        | 52              | 47              | cell phone   | 76              | 67              |
| sandwich     | 53              | 48              | microwave    | 77              | 68              |
| orange       | 54              | 49              | oven         | 78              | 69              |
| broccoli     | 55              | 50              | toaster      | 79              | 70              |
| carrot       | 56              | 51              | sink         | 80              | 71              |
| hot dog      | 57              | 52              | refrigerator | 81              | 72              |
| pizza        | 58              | 53              | book         | 83              | 73              |
| donut        | 59              | 54              | clock        | 84              | 74              |
| cake         | 60              | 55              | vase         | 85              | 75              |
| chair        | 61              | 56              | scissors     | 86              | 76              |
| couch        | 62              | 57              | teddy bear   | 87              | 77              |
| potted plant | 63              | 58              | hair drier   | 88              | 78              |
| bed          | 64              | 59              | toothbrush   | 89              | 79              |