Instance Segmentation Models

List of Instance Segmentation Models

The table below shows the instance segmentation models available.

| Model      | Documentation     |
|------------|-------------------|
| Mask R-CNN | model.mask_rcnn   |
| YolactEdge | model.yolact_edge |
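
For orientation, below is a minimal sketch of a pipeline_config.yml that runs one of these models on a video. The source path is a placeholder, and the draw.instance_mask and output.screen nodes are assumed to be available in your PeekingDuck installation.

```yaml
nodes:
  - input.visual:
      source: /path/to/video.mp4   # placeholder; replace with your own video or webcam index
  - model.mask_rcnn                # or model.yolact_edge
  - draw.instance_mask             # overlays the predicted instance masks on each frame
  - output.screen                  # displays the annotated frames in a window
```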

Benchmarks

Inference Speed

The table below shows the frames per second (FPS) achieved by each model variant, measured separately on the single-human and multiple-human test videos (see Test Conditions below).

| Model      | Type        | Size     | CPU (single) | CPU (multiple) | GPU (single) | GPU (multiple) |
|------------|-------------|----------|--------------|----------------|--------------|----------------|
| Mask R-CNN | r50-fpn     | 800-1333 | 0.76         | 0.72           | 22.30        | 18.58          |
| Mask R-CNN | r101-fpn    | 800-1333 | 0.61         | 0.57           | 17.14        | 14.83          |
| YolactEdge | r50-fpn     | 550      | 2.99         | 2.93           | 40.84        | 33.94          |
| YolactEdge | r101-fpn    | 550      | 2.32         | 2.27           | 29.55        | 25.89          |
| YolactEdge | mobilenetv2 | 550      | 4.93         | 4.64           | 48.59        | 36.66          |

Hardware

The following hardware was used to conduct the FPS benchmarks:
- CPU: 2.8 GHz 4-core Intel Xeon (2020, Cascade Lake) with 16 GB of RAM
- GPU: NVIDIA A100, paired with a 2.2 GHz 6-core Intel Xeon CPU and 85 GB of RAM

Test Conditions

The following test conditions were followed:
- The input.visual node, the model of interest, and the dabble.fps node were used to perform inference on videos (see the pipeline sketch after this list)
- Two videos were used to benchmark each model: one with only 1 human (single), and the other with multiple humans (multiple)
- Each video is about 1 minute long, recorded at ~30 FPS, which translates to about 1,800 frames to process per video
- 1280×720 (HD Ready) resolution was used, as a midpoint between the 640×480 (VGA) of poorer-quality webcams and the 1920×1080 (Full HD) of CCTVs
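
A minimal sketch of the benchmark pipeline's pipeline_config.yml is shown below. The source path is a placeholder, and model_type is an assumed configuration key for selecting the model variant.

```yaml
nodes:
  - input.visual:
      source: /path/to/single_human.mp4   # placeholder; rerun with the multiple-human video
  - model.mask_rcnn:                      # swap in model.yolact_edge for the YolactEdge runs
      model_type: r50-fpn                 # assumed config key; set to the variant under test
  - dabble.fps                            # measures the running FPS of the pipeline
```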

Model Accuracy

The tables below show the performance of our instance segmentation models, evaluated with the detection evaluation metrics from COCO. A full description of these metrics can be found on the COCO detection evaluation page.
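
For convenience, a brief summary of the standard COCO definitions: the headline AP averages the per-threshold AP over ten IoU thresholds,

$$\mathrm{AP} = \frac{1}{10}\sum_{t\,\in\,\{0.50,\,0.55,\,\ldots,\,0.95\}} \mathrm{AP}_{\mathrm{IoU}=t}$$

while AP small/medium/large bucket objects by pixel area (roughly below 32², between 32² and 96², and above 96²), and AR max=k is the average recall given at most k detections per image.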

Evaluation on masks

| Model      | Type        | Size     | AP   | AP IoU=.50 | AP IoU=.75 | AP small | AP medium | AP large | AR max=1 | AR max=10 | AR max=100 | AR small | AR medium | AR large |
|------------|-------------|----------|------|------------|------------|----------|-----------|----------|----------|-----------|------------|----------|-----------|----------|
| Mask R-CNN | r50-fpn     | 800-1333 | 34.5 | 56.0       | 36.7       | 17.8     | 37.9      | 47.1     | 29.7     | 45.6      | 47.6       | 27.4     | 51.4      | 63.8     |
| Mask R-CNN | r101-fpn    | 800-1333 | 37.1 | 59.0       | 39.6       | 20.4     | 41.1      | 49.8     | 31.4     | 49.1      | 51.4       | 31.9     | 55.6      | 67.3     |
| YolactEdge | r50-fpn     | 550      | 27.8 | 45.6       | 28.9       | 10.4     | 30.0      | 43.9     | 26.3     | 37.5      | 38.2       | 16.3     | 41.9      | 57.2     |
| YolactEdge | r101-fpn    | 550      | 29.6 | 47.8       | 31.1       | 11.3     | 32.3      | 46.3     | 27.4     | 38.9      | 39.7       | 17.4     | 43.6      | 59.6     |
| YolactEdge | mobilenetv2 | 550      | 21.9 | 37.2       | 22.6       | 7.0      | 22.9      | 34.7     | 22.5     | 31.7      | 32.3       | 12.0     | 34.8      | 48.3     |

Evaluation on bounding boxes

| Model      | Type        | Size     | AP   | AP IoU=.50 | AP IoU=.75 | AP small | AP medium | AP large | AR max=1 | AR max=10 | AR max=100 | AR small | AR medium | AR large |
|------------|-------------|----------|------|------------|------------|----------|-----------|----------|----------|-----------|------------|----------|-----------|----------|
| Mask R-CNN | r50-fpn     | 800-1333 | 37.8 | 59.2       | 41.1       | 21.6     | 41.2      | 49.3     | 31.4     | 49.5      | 51.9       | 32.6     | 55.7      | 66.6     |
| Mask R-CNN | r101-fpn    | 800-1333 | 41.8 | 62.2       | 45.4       | 24.9     | 45.8      | 54.3     | 34.4     | 54.6      | 57.3       | 38.2     | 61.4      | 72.4     |
| YolactEdge | r50-fpn     | 550      | 30.3 | 49.8       | 32.2       | 14.4     | 32.1      | 44.6     | 27.4     | 40.1      | 41.2       | 21.6     | 43.7      | 57.5     |
| YolactEdge | r101-fpn    | 550      | 32.6 | 52.5       | 34.9       | 15.2     | 35.0      | 47.6     | 28.6     | 41.8      | 42.9       | 22.6     | 45.9      | 59.9     |
| YolactEdge | mobilenetv2 | 550      | 23.2 | 40.8       | 23.8       | 9.3      | 23.4      | 35.1     | 22.9     | 33.5      | 34.5       | 15.8     | 35.2      | 49.1     |

Dataset

The MS COCO (val 2017) dataset is used. We integrated the COCO API into the PeekingDuck pipeline for loading the annotations and evaluating the outputs from the models. All values are reported in percentages.

All images from the 80 object categories in the MS COCO (val 2017) dataset were processed.

Test Conditions

The following test conditions were followed:
- The tests were performed with pycocotools on the MS COCO dataset
- The resulting evaluation metrics were compared against those reported in the original repositories of the respective instance segmentation models for consistency

Instance Segmentation IDs

General Instance Segmentation

The tables below provide the associated indices (IDs) for each class. Note that the two models use different ID schemes: Mask R-CNN's IDs mirror the original COCO category IDs (shifted down by one) and therefore contain gaps, while YolactEdge's IDs run contiguously from 0 to 79. To detect all classes, specify detect: ["*"] under the instance segmentation node configuration in pipeline_config.yml; an example follows.
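
For example, the following sketch (with a placeholder video source) restricts YolactEdge to persons and cars using the IDs from the tables below; replacing the list with ["*"] detects all 80 classes.

```yaml
nodes:
  - input.visual:
      source: /path/to/video.mp4   # placeholder source
  - model.yolact_edge:
      detect: [0, 2]               # person (0) and car (2) in the YolactEdge ID scheme
  - draw.instance_mask             # overlays the predicted instance masks
  - output.screen                  # displays the annotated frames
```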

| Class name    | ID (Mask R-CNN) | ID (YolactEdge) | Class name     | ID (Mask R-CNN) | ID (YolactEdge) |
|---------------|-----------------|-----------------|----------------|-----------------|-----------------|
| person        | 0               | 0               | elephant       | 21              | 20              |
| bicycle       | 1               | 1               | bear           | 22              | 21              |
| car           | 2               | 2               | zebra          | 23              | 22              |
| motorcycle    | 3               | 3               | giraffe        | 24              | 23              |
| aeroplane     | 4               | 4               | backpack       | 26              | 24              |
| bus           | 5               | 5               | umbrella       | 27              | 25              |
| train         | 6               | 6               | handbag        | 30              | 26              |
| truck         | 7               | 7               | tie            | 31              | 27              |
| boat          | 8               | 8               | suitcase       | 32              | 28              |
| traffic light | 9               | 9               | frisbee        | 33              | 29              |
| fire hydrant  | 10              | 10              | skis           | 34              | 30              |
| stop sign     | 12              | 11              | snowboard      | 35              | 31              |
| parking meter | 13              | 12              | sports ball    | 36              | 32              |
| bench         | 14              | 13              | kite           | 37              | 33              |
| bird          | 15              | 14              | baseball bat   | 38              | 34              |
| cat           | 16              | 15              | baseball glove | 39              | 35              |
| dog           | 17              | 16              | skateboard     | 40              | 36              |
| horse         | 18              | 17              | surfboard      | 41              | 37              |
| sheep         | 19              | 18              | tennis racket  | 42              | 38              |
| cow           | 20              | 19              | bottle         | 43              | 39              |

| Class name   | ID (Mask R-CNN) | ID (YolactEdge) | Class name   | ID (Mask R-CNN) | ID (YolactEdge) |
|--------------|-----------------|-----------------|--------------|-----------------|-----------------|
| wine glass   | 45              | 40              | dining table | 66              | 60              |
| cup          | 46              | 41              | toilet       | 69              | 61              |
| fork         | 47              | 42              | tv           | 71              | 62              |
| knife        | 48              | 43              | laptop       | 72              | 63              |
| spoon        | 49              | 44              | mouse        | 73              | 64              |
| bowl         | 50              | 45              | remote       | 74              | 65              |
| banana       | 51              | 46              | keyboard     | 75              | 66              |
| apple        | 52              | 47              | cell phone   | 76              | 67              |
| sandwich     | 53              | 48              | microwave    | 77              | 68              |
| orange       | 54              | 49              | oven         | 78              | 69              |
| broccoli     | 55              | 50              | toaster      | 79              | 70              |
| carrot       | 56              | 51              | sink         | 80              | 71              |
| hot dog      | 57              | 52              | refrigerator | 81              | 72              |
| pizza        | 58              | 53              | book         | 83              | 73              |
| donut        | 59              | 54              | clock        | 84              | 74              |
| cake         | 60              | 55              | vase         | 85              | 75              |
| chair        | 61              | 56              | scissors     | 86              | 76              |
| couch        | 62              | 57              | teddy bear   | 87              | 77              |
| potted plant | 63              | 58              | hair drier   | 88              | 78              |
| bed          | 64              | 59              | toothbrush   | 89              | 79              |