Object Detection Models

List of Object Detection Models

The table below shows the object detection models available for each task category.

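| Category      | Model                  | Documentation            |
|---------------|------------------------|--------------------------|
| General       | EfficientDet           | model.efficientdet       |
| General       | YOLOv4                 | model.yolo               |
| General       | YOLOX                  | model.yolox              |
| Face          | MTCNN                  | model.mtcnn              |
| Face          | YOLOv4 (Face)          | model.yolo_face          |
| License plate | YOLOv4 (License Plate) | model.yolo_license_plate |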

Benchmarks

Inference Speed

The table below shows the frames per second (FPS) of each model type.

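| Model        | Type       | Size | CPU FPS (single) | CPU FPS (multiple) | GPU FPS (single) | GPU FPS (multiple) |
|--------------|------------|------|------------------|--------------------|------------------|--------------------|
| YOLO         | v4tiny     | 416  | 22.42            | 21.71              | 65.24            | 57.50              |
| YOLO         | v4         | 416  | 2.62             | 2.59               | 30.40            | 28.71              |
| EfficientDet | 0          | 512  | 5.24             | 5.25               | 29.51            | 29.39              |
| EfficientDet | 1          | 640  | 2.53             | 2.49               | 23.79            | 24.44              |
| EfficientDet | 2          | 768  | 1.54             | 1.50               | 19.86            | 20.51              |
| EfficientDet | 3          | 896  | 0.78             | 0.75               | 14.69            | 14.84              |
| EfficientDet | 4          | 1024 | 0.43             | 0.42               | 11.74            | 11.88              |
| MTCNN        | -          | -    | 32.42            | 18.53              | 56.35            | 51.45              |
| YOLOX        | yolox-tiny | 416  | 19.43            | 19.29              | 55.36            | 55.38              |
| YOLOX        | yolox-s    | 640  | 15.10            | 15.44              | 53.81            | 53.74              |
| YOLOX        | yolox-m    | 640  | 8.29             | 8.04               | 42.79            | 43.83              |
| YOLOX        | yolox-l    | 640  | 4.59             | 4.75               | 35.30            | 36.08              |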

Hardware

The following hardware was used to conduct the FPS benchmarks:
- CPU: 2.8 GHz 4-Core Intel Xeon (2020, Cascade Lake) CPU and 16GB RAM
- GPU: NVIDIA A100, paired with 2.2 GHz 6-Core Intel Xeon CPU and 85GB RAM

Test Conditions

The following test conditions were followed:
- The input.visual node, the model node of interest, and the dabble.fps node were used to perform inference on videos
- Two videos were used to benchmark each model: one with only 1 human (single), and the other with multiple humans (multiple)
- Each video is about 1 minute long and recorded at ~30 FPS, which translates to about 1,800 frames to process per video
- 1280×720 (HD ready) resolution was used, as a middle ground between the 640×480 (VGA) of lower-quality webcams and the 1920×1080 (Full HD) of CCTV cameras
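As a rough illustration of what the FPS figures mean for the ~1,800-frame test videos, the sketch below converts a measured FPS into an approximate processing time. The function name `processing_time_seconds` and the constant `FRAMES_PER_VIDEO` are assumptions made for this example. It is plain arithmetic and does not reproduce how the dabble.fps node computes its value.

```python
def processing_time_seconds(num_frames: int, fps: float) -> float:
    """Approximate wall-clock time to process num_frames at a steady fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# About 1 minute of video at ~30 FPS, as stated in the test conditions.
FRAMES_PER_VIDEO = 1800

# FPS values copied from the inference speed table (single-person video).
measurements = [
    ("YOLO v4tiny, CPU", 22.42),
    ("YOLO v4, CPU", 2.62),
    ("YOLO v4, GPU", 30.40),
]

for label, fps in measurements:
    seconds = processing_time_seconds(FRAMES_PER_VIDEO, fps)
    print(f"{label}: about {seconds:.0f} s for {FRAMES_PER_VIDEO} frames")
```

Under these assumptions the three configurations take roughly 80 s, 687 s, and 59 s respectively, so only a configuration running at about 30 FPS or more keeps pace with the 30 FPS source video.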

Model Accuracy

The table below shows the performance of our object detection models using the detection evaluation metrics from COCO. Descriptions of these metrics can be found in the COCO detection evaluation documentation.

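| Model        | Type       | Size | AP   | AP IoU=.50 | AP IoU=.75 | AP small | AP medium | AP large | AR max=1 | AR max=10 | AR max=100 | AR small | AR medium | AR large |
|--------------|------------|------|------|------------|------------|----------|-----------|----------|----------|-----------|------------|----------|-----------|----------|
| YOLO         | v4tiny     | 416  | 17.4 | 32.7       | 16.6       | 6.4      | 20.1      | 25.6     | 16.7     | 22.8      | 21.1       | 6.1      | 23.7      | 32.1     |
| YOLO         | v4         | 416  | 43.7 | 64.0       | 48.1       | 23.1     | 49.6      | 60.9     | 33.3     | 49.1      | 50.0       | 26.2     | 56.1      | 70.3     |
| EfficientDet | 0          | 512  | 29.7 | 44.3       | 32.4       | 7.4      | 34.4      | 49.2     | 25.3     | 34.5      | 34.8       | 7.8      | 39.7      | 58.4     |
| EfficientDet | 1          | 640  | 35.2 | 50.8       | 38.8       | 14.3     | 40.1      | 53.9     | 28.8     | 40.5      | 40.9       | 15.6     | 46.3      | 62.8     |
| EfficientDet | 2          | 768  | 38.5 | 54.4       | 42.1       | 18.9     | 42.7      | 57.1     | 30.9     | 43.9      | 44.4       | 20.8     | 48.9      | 65.5     |
| EfficientDet | 3          | 896  | 41.1 | 57.0       | 45.2       | 22.2     | 45.1      | 58.7     | 32.6     | 46.7      | 47.3       | 24.8     | 51.5      | 66.9     |
| EfficientDet | 4          | 1024 | 43.4 | 59.2       | 47.8       | 24.2     | 47.6      | 60.4     | 33.8     | 49.1      | 49.7       | 27.3     | 53.9      | 68.7     |
| YOLOX        | yolox-tiny | 416  | 32.4 | 50.5       | 33.9       | 13.4     | 35.4      | 49.5     | 28.2     | 43.5      | 45.7       | 20.7     | 51.7      | 65.9     |
| YOLOX        | yolox-s    | 416  | 35.6 | 53.4       | 37.8       | 14.0     | 39.3      | 55.7     | 30.3     | 46.0      | 48.1       | 20.9     | 54.7      | 70.8     |
| YOLOX        | yolox-m    | 416  | 41.6 | 59.7       | 44.4       | 18.8     | 46.9      | 62.8     | 33.9     | 51.6      | 53.7       | 26.9     | 60.9      | 76.8     |
| YOLOX        | yolox-l    | 416  | 44.5 | 62.5       | 47.6       | 21.9     | 50.6      | 65.5     | 35.5     | 54.2      | 56.3       | 31.0     | 64.0      | 78.1     |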
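The "IoU=.50" and "IoU=.75" column headings refer to the Intersection over Union threshold a predicted box must reach against a ground-truth box to count as a match. The sketch below shows that calculation for two axis-aligned boxes. The `iou` helper and the corner-style `(x1, y1, x2, y2)` box format are assumptions chosen for this illustration; COCO annotations themselves store boxes as `(x, y, width, height)`, and this is not the code used to produce the table.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


ground_truth = (0, 0, 100, 100)
prediction = (20, 0, 120, 100)  # same size, shifted 20 pixels to the right

overlap = iou(ground_truth, prediction)
print(f"IoU = {overlap:.3f}")
print("match at IoU=.50:", overlap >= 0.50)
print("match at IoU=.75:", overlap >= 0.75)
```

Here the IoU is 8000 / 12000, about 0.667, so the prediction counts as a match at the 0.50 threshold but not at 0.75. The plain "AP" column averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05, which is why it is lower than "AP IoU=.50" for every model.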

Dataset

The MS COCO (val 2017) dataset is used. We integrated the COCO API into the PeekingDuck pipeline for loading the annotations and evaluating the outputs from the models. All values are reported in percentages.

All images from the 80 object categories in the MS COCO (val 2017) dataset were processed.

Test Conditions

The following test conditions were followed:
- The tests were performed using pycocotools on the MS COCO dataset
- The evaluation metrics were compared against those reported in the original repository of each object detection model for consistency
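To relate pycocotools output to the columns of the accuracy table, the sketch below labels a 12-value summary vector and converts it to percentages. It assumes the values arrive in the order that `COCOeval.summarize()` stores in its `stats` attribute for bounding-box evaluation, as fractions between 0 and 1. The helper name `label_coco_stats` is hypothetical, and the input values are simply the YOLO v4tiny row of the table divided by 100, used as stand-in data rather than a fresh evaluation.

```python
# Column names in the order pycocotools reports them for bbox evaluation.
COCO_METRIC_NAMES = [
    "AP", "AP IoU=.50", "AP IoU=.75",
    "AP small", "AP medium", "AP large",
    "AR max=1", "AR max=10", "AR max=100",
    "AR small", "AR medium", "AR large",
]


def label_coco_stats(stats):
    """Map 12 COCO summary values (fractions) to named percentages."""
    if len(stats) != len(COCO_METRIC_NAMES):
        raise ValueError(f"expected {len(COCO_METRIC_NAMES)} values, got {len(stats)}")
    return {
        name: round(100.0 * float(value), 1)
        for name, value in zip(COCO_METRIC_NAMES, stats)
    }


# Stand-in for COCOeval(...).stats after evaluate(), accumulate(), summarize().
example_stats = [0.174, 0.327, 0.166, 0.064, 0.201, 0.256,
                 0.167, 0.228, 0.211, 0.061, 0.237, 0.321]

for name, value in label_coco_stats(example_stats).items():
    print(f"{name:>12}: {value}")
```

Keeping the names in a single list makes the column order explicit, so a mismatch in length is caught early instead of silently shifting values into the wrong column.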

Object Detection IDs

General Object Detection

The tables below list the class ID (index) that each object detector assigns to each class. Note that YOLO / YOLOX and EfficientDet use different IDs for many classes.
To detect all classes, specify detect: ["*"] under the object detection node configuration in pipeline_config.yml.

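| Class name    | ID (YOLO / YOLOX) | ID (EfficientDet) | Class name     | ID (YOLO / YOLOX) | ID (EfficientDet) |
|---------------|-------------------|-------------------|----------------|-------------------|-------------------|
| person        | 0                 | 0                 | elephant       | 20                | 21                |
| bicycle       | 1                 | 1                 | bear           | 21                | 22                |
| car           | 2                 | 2                 | zebra          | 22                | 23                |
| motorcycle    | 3                 | 3                 | giraffe        | 23                | 24                |
| aeroplane     | 4                 | 4                 | backpack       | 24                | 26                |
| bus           | 5                 | 5                 | umbrella       | 25                | 27                |
| train         | 6                 | 6                 | handbag        | 26                | 30                |
| truck         | 7                 | 7                 | tie            | 27                | 31                |
| boat          | 8                 | 8                 | suitcase       | 28                | 32                |
| traffic light | 9                 | 9                 | frisbee        | 29                | 33                |
| fire hydrant  | 10                | 10                | skis           | 30                | 34                |
| stop sign     | 11                | 12                | snowboard      | 31                | 35                |
| parking meter | 12                | 13                | sports ball    | 32                | 36                |
| bench         | 13                | 14                | kite           | 33                | 37                |
| bird          | 14                | 15                | baseball bat   | 34                | 38                |
| cat           | 15                | 16                | baseball glove | 35                | 39                |
| dog           | 16                | 17                | skateboard     | 36                | 40                |
| horse         | 17                | 18                | surfboard      | 37                | 41                |
| sheep         | 18                | 19                | tennis racket  | 38                | 42                |
| cow           | 19                | 20                | bottle         | 39                | 43                |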

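| Class name   | ID (YOLO / YOLOX) | ID (EfficientDet) | Class name   | ID (YOLO / YOLOX) | ID (EfficientDet) |
|--------------|-------------------|-------------------|--------------|-------------------|-------------------|
| wine glass   | 40                | 45                | dining table | 60                | 66                |
| cup          | 41                | 46                | toilet       | 61                | 69                |
| fork         | 42                | 47                | tv           | 62                | 71                |
| knife        | 43                | 48                | laptop       | 63                | 72                |
| spoon        | 44                | 49                | mouse        | 64                | 73                |
| bowl         | 45                | 50                | remote       | 65                | 74                |
| banana       | 46                | 51                | keyboard     | 66                | 75                |
| apple        | 47                | 52                | cell phone   | 67                | 76                |
| sandwich     | 48                | 53                | microwave    | 68                | 77                |
| orange       | 49                | 54                | oven         | 69                | 78                |
| broccoli     | 50                | 55                | toaster      | 70                | 79                |
| carrot       | 51                | 56                | sink         | 71                | 80                |
| hot dog      | 52                | 57                | refrigerator | 72                | 81                |
| pizza        | 53                | 58                | book         | 73                | 83                |
| donut        | 54                | 59                | clock        | 74                | 84                |
| cake         | 55                | 60                | vase         | 75                | 85                |
| chair        | 56                | 61                | scissors     | 76                | 86                |
| couch        | 57                | 62                | teddy bear   | 77                | 87                |
| potted plant | 58                | 63                | hair drier   | 78                | 88                |
| bed          | 59                | 64                | toothbrush   | 79                | 89                |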
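Because the two ID columns diverge from "stop sign" onwards, an ID that is valid for one detector can point at a different class in the other. The sketch below shows one way to look up the right ID by class name. The `CLASS_IDS` dictionary holds only a handful of rows copied from the tables above, and the `class_id` helper and the detector labels `"yolo"`, `"yolox"`, and `"efficientdet"` are names invented for this example, not part of PeekingDuck.

```python
# (YOLO / YOLOX ID, EfficientDet ID) pairs copied from the class ID tables.
CLASS_IDS = {
    "person": (0, 0),
    "stop sign": (11, 12),
    "backpack": (24, 26),
    "bottle": (39, 43),
    "toothbrush": (79, 89),
}


def class_id(name: str, detector: str) -> int:
    """Return the class ID that the given detector family uses for name."""
    if name not in CLASS_IDS:
        raise KeyError(f"class not in this example subset: {name!r}")
    yolo_id, efficientdet_id = CLASS_IDS[name]
    if detector in ("yolo", "yolox"):
        return yolo_id
    if detector == "efficientdet":
        return efficientdet_id
    raise ValueError(f"unknown detector: {detector!r}")


for detector in ("yolo", "efficientdet"):
    ids = [class_id(name, detector) for name in ("person", "stop sign", "bottle")]
    print(detector, ids)
```

For the three classes queried, this prints `[0, 11, 39]` for YOLO and `[0, 12, 43]` for EfficientDet, which shows why a list of IDs written for one detector should not be reused unchanged with the other.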

Face Detection

This table provides the associated indices for the model.yolo_face node.

| Class name | ID |
|------------|----|
| no mask    | 0  |
| mask       | 1  |