Object Detection Models
List of Object Detection Models
The table below shows the object detection models available for each task category.
| Category | Model | Documentation |
|---|---|---|
| General | EfficientDet | |
| General | YOLOv4 | |
| General | YOLOX | |
| Face | MTCNN | |
| Face | YOLOv4 (Face) | |
| License plate | YOLOv4 (License Plate) | |
Benchmarks
Inference Speed
The table below shows the frames per second (FPS) achieved by each model variant; *single* and *multiple* refer to the benchmark videos containing one and multiple humans respectively (see Test Conditions below).
| Model | Type | Size | CPU FPS (single) | CPU FPS (multiple) | GPU FPS (single) | GPU FPS (multiple) |
|---|---|---|---|---|---|---|
| YOLO | v4tiny | 416 | 22.42 | 21.71 | 65.24 | 57.50 |
| YOLO | v4 | 416 | 2.62 | 2.59 | 30.40 | 28.71 |
| EfficientDet | 0 | 512 | 5.24 | 5.25 | 29.51 | 29.39 |
| EfficientDet | 1 | 640 | 2.53 | 2.49 | 23.79 | 24.44 |
| EfficientDet | 2 | 768 | 1.54 | 1.50 | 19.86 | 20.51 |
| EfficientDet | 3 | 896 | 0.78 | 0.75 | 14.69 | 14.84 |
| EfficientDet | 4 | 1024 | 0.43 | 0.42 | 11.74 | 11.88 |
| MTCNN | – | – | 32.42 | 18.53 | 56.35 | 51.45 |
| YOLOX | yolox-tiny | 416 | 19.43 | 19.29 | 55.36 | 55.38 |
| YOLOX | yolox-s | 640 | 15.10 | 15.44 | 53.81 | 53.74 |
| YOLOX | yolox-m | 640 | 8.29 | 8.04 | 42.79 | 43.83 |
| YOLOX | yolox-l | 640 | 4.59 | 4.75 | 35.30 | 36.08 |
Hardware
The following hardware was used to conduct the FPS benchmarks:

- CPU: 2.8 GHz 4-Core Intel Xeon (2020, Cascade Lake) CPU with 16 GB RAM
- GPU: NVIDIA A100, paired with a 2.2 GHz 6-Core Intel Xeon CPU and 85 GB RAM
Test Conditions
The following test conditions were followed:

- The `input.visual`, the model of interest, and `dabble.fps` nodes were used to perform inference on videos.
- 2 videos were used to benchmark each model: one with only 1 human (*single*), and the other with multiple humans (*multiple*).
- Both videos are about 1 minute each, recorded at ~30 FPS, which translates to about 1,800 frames to process per video.
- 1280×720 (HD Ready) resolution was used, as a bridge between the 640×480 (VGA) of poorer-quality webcams and the 1920×1080 (Full HD) of CCTVs.
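The benchmark setup above can be sketched as a `pipeline_config.yml`. This is an illustrative fragment, not the exact benchmark configuration: the video file name is a placeholder, `model.yolo` stands in for whichever model is being benchmarked, and the `draw.legend` node is assumed here as a way to display the FPS reading on screen:

```yaml
nodes:
  - input.visual:
      source: benchmark_video.mp4   # placeholder path to the test video
  - model.yolo                      # swap in the model of interest
  - dabble.fps                      # computes frames per second
  - draw.legend:
      show: ["fps"]                 # overlay the FPS reading
  - output.screen
```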
Model Accuracy
The table below shows the performance of our object detection models, measured with the detection evaluation metrics from COCO. A description of these metrics can be found in the official COCO detection evaluation documentation.
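The AP columns below are reported at different Intersection-over-Union (IoU) thresholds. As a minimal illustration (not part of PeekingDuck), IoU between two axis-aligned boxes in `(x1, y1, x2, y2)` format can be computed as:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    # Coordinates of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# This detection overlaps the ground truth enough to count as a true
# positive at IoU=.50, but not at the stricter IoU=.75 threshold:
print(iou((0, 0, 100, 100), (30, 0, 130, 100)))  # ≈ 0.538
```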
| Model | Type | Size | AP | AP IoU=.50 | AP IoU=.75 | AP small | AP medium | AP large | AR max=1 | AR max=10 | AR max=100 | AR small | AR medium | AR large |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLO | v4tiny | 416 | 17.4 | 32.7 | 16.6 | 6.4 | 20.1 | 25.6 | 16.7 | 22.8 | 21.1 | 6.1 | 23.7 | 32.1 |
| YOLO | v4 | 416 | 43.7 | 64.0 | 48.1 | 23.1 | 49.6 | 60.9 | 33.3 | 49.1 | 50.0 | 26.2 | 56.1 | 70.3 |
| EfficientDet | 0 | 512 | 29.7 | 44.3 | 32.4 | 7.4 | 34.4 | 49.2 | 25.3 | 34.5 | 34.8 | 7.8 | 39.7 | 58.4 |
| EfficientDet | 1 | 640 | 35.2 | 50.8 | 38.8 | 14.3 | 40.1 | 53.9 | 28.8 | 40.5 | 40.9 | 15.6 | 46.3 | 62.8 |
| EfficientDet | 2 | 768 | 38.5 | 54.4 | 42.1 | 18.9 | 42.7 | 57.1 | 30.9 | 43.9 | 44.4 | 20.8 | 48.9 | 65.5 |
| EfficientDet | 3 | 896 | 41.1 | 57.0 | 45.2 | 22.2 | 45.1 | 58.7 | 32.6 | 46.7 | 47.3 | 24.8 | 51.5 | 66.9 |
| EfficientDet | 4 | 1024 | 43.4 | 59.2 | 47.8 | 24.2 | 47.6 | 60.4 | 33.8 | 49.1 | 49.7 | 27.3 | 53.9 | 68.7 |
| YOLOX | yolox-tiny | 416 | 32.4 | 50.5 | 33.9 | 13.4 | 35.4 | 49.5 | 28.2 | 43.5 | 45.7 | 20.7 | 51.7 | 65.9 |
| YOLOX | yolox-s | 416 | 35.6 | 53.4 | 37.8 | 14.0 | 39.3 | 55.7 | 30.3 | 46.0 | 48.1 | 20.9 | 54.7 | 70.8 |
| YOLOX | yolox-m | 416 | 41.6 | 59.7 | 44.4 | 18.8 | 46.9 | 62.8 | 33.9 | 51.6 | 53.7 | 26.9 | 60.9 | 76.8 |
| YOLOX | yolox-l | 416 | 44.5 | 62.5 | 47.6 | 21.9 | 50.6 | 65.5 | 35.5 | 54.2 | 56.3 | 31.0 | 64.0 | 78.1 |
Dataset
The MS COCO (val 2017) dataset is used. We integrated the COCO API into the PeekingDuck pipeline for loading the annotations and evaluating the outputs from the models. All values are reported in percentages.
All images from the 80 object categories in the MS COCO (val 2017) dataset were processed.
Test Conditions
The following test conditions were followed:

- The tests were performed using pycocotools on the MS COCO dataset.
- The evaluation metrics have been compared with those from the original repositories of the respective object detection models for consistency.
Object Detection IDs
General Object Detection
Use `detect: ["*"]` under the object detection node configuration in `pipeline_config.yml` to detect all of the object categories listed below.
| Class name | YOLO / YOLOX ID | EfficientDet ID | Class name | YOLO / YOLOX ID | EfficientDet ID |
|---|---|---|---|---|---|
| person | 0 | 0 | elephant | 20 | 21 |
| bicycle | 1 | 1 | bear | 21 | 22 |
| car | 2 | 2 | zebra | 22 | 23 |
| motorcycle | 3 | 3 | giraffe | 23 | 24 |
| aeroplane | 4 | 4 | backpack | 24 | 26 |
| bus | 5 | 5 | umbrella | 25 | 27 |
| train | 6 | 6 | handbag | 26 | 30 |
| truck | 7 | 7 | tie | 27 | 31 |
| boat | 8 | 8 | suitcase | 28 | 32 |
| traffic light | 9 | 9 | frisbee | 29 | 33 |
| fire hydrant | 10 | 10 | skis | 30 | 34 |
| stop sign | 11 | 12 | snowboard | 31 | 35 |
| parking meter | 12 | 13 | sports ball | 32 | 36 |
| bench | 13 | 14 | kite | 33 | 37 |
| bird | 14 | 15 | baseball bat | 34 | 38 |
| cat | 15 | 16 | baseball glove | 35 | 39 |
| dog | 16 | 17 | skateboard | 36 | 40 |
| horse | 17 | 18 | surfboard | 37 | 41 |
| sheep | 18 | 19 | tennis racket | 38 | 42 |
| cow | 19 | 20 | bottle | 39 | 43 |
| Class name | YOLO / YOLOX ID | EfficientDet ID | Class name | YOLO / YOLOX ID | EfficientDet ID |
|---|---|---|---|---|---|
| wine glass | 40 | 45 | dining table | 60 | 66 |
| cup | 41 | 46 | toilet | 61 | 69 |
| fork | 42 | 47 | tv | 62 | 71 |
| knife | 43 | 48 | laptop | 63 | 72 |
| spoon | 44 | 49 | mouse | 64 | 73 |
| bowl | 45 | 50 | remote | 65 | 74 |
| banana | 46 | 51 | keyboard | 66 | 75 |
| apple | 47 | 52 | cell phone | 67 | 76 |
| sandwich | 48 | 53 | microwave | 68 | 77 |
| orange | 49 | 54 | oven | 69 | 78 |
| broccoli | 50 | 55 | toaster | 70 | 79 |
| carrot | 51 | 56 | sink | 71 | 80 |
| hot dog | 52 | 57 | refrigerator | 72 | 81 |
| pizza | 53 | 58 | book | 73 | 83 |
| donut | 54 | 59 | clock | 74 | 84 |
| cake | 55 | 60 | vase | 75 | 85 |
| chair | 56 | 61 | scissors | 76 | 86 |
| couch | 57 | 62 | teddy bear | 77 | 87 |
| potted plant | 58 | 63 | hair drier | 78 | 88 |
| bed | 59 | 64 | toothbrush | 79 | 89 |
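To restrict detection to specific classes, list their IDs under `detect` instead of `["*"]`. The fragment below is an illustrative sketch (the video file name is a placeholder) that limits a `model.yolo` node to persons and cars, using the YOLO / YOLOX IDs from the tables above:

```yaml
nodes:
  - input.visual:
      source: video.mp4    # placeholder path
  - model.yolo:
      detect: [0, 2]       # person, car (YOLO / YOLOX IDs)
  - draw.bbox
  - output.screen
```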
Face Detection
This table provides the associated class IDs for the `model.yolo_face` node.
| Class name | ID |
|---|---|
| no mask | 0 |
| mask | 1 |
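As an illustrative sketch of how these IDs would be used (the video file name is a placeholder), a face-mask detection pipeline detecting both classes could be configured as:

```yaml
nodes:
  - input.visual:
      source: faces.mp4    # placeholder path
  - model.yolo_face:
      detect: [0, 1]       # no mask, mask
  - draw.bbox
  - output.screen
```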