Object Detection Models

List of Object Detection Models

The table below shows the object detection models available for each task category.

Category	Model	Documentation
General	EfficientDet	`model.efficientdet`
General	YOLOv4	`model.yolo`
General	YOLOX	`model.yolox`
Face	MTCNN	`model.mtcnn`
Face	YOLOv4 (Face)	`model.yolo_face`
License plate	YOLOv4 (License Plate)	`model.yolo_license_plate`

Benchmarks

Inference Speed

The table below shows the frames per second (FPS) of each model type.

Model	Type	Size	CPU (single)	CPU (multiple)	GPU (single)	GPU (multiple)
YOLO	v4tiny	416	22.42	21.71	65.24	57.50
YOLO	v4	416	2.62	2.59	30.40	28.71
EfficientDet	0	512	5.24	5.25	29.51	29.39
EfficientDet	1	640	2.53	2.49	23.79	24.44
EfficientDet	2	768	1.54	1.50	19.86	20.51
EfficientDet	3	896	0.78	0.75	14.69	14.84
EfficientDet	4	1024	0.43	0.42	11.74	11.88
MTCNN	–	–	32.42	18.53	56.35	51.45
YOLOX	yolox-tiny	416	19.43	19.29	55.36	55.38
YOLOX	yolox-s	640	15.10	15.44	53.81	53.74
YOLOX	yolox-m	640	8.29	8.04	42.79	43.83
YOLOX	yolox-l	640	4.59	4.75	35.30	36.08

Hardware

The following hardware was used to conduct the FPS benchmarks:

- CPU: 2.8 GHz 4-Core Intel Xeon (2020, Cascade Lake) with 16GB RAM

- GPU: NVIDIA A100, paired with 2.2 GHz 6-Core Intel Xeon CPU and 85GB RAM

Test Conditions

The following test conditions were used:

- The input.visual, model of interest, and dabble.fps nodes were used to perform inference on videos

- 2 videos were used to benchmark each model, one with only 1 human (single), and the other with multiple humans (multiple)

- Both videos are about 1 minute each, recorded at ~30 FPS, which translates to about 1,800 frames to process per video

- 1280×720 (HD ready) resolution was used, as a bridge between 640×480 (VGA) of poorer quality webcams, and 1920×1080 (Full HD) of CCTVs
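To put the FPS figures in context, the frame count above can be turned into a rough processing time per video. The sketch below is only illustrative arithmetic: the function name is hypothetical and not part of PeekingDuck, and the FPS values are copied from the "single" columns of the inference speed table.

```python
def processing_time_seconds(num_frames: int, fps: float) -> float:
    """Return the seconds needed to process num_frames at a given FPS."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# About 1 minute of video recorded at ~30 FPS gives roughly 1,800 frames.
FRAMES_PER_VIDEO = 60 * 30

# FPS values taken from the inference speed table (YOLO v4, single-human video).
yolo_v4_cpu = processing_time_seconds(FRAMES_PER_VIDEO, 2.62)
yolo_v4_gpu = processing_time_seconds(FRAMES_PER_VIDEO, 30.40)

print(f"YOLO v4 on CPU: {yolo_v4_cpu:.0f} s")
print(f"YOLO v4 on GPU: {yolo_v4_gpu:.0f} s")
```

Under these assumptions, YOLO v4 needs roughly 11 minutes per video on the CPU setup and about 1 minute on the GPU setup.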

Model Accuracy

The table below shows the performance of our object detection models using the detection evaluation metrics from COCO. Descriptions of these metrics can be found in the COCO detection evaluation documentation.

Model	Type	Size	AP	AP (IoU=.50)	AP (IoU=.75)	AP (small)	AP (medium)	AP (large)	AR (max=1)	AR (max=10)	AR (max=100)	AR (small)	AR (medium)	AR (large)
YOLO	v4tiny	416	17.4	32.7	16.6	6.4	20.1	25.6	16.7	22.8	21.1	6.1	23.7	32.1
YOLO	v4	416	43.7	64.0	48.1	23.1	49.6	60.9	33.3	49.1	50.0	26.2	56.1	70.3
EfficientDet	0	512	29.7	44.3	32.4	7.4	34.4	49.2	25.3	34.5	34.8	7.8	39.7	58.4
EfficientDet	1	640	35.2	50.8	38.8	14.3	40.1	53.9	28.8	40.5	40.9	15.6	46.3	62.8
EfficientDet	2	768	38.5	54.4	42.1	18.9	42.7	57.1	30.9	43.9	44.4	20.8	48.9	65.5
EfficientDet	3	896	41.1	57.0	45.2	22.2	45.1	58.7	32.6	46.7	47.3	24.8	51.5	66.9
EfficientDet	4	1024	43.4	59.2	47.8	24.2	47.6	60.4	33.8	49.1	49.7	27.3	53.9	68.7
YOLOX	yolox-tiny	416	32.4	50.5	33.9	13.4	35.4	49.5	28.2	43.5	45.7	20.7	51.7	65.9
YOLOX	yolox-s	416	35.6	53.4	37.8	14.0	39.3	55.7	30.3	46.0	48.1	20.9	54.7	70.8
YOLOX	yolox-m	416	41.6	59.7	44.4	18.8	46.9	62.8	33.9	51.6	53.7	26.9	60.9	76.8
YOLOX	yolox-l	416	44.5	62.5	47.6	21.9	50.6	65.5	35.5	54.2	56.3	31.0	64.0	78.1

Dataset

The MS COCO (val 2017) dataset is used. We integrated the COCO API into the PeekingDuck pipeline for loading the annotations and evaluating the outputs from the models. All values are reported in percentages.

All images from the 80 object categories in the MS COCO (val 2017) dataset were processed.

Test Conditions

The following test conditions were used:

- The tests were performed using pycocotools on the MS COCO dataset

- The evaluation metrics were compared against those reported in the original repositories of the respective object detection models for consistency
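The AP columns at IoU=.50 and IoU=.75 depend on intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below illustrates that quantity only; it is not the pycocotools implementation. The corner format `(x1, y1, x2, y2)` and the helper names are assumptions made for illustration.

```python
from typing import Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(box_a: Box, box_b: Box) -> float:
    """Return the intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float) -> bool:
    """Treat a prediction as a match when its IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


pred = (0.0, 0.0, 10.0, 10.0)
truth = (2.0, 0.0, 12.0, 10.0)
print(f"IoU = {iou(pred, truth):.3f}")
print("match at 0.50:", is_match(pred, truth, 0.50))
print("match at 0.75:", is_match(pred, truth, 0.75))
```

In this example the same prediction counts as a match at the 0.50 threshold but not at 0.75, which is why the AP (IoU=.75) values in the table are lower than the AP (IoU=.50) values.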

Object Detection IDs

General Object Detection

The tables below list the class ID (index) that each object detector assigns to each class name.

To detect all classes, specify detect: ["*"] under the object detection node configuration in pipeline_config.yml.

Class name	YOLO / YOLOX ID	EfficientDet ID	Class name	YOLO / YOLOX ID	EfficientDet ID
person	0	0	elephant	20	21
bicycle	1	1	bear	21	22
car	2	2	zebra	22	23
motorcycle	3	3	giraffe	23	24
aeroplane	4	4	backpack	24	26
bus	5	5	umbrella	25	27
train	6	6	handbag	26	30
truck	7	7	tie	27	31
boat	8	8	suitcase	28	32
traffic light	9	9	frisbee	29	33
fire hydrant	10	10	skis	30	34
stop sign	11	12	snowboard	31	35
parking meter	12	13	sports ball	32	36
bench	13	14	kite	33	37
bird	14	15	baseball bat	34	38
cat	15	16	baseball glove	35	39
dog	16	17	skateboard	36	40
horse	17	18	surfboard	37	41
sheep	18	19	tennis racket	38	42
cow	19	20	bottle	39	43

Class name	YOLO / YOLOX ID	EfficientDet ID	Class name	YOLO / YOLOX ID	EfficientDet ID
wine glass	40	45	dining table	60	66
cup	41	46	toilet	61	69
fork	42	47	tv	62	71
knife	43	48	laptop	63	72
spoon	44	49	mouse	64	73
bowl	45	50	remote	65	74
banana	46	51	keyboard	66	75
apple	47	52	cell phone	67	76
sandwich	48	53	microwave	68	77
orange	49	54	oven	69	78
broccoli	50	55	toaster	70	79
carrot	51	56	sink	71	80
hot dog	52	57	refrigerator	72	81
pizza	53	58	book	73	83
donut	54	59	clock	74	84
cake	55	60	vase	75	85
chair	56	61	scissors	76	86
couch	57	62	teddy bear	77	87
potted plant	58	63	hair drier	78	88
bed	59	64	toothbrush	79	89
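The two ID columns differ only because the EfficientDet numbering has gaps, so the tables above can be condensed into a small lookup. The sketch below is derived purely from those tables. The constant and function names are hypothetical and not part of PeekingDuck.

```python
# EfficientDet IDs in the range 0-89 that never appear in the tables above.
SKIPPED_EFFICIENTDET_IDS = {11, 25, 28, 29, 44, 65, 67, 68, 70, 82}

# The 80 EfficientDet IDs in ascending order.
# Position k holds the EfficientDet ID of the class with YOLO / YOLOX ID k.
EFFICIENTDET_IDS = [i for i in range(90) if i not in SKIPPED_EFFICIENTDET_IDS]


def yolo_to_efficientdet(yolo_id: int) -> int:
    """Map a YOLO / YOLOX class ID (0-79) to the matching EfficientDet ID."""
    if not 0 <= yolo_id < len(EFFICIENTDET_IDS):
        raise ValueError(f"YOLO / YOLOX ID must be in 0-79, got {yolo_id}")
    return EFFICIENTDET_IDS[yolo_id]


# Spot checks against rows of the tables: person, stop sign, backpack, toothbrush.
for yolo_id in (0, 11, 24, 79):
    print(yolo_id, "->", yolo_to_efficientdet(yolo_id))
```

A lookup like this can help when the same list of classes has to be expressed for both a YOLO / YOLOX node and an EfficientDet node.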

Face Detection

This table provides the associated indices for the model.yolo_face node.

Class name	ID
no mask	0
mask	1