Pose Estimation Models

List of Pose Estimation Models

The table below shows the pose estimation models available for each task category.

| Category   | Model   | Documentation |
|------------|---------|---------------|
| Whole body | HRNet   | model.hrnet   |
|            | PoseNet | model.posenet |
|            | MoveNet | model.movenet |
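As an illustration, a minimal PeekingDuck pipeline_config.yml that runs one of these models on a video could look like the sketch below (the source path is a placeholder; note that model.hrnet additionally requires an upstream object detector such as model.yolo, which is why it is benchmarked as "HRNet (YOLO)"):

```yaml
nodes:
  - input.visual:
      source: video.mp4   # placeholder path to an input video
  - model.posenet         # or model.movenet; model.hrnet needs a detector node first
  - draw.poses            # overlay the detected keypoints and skeletons
  - output.screen         # display the annotated frames
```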

Benchmarks

Inference Speed

The table below shows the frames per second (FPS) of each model type.

| Model        | Type                 | Size                  | CPU (single) | CPU (multiple) | GPU (single) | GPU (multiple) |
|--------------|----------------------|-----------------------|--------------|----------------|--------------|----------------|
| PoseNet      | 50                   | 225                   | 64.46        | 51.95          | 136.31       | 89.37          |
| PoseNet      | 75                   | 225                   | 57.62        | 47.01          | 132.84       | 83.73          |
| PoseNet      | 100                  | 225                   | 44.70        | 37.60          | 132.73       | 81.24          |
| PoseNet      | resnet               | 225                   | 18.77        | 17.21          | 73.15        | 51.65          |
| HRNet (YOLO) | (v4tiny)             | 256 × 192 (416)       | 5.86         | 1.09           | 21.91        | 13.86          |
| MoveNet      | SinglePose Lightning | 192                   | 40.78        | 40.54          | 99.47        | -              |
| MoveNet      | SinglePose Thunder   | 256                   | 25.13        | 24.87          | 92.05        | -              |
| MoveNet      | MultiPose Lightning  | 256 or multiple of 32 | 25.33        | 24.90          | 80.64        | 79.32          |

Hardware

The following hardware was used to conduct the FPS benchmarks:
- CPU: 2.8 GHz 4-Core Intel Xeon (2020, Cascade Lake) CPU and 16GB RAM
- GPU: NVIDIA A100, paired with 2.2 GHz 6-Core Intel Xeon CPU and 85GB RAM

Test Conditions

The following test conditions were followed:
- The input.visual node, the model of interest, and the dabble.fps node were used to perform inference on videos
- 2 videos were used to benchmark each model: one with a single human ("single") and the other with multiple humans ("multiple")
- Both videos are about 1 minute long and were recorded at ~30 FPS, which translates to about 1,800 frames to process per video
- 1280×720 (HD Ready) resolution was used, as a mid-point between the 640×480 (VGA) of poorer-quality webcams and the 1920×1080 (Full HD) of CCTVs
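The FPS figures above amount to frames processed divided by elapsed time; a minimal sketch of that computation (illustrative only, not the actual dabble.fps implementation, which may use a moving average) is:

```python
def measure_fps(frame_timestamps):
    """Average FPS over a run, given per-frame completion timestamps in seconds."""
    if len(frame_timestamps) < 2:
        raise ValueError("need at least two timestamps")
    elapsed = frame_timestamps[-1] - frame_timestamps[0]
    # Divide by the number of inter-frame intervals, not the number of frames.
    return (len(frame_timestamps) - 1) / elapsed

# Example: ~1,800 frames spaced 1/30 s apart, as in the benchmark videos
timestamps = [i * (1 / 30) for i in range(1800)]
print(round(measure_fps(timestamps)))  # 30
```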

Model Accuracy

The table below shows the performance of our pose estimation models, measured using the keypoint evaluation metrics from COCO. Descriptions of these metrics can be found in the COCO keypoint evaluation documentation.
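For reference, the AP and AR metrics are obtained by thresholding the object keypoint similarity (OKS), which scores each labelled keypoint by its distance from the ground truth, scaled by object size and a per-keypoint constant. A minimal sketch of the OKS formula (the official per-keypoint constants are defined by COCO; the values used below are illustrative):

```python
import math

def oks(pred, gt, visibility, area, k):
    """Object Keypoint Similarity between predicted and ground-truth keypoints.

    pred, gt: lists of (x, y); visibility: ground-truth flags (>0 = labelled);
    area: object segment area (s^2 in the COCO formula); k: per-keypoint constants.
    """
    total, labelled = 0.0, 0
    for (px, py), (gx, gy), v, ki in zip(pred, gt, visibility, k):
        if v > 0:  # only labelled keypoints contribute
            d2 = (px - gx) ** 2 + (py - gy) ** 2
            total += math.exp(-d2 / (2 * area * ki ** 2))
            labelled += 1
    return total / labelled if labelled else 0.0

# A perfect prediction scores 1.0; scores decay towards 0 with distance
gt = [(10.0, 20.0), (30.0, 40.0)]
print(oks(gt, gt, [2, 2], area=100.0, k=[0.025, 0.079]))  # 1.0
```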

| Model        | Type                 | Size            | AP   | AP (OKS=.50) | AP (OKS=.75) | AP (medium) | AP (large) | AR   | AR (OKS=.50) | AR (OKS=.75) | AR (medium) | AR (large) |
|--------------|----------------------|-----------------|------|--------------|--------------|-------------|------------|------|--------------|--------------|-------------|------------|
| PoseNet      | 50                   | 225             | 5.2  | 15.5         | 2.7          | 0.8         | 11.8       | 9.6  | 22.7         | 7.1          | 1.4         | 20.7       |
| PoseNet      | 75                   | 225             | 7.2  | 19.7         | 3.6          | 1.3         | 15.9       | 12.1 | 26.5         | 9.3          | 2.2         | 25.5       |
| PoseNet      | 100                  | 225             | 7.7  | 20.8         | 4.4          | 1.5         | 17.1       | 12.6 | 27.7         | 10.1         | 2.3         | 26.5       |
| PoseNet      | resnet               | 225             | 11.9 | 27.4         | 8.3          | 2.2         | 25.3       | 17.3 | 32.5         | 15.9         | 2.9         | 36.8       |
| HRNet (YOLO) | (v4tiny)             | 256 × 192 (416) | 35.8 | 61.5         | 37.5         | 30.1        | 44.0       | 40.2 | 64.4         | 42.7         | 33.0        | 50.2       |
| MoveNet      | singlepose_lightning | 256 × 256       | 7.3  | 15.7         | 5.7          | 1.3         | 15.4       | 8.8  | 17.6         | 7.7          | 1.1         | 19.2       |
| MoveNet      | singlepose_thunder   | 256 × 256       | 11.6 | 21.3         | 10.7         | 3.0         | 23.1       | 13.1 | 22.5         | 12.8         | 2.8         | 27.1       |
| MoveNet      | multipose_lightning  | 256 × 256       | 18.7 | 36.8         | 16.3         | 9.0         | 31.8       | 21.0 | 38.5         | 19.2         | 9.3         | 37.0       |

Dataset

The MS COCO (val 2017) dataset was used. We integrated the COCO API into the PeekingDuck pipeline to load the annotations and evaluate the outputs from the models. All values are reported in percentages.

All images from the “person” category in the MS COCO (val 2017) dataset were processed.

Test Conditions

The following test conditions were followed:
- The tests were performed using pycocotools on the MS COCO dataset
- The evaluation metrics were compared against the original repositories of the respective pose estimation models for consistency
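Under these conditions, the evaluation loop can be sketched with pycocotools as below (file names are placeholders; this is an illustrative sketch, not the exact PeekingDuck integration):

```python
def evaluate_keypoints(gt_json, det_json):
    """Run COCO keypoint evaluation, producing the AP/AR metrics tabled above."""
    # pycocotools is imported lazily so this module loads without it installed
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO(gt_json)              # e.g. person_keypoints_val2017.json
    coco_dt = coco_gt.loadRes(det_json)  # model predictions in COCO results format
    evaluator = COCOeval(coco_gt, coco_dt, iouType="keypoints")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()                # prints AP/AR as fractions in [0, 1]
    return evaluator.stats               # the same numbers as an array
```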

Keypoint IDs

Whole Body

| Keypoint       | ID | Keypoint    | ID |
|----------------|----|-------------|----|
| nose           | 0  | left wrist  | 9  |
| left eye       | 1  | right wrist | 10 |
| right eye      | 2  | left hip    | 11 |
| left ear       | 3  | right hip   | 12 |
| right ear      | 4  | left knee   | 13 |
| left shoulder  | 5  | right knee  | 14 |
| right shoulder | 6  | left ankle  | 15 |
| left elbow     | 7  | right ankle | 16 |
| right elbow    | 8  |             |    |
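Programmatically, the mapping above is the standard COCO 17-keypoint ordering, where each ID is the keypoint's index in a model's output array. A small lookup sketch (names as in the table):

```python
# The 17 whole-body keypoints in COCO order; the index of each name is its ID
COCO_KEYPOINTS = [
    "nose", "left eye", "right eye", "left ear", "right ear",
    "left shoulder", "right shoulder", "left elbow", "right elbow",
    "left wrist", "right wrist", "left hip", "right hip",
    "left knee", "right knee", "left ankle", "right ankle",
]

# Map each keypoint name to its ID for convenient lookups
KEYPOINT_IDS = {name: idx for idx, name in enumerate(COCO_KEYPOINTS)}

print(KEYPOINT_IDS["left wrist"])  # 9
```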