People Counting (Over Time)
Overview
People counting over time involves detecting and tracking different persons, and incrementing the count when a new person appears. This use case can reduce dependency on manual counting, and be applied to areas such as retail analytics, queue management, or occupancy monitoring.
Our solution automatically detects, tracks and counts people over time. This is explained in the How It Works section.
Demo
To try our solution on your own computer, install and run PeekingDuck with the configuration file people_counting_over_time.yml as shown:
Terminal Session
You may like to try it on this sample video.
How It Works
People counting over time comprises three main components:
Human detection,
Appearance embedding tracking, and
Incrementing the count.
1. Human Detection
We use an open source detection model known as JDE to detect persons. JDE has been trained on pedestrian detection and person search datasets. This allows the application to identify the locations of persons in a video feed. Each of these locations is represented as a pair of x, y coordinates in the form \([x_1, y_1, x_2, y_2]\), where \((x_1, y_1)\) is the top left corner of the bounding box, and \((x_2, y_2)\) is the bottom right. These are used to form the bounding box of each person detected. For more information on how to adjust the JDE node, check out the JDE configurable parameters.
2. Appearance Embedding Tracking
To learn appearance embeddings for tracking, a metric learning algorithm with triplet loss is used. Observations are assigned to tracklets using the Hungarian algorithm. The Kalman filter is used to smooth the trajectories and predict the locations of previous tracklets in the current frame. The model outputs an ID for each detection based on the appearance embedding learned.
3. Incrementing the Count
Monotonically increasing integer IDs beginning from 0 are assigned to new unique persons. For example, the first tracked person is assigned an ID of 0, the second tracked person is assigned an ID of 1, and so on. Thus the total number of unique persons that have appeared in the entire duration is simply the cumulative maximum.
Nodes Used
These are the nodes used in the earlier demo (also in people_counting_over_time.yml):
nodes:
- input.visual:
source: <path/to/video with people>
- model.jde
- dabble.statistics:
maximum: obj_attrs["ids"]
- draw.bbox
- draw.tag:
show: ["ids"]
- draw.legend:
show: ["cum_max"]
- output.screen
1. JDE Node
This node employs a single network to simultaneously output detection results and the corresponding appearance embeddings of the detected boxes. Therefore JDE stands for Joint Detection and Embedding. Please take a look at the benchmarks of object tracking models that are included in PeekingDuck if you would like to use a different model or model type better suited to your use case.
2. Statistics Node
The dabble.statistics
node retrieves the maximum detected ID for each frame. If the ID
exceeds the previous maximum, the cum_max (cumulative maximum) is updated. As monotonically
increasing integer IDs beginning from 0 are assigned to new unique persons, the maximum ID is
equal to the total number of unique persons over time.
3. Adjusting Nodes
With regard to the model.jde
node, some common behaviors that you might want to adjust are:
iou_threshold
: Specifies the threshold value for Intersection over Union of detections (default = 0.5).score_threshold
: Specifies the threshold values for the detection confidence (default = 0.5). You may want to lower this value to increase the number of detections.nms_threshold
: Specifies the threshold value for non-maximal suppression (default = 0.4). You may want to lower this value to increase the number of detections.min_box_area
: Specifies the minimum value for area of detected bounding box. Calculated by \(width \times height\) (default = 200).track_buffer
: Specifies the threshold to remove track if track is lost for more frames than this value (default = 30).
Counting People Within Zones
It is possible to extend this use case with the Zone Counting use case. For example, if a CCTV footage shows the entrance of a mall as well as a road, and we are only interested to apply people counting to the mall entrance, we could split the video into 2 different zones and only count the people within the chosen zone. An example of how this can be done is given in the Tracking People within a Zone tutorial.