model.mtcnn

Description

🔲 Multi-task Cascaded Convolutional Networks for face detection. Works best with unmasked faces.

class Node(config=None, **kwargs)

Initializes and uses the MTCNN model to infer bboxes from an image frame.

The MTCNN node is a single-class model that detects human faces. To a certain extent, it can also detect faces covered by face masks (e.g. surgical masks).

Inputs

img (numpy.ndarray): A NumPy array of shape \((height, width, channels)\) containing the image data in BGR format.
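Because the node expects BGR channel order, an image loaded in RGB order needs its channel axis reversed first. The snippet below is a minimal sketch using only NumPy; the helper name `rgb_to_bgr` is hypothetical and not part of this node's API.

```python
import numpy as np


def rgb_to_bgr(img_rgb: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of a (height, width, 3) array (RGB <-> BGR).

    Hypothetical helper for illustration; not part of the MTCNN node.
    """
    if img_rgb.ndim != 3 or img_rgb.shape[2] != 3:
        raise ValueError("expected an array of shape (height, width, 3)")
    # Slicing with ::-1 returns a view with a negative stride; copy it into
    # a contiguous array so downstream code sees an ordinary memory layout.
    return np.ascontiguousarray(img_rgb[:, :, ::-1])


# A 2x2 pure-red image in RGB order (made-up data).
red_rgb = np.zeros((2, 2, 3), dtype=np.uint8)
red_rgb[..., 0] = 255

red_bgr = rgb_to_bgr(red_rgb)  # red now sits in the last channel
```

Images read with OpenCV are typically already in BGR order, so this step mainly applies to arrays produced by RGB-based loaders.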

Outputs

bboxes (numpy.ndarray): A NumPy array of shape \((N, 4)\) containing normalized bounding box coordinates of \(N\) detected objects. Each bounding box is represented as \((x_1, y_1, x_2, y_2)\) where \((x_1, y_1)\) is the top-left corner and \((x_2, y_2)\) is the bottom-right corner. The order corresponds to bbox_labels and bbox_scores.

bbox_scores (numpy.ndarray): A NumPy array of shape \((N)\) containing the confidence scores, in the range \([0, 1]\), of detected objects. The order corresponds to bboxes and bbox_labels.

bbox_labels (numpy.ndarray): A NumPy array of shape \((N)\) containing strings representing the labels of detected objects. The order corresponds to bboxes and bbox_scores.
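Since the three outputs share the same ordering, a single boolean mask can filter all of them together, and the normalized coordinates can be scaled back to pixels using the frame size. The following is a minimal sketch with made-up detections; the helper `to_pixel_boxes`, the sample values, and the `"face"` label string are assumptions for illustration only.

```python
import numpy as np


def to_pixel_boxes(bboxes, height, width):
    """Scale normalized (x1, y1, x2, y2) boxes to integer pixel coordinates.

    Hypothetical helper for illustration; not part of the MTCNN node.
    """
    bboxes = np.asarray(bboxes, dtype=np.float64).reshape(-1, 4)
    # x values scale with the frame width, y values with the frame height.
    scale = np.array([width, height, width, height], dtype=np.float64)
    pixels = np.rint(bboxes * scale).astype(int)
    # Keep every coordinate inside the valid pixel index range.
    upper = np.array([width - 1, height - 1, width - 1, height - 1])
    return np.clip(pixels, 0, upper)


# Made-up outputs for a 640x480 frame (illustrative values only).
bboxes = np.array([[0.25, 0.5, 0.75, 1.0], [0.1, 0.1, 0.2, 0.3]])
bbox_scores = np.array([0.98, 0.72])
bbox_labels = np.array(["face", "face"])

pixel_boxes = to_pixel_boxes(bboxes, height=480, width=640)

# One mask indexes all three arrays, so row i still describes the same face.
keep = bbox_scores >= 0.9
kept_boxes = pixel_boxes[keep]
kept_scores = bbox_scores[keep]
kept_labels = bbox_labels[keep]

for (x1, y1, x2, y2), score, label in zip(kept_boxes, kept_scores, kept_labels):
    print(f"{label} {score:.2f}: ({x1}, {y1}) to ({x2}, {y2})")
```

Clipping to `width - 1` and `height - 1` is a design choice that makes the result safe to use as array indices; drop it if exclusive end coordinates are preferred.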

Configs

weights_parent_dir (Optional[str]) – default = null.
Change the parent directory where weights will be stored by replacing null with an absolute path to the desired directory.
min_size (int) – default = 40.
Minimum height and width, in pixels, of a face to be detected.
scale_factor (float) – [0, 1], default = 0.709.
Scale factor used to build the image pyramid. A larger scale factor produces more pyramid levels, which gives more accurate detections at the expense of inference speed.
network_thresholds (List[float]) – [0, 1], default = [0.6, 0.7, 0.7].
Threshold values for the Proposal Network (P-Net), Refine Network (R-Net) and Output Network (O-Net) in the MTCNN model.

Calibration is performed at each stage, and bounding boxes with confidence scores below the corresponding threshold are discarded.
score_threshold (float) – [0, 1], default = 0.7.
Bounding boxes in the final output with confidence scores below this threshold are discarded.
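To illustrate how min_size and scale_factor interact, the sketch below computes image pyramid scales the way common open-source MTCNN implementations do, assuming the 12x12 P-Net input size from the paper. The function `pyramid_scales` is hypothetical, and this node's internal computation may differ.

```python
def pyramid_scales(height, width, min_size=40, scale_factor=0.709, cell_size=12):
    """Return the resize scales of an MTCNN-style image pyramid.

    Sketch only: follows the scheme used by common MTCNN implementations,
    where `cell_size` is the 12x12 input size of P-Net. Not this node's API.
    """
    if not 0 < scale_factor < 1:
        raise ValueError("scale_factor must be strictly between 0 and 1")
    if min_size <= 0:
        raise ValueError("min_size must be positive")

    # The first scale maps a face of `min_size` pixels onto one P-Net cell.
    scale = cell_size / min_size
    min_side = min(height, width) * scale

    scales = []
    # Keep shrinking until the shorter image side no longer fits one cell.
    while min_side >= cell_size:
        scales.append(scale)
        scale *= scale_factor
        min_side *= scale_factor
    return scales


default_scales = pyramid_scales(480, 640)                   # defaults above
finer_scales = pyramid_scales(480, 640, scale_factor=0.9)   # more levels
```

Under these assumptions, a scale factor closer to 1 yields more pyramid levels, so the networks run on more resized copies of the frame. That is the accuracy versus speed trade-off described for scale_factor, and raising min_size has the opposite effect by shrinking the first level.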

References

Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks: https://arxiv.org/ftp/arxiv/papers/1604/1604.02878.pdf

Model weights trained by https://github.com/blaueck/tf-mtcnn

Changed in version 1.2.0:
mtcnn_min_size is renamed to min_size.
mtcnn_factor is renamed to scale_factor.
mtcnn_thresholds is renamed to network_thresholds.
mtcnn_score is renamed to score_threshold.
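For configurations written before version 1.2.0, the renames above can be applied mechanically. The snippet below is a minimal sketch that treats the configuration as a plain dictionary; the helper `migrate_mtcnn_config` is hypothetical and not provided by the library.

```python
# Old config name -> new config name, as listed in the 1.2.0 change note.
RENAMED_KEYS = {
    "mtcnn_min_size": "min_size",
    "mtcnn_factor": "scale_factor",
    "mtcnn_thresholds": "network_thresholds",
    "mtcnn_score": "score_threshold",
}


def migrate_mtcnn_config(old_config):
    """Return a copy of `old_config` with pre-1.2.0 keys renamed.

    Hypothetical helper for illustration; not part of the library.
    """
    new_config = {}
    for key, value in old_config.items():
        new_key = RENAMED_KEYS.get(key, key)  # unknown keys pass through
        if new_key in new_config:
            raise ValueError(f"both old and new names given for {new_key!r}")
        new_config[new_key] = value
    return new_config


legacy = {"mtcnn_min_size": 40, "mtcnn_score": 0.7, "weights_parent_dir": None}
migrated = migrate_mtcnn_config(legacy)
```

Raising an error when both the old and new name are present avoids silently choosing one value over the other.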