“Hello Computer Vision”
Computer Vision (or CV) is a field in AI that develops techniques to help computers “see” and “understand” the contents of digital images, such as photographs and videos, and to derive meaningful information from them. Common CV applications include object detection, which identifies the objects present in an image, and pose estimation, which locates the positions of human limbs relative to the body.
PeekingDuck allows you to build a CV pipeline to analyze and process images and/or videos. This pipeline is made up of nodes: each node can perform certain CV-related tasks.
This section presents two basic “hello world” examples to demonstrate how to use PeekingDuck for pose estimation and object detection.
Pose Estimation
To perform pose estimation with PeekingDuck, initialize a new PeekingDuck project using the following commands:
Terminal Session
> mkdir pose_estimation
> cd pose_estimation
> peekingduck init

The peekingduck init command prepares the pose_estimation folder for use with PeekingDuck. It creates a default pipeline file called pipeline_config.yml and a src folder that will be covered in later tutorials.
The pipeline_config.yml file looks like this:
1 nodes:
2 - input.visual:
3     source: https://storage.googleapis.com/peekingduck/videos/wave.mp4
4 - model.posenet
5 - draw.poses
6 - output.screen
The above forms a pose estimation pipeline comprising four nodes that do the following:
- input.visual: reads the file wave.mp4 from PeekingDuck’s cloud storage
- model.posenet: runs the PoseNet pose estimation model on the input
- draw.poses: draws a human pose skeleton over the person, tracking his hand movement
- output.screen: outputs everything onto the screen for display
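Conceptually, a pipeline like the one above is just an ordered sequence of nodes, each reading from and writing to a shared data pool. A minimal Python sketch of that idea (illustrative only, not PeekingDuck’s actual API; the node functions and dictionary keys here are made up for demonstration):

```python
# Illustrative sketch of the pipeline concept (NOT PeekingDuck's real API):
# each node is a callable that reads from and writes to a shared dict.
def input_node(data):
    data["img"] = "frame-from-wave.mp4"   # stand-in for a decoded video frame
    return data

def model_node(data):
    data["poses"] = ["pose-skeleton"]     # stand-in for the model's output
    return data

def run_pipeline(nodes):
    data = {}
    for node in nodes:                    # nodes run in the order listed
        data = node(data)
    return data

result = run_pipeline([input_node, model_node])
```

Swapping one node for another (e.g., a different model) changes the pipeline’s behavior without touching the other nodes, which is why the object detection example below needs only two lines changed.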
Now, run the pipeline using:

Terminal Session
> peekingduck run
You have successfully run a PeekingDuck pose estimation pipeline!
Object Detection
To perform object detection, initialize a new PeekingDuck project using the following commands:
Terminal Session
> mkdir object_detection
> cd object_detection
> peekingduck init
Then modify pipeline_config.yml as follows:
1 nodes:
2 - input.visual:
3     source: https://storage.googleapis.com/peekingduck/videos/wave.mp4
4 - model.yolo
5 - draw.bbox
6 - output.screen
The key differences between this and the earlier pipeline are:
- Line 4: model.yolo runs the YOLO object detection model
- Line 5: draw.bbox draws a bounding box around the detected person
Run the new object detection pipeline with peekingduck run.
You will see the same video with a bounding box surrounding the person.
That’s it: you have created a new object detection pipeline by changing only two lines!
Note
Try replacing wave.mp4 with your own video file and run both models. For best effect, your video file should contain people performing some activities.
Using a WebCam
If your computer has a webcam attached, you can use it by changing the first input node (line 2) as follows:
1 nodes:
2 - input.visual:
3     source: 0          # use webcam for live video
4 - model.posenet       # use pose estimation model
5 - draw.poses          # draw skeletal poses
6 - output.screen
Now do a peekingduck run and you will see yourself onscreen. Move your hands around and see PeekingDuck tracking your poses.
To exit, click to select the video window and press q.
Note
PeekingDuck assumes the webcam defaults to input source 0. If your system is configured differently, specify the correct input source by changing the input.visual configuration. See changing node configuration.
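For example, if your webcam is registered as the second video device, the source setting might look like this (the index 1 is hypothetical; the correct value depends on your system):

1 nodes:
2 - input.visual:
3     source: 1          # second video device on this system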
Pipelines, Nodes and Configs
PeekingDuck comes with a rich collection of nodes that you can use to create your own CV pipelines. Each node can be customized by changing its configurations or settings.
To get a quick overview of PeekingDuck’s nodes, run the following command:
Terminal Session
> peekingduck nodes
You will see a comprehensive list of all PeekingDuck’s nodes, with links to their readthedocs pages for more information.
PeekingDuck supports 6 types of nodes:
A PeekingDuck pipeline is created by stringing together a series of nodes that perform a logical sequence of operations. Each node has its own set of configurable settings that can be modified to change its behavior. An example pipeline is shown below:
Bounding Box vs Image Coordinates
PeekingDuck has two \((x, y)\) coordinate systems, with top-left corner as origin \((0, 0)\):
- Absolute image coordinates
For an image of width \(W\) and height \(H\), the absolute image coordinates are integers from \((0, 0)\) to \((W-1, H-1)\).
E.g., for a 720 x 480 image, the absolute coordinates range from \((0, 0)\) to \((719, 479)\).
- Relative bounding box coordinates
For an image of width \(W\) and height \(H\), the relative bounding box coordinates are real numbers from \((0.0, 0.0)\) to \((1.0, 1.0)\).
E.g., for a 720 x 480 image, the relative coordinates range from \((0.0, 0.0)\) to \((1.0, 1.0)\).
This means that to draw a bounding box onto an image, its relative coordinates must first be converted to absolute image coordinates.
Using the above figure as an illustration, the bounding box coordinates are given as \((0.18, 0.10)\) top-left and \((0.52, 0.88)\) bottom-right. To convert them to image coordinates, multiply the x-coordinates by the image width and the y-coordinates by the image height, and round the results into integers.
Thus, the image coordinates are \((130, 48)\) top-left and \((374, 422)\) bottom-right.
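The conversion described above can be sketched as a small helper function (an illustrative sketch, not part of PeekingDuck’s API; the function name is made up):

```python
def rel_to_abs(x_rel, y_rel, width, height):
    """Convert relative (0.0-1.0) coordinates to absolute pixel coordinates.

    Multiply the x-coordinate by the image width and the y-coordinate by
    the image height, then round to the nearest integer.
    """
    return round(x_rel * width), round(y_rel * height)

# The example from the text: a 720 x 480 image
top_left = rel_to_abs(0.18, 0.10, 720, 480)       # -> (130, 48)
bottom_right = rel_to_abs(0.52, 0.88, 720, 480)   # -> (374, 422)
```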
Note
The model nodes return results in relative coordinates.