Introduction
For our class project, we decided to explore motion tracking with a webcam. We settled on C++ as the language, with the fantastic OpenCV library handling most of the heavy lifting. Our goal for the motion tracker was to create something that would return smooth, simplified target data that would work well with a laser pointer turret (our next project). Overall, we think we were successful, but there is still a lot of work to do to make this program truly robust.
This blog post will go over what specific problems we were trying to solve, what other people have done with motion tracking, what we tried to do, and what our results were.
The Problems and Their Solutions:
Getting data from a webcam and manipulating it in C++
The first problem we had to consider is, of course, how we would even talk to a webcam. Luckily, others have solved this particular issue and made their work publicly available. OpenCV is open source and free to download at opencv.org.
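For anyone following along, here is a minimal sketch of how OpenCV exposes a webcam (the device index and the window name are arbitrary choices for illustration, not something specific to our project):

```cpp
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture capture(0);          // open the default webcam
    if (!capture.isOpened()) return -1;   // bail out if no camera was found

    cv::Mat frame;
    while (capture.read(frame)) {         // grab and decode the next frame
        cv::imshow("Webcam Feed", frame); // show it in a window
        if (cv::waitKey(30) >= 0) break;  // quit on any keypress
    }
    return 0;
}
```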
Detecting areas of motion
Once we have the webcam feed, how do we figure out if something is moving on the screen? How do we figure out where its center of mass is? How can we reduce issues like noise and poor contrast? Once again, OpenCV had us covered for most of this, as it has robust tools for image processing.
The first area to consider is how to detect motion. There are several posted tutorials on this subject. It seems that the most common (and simplest) solution is to detect colors. If one assumes that the target object is red, for example, it is fairly trivial to find and track red objects. The inverse is also possible: if one assumes that the background is red, then objects of interest will be whatever is not red.
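We did not go this route, but for reference, color tracking in OpenCV usually boils down to a single cv::inRange() call. A rough sketch (the HSV bounds here are illustrative, and red actually wraps around the hue axis, so a real version would combine two ranges):

```cpp
#include <opencv2/opencv.hpp>

// Sketch: produce a mask that is white wherever the frame is "red-ish."
cv::Mat findRed(const cv::Mat& frame) {
    cv::Mat hsv, mask;
    cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);      // hue is easier to threshold than BGR
    cv::inRange(hsv, cv::Scalar(0, 120, 70),          // lower (H, S, V) bound
                     cv::Scalar(10, 255, 255), mask); // upper (H, S, V) bound
    return mask;
}
```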
However, we wanted a much more general solution. Instead of looking for color, could we detect any changes between frames? This tutorial gave us a wonderful introduction to general motion detection, and we even ended up using some of the author’s code in our project.
The idea is to first grab two frames from the video feed, convert those to grayscale, and then use the function cv::absdiff() to get the absolute difference between those two frames. Simple as that, we now have a video feed showing only areas where pixels have changed between the two frames, aka where motion has occurred.
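In code, that step looks roughly like this (the variable names are ours for illustration, not necessarily the ones in our source):

```cpp
cv::Mat frame1, frame2, gray1, gray2, difference;
capture.read(frame1);                              // grab two consecutive frames
capture.read(frame2);
cv::cvtColor(frame1, gray1, cv::COLOR_BGR2GRAY);   // grayscale simplifies the comparison
cv::cvtColor(frame2, gray2, cv::COLOR_BGR2GRAY);
cv::absdiff(gray1, gray2, difference);             // per-pixel |gray1 - gray2|
```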
Now that we have this, we need to convert it into data that a computer can easily understand. The first step in this process is to use cv::threshold() to convert the absolute difference into a binary output (pure black and pure white).
Now we have a solid binary image of the areas of motion, but we can also see a lot of noise (small solitary white pixels and areas of minimal motion, like my glasses). We don’t want all that stuff, just areas where there has been some sort of significant motion, like my hand. To get rid of this noise, we use cv::blur() and another cv::threshold() to further simplify the image.
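Continuing the sketch above, the cleanup stage is just three calls. The numeric constants are illustrative; the right values depend on the camera and the lighting:

```cpp
cv::Mat thresholdImage;
cv::threshold(difference, thresholdImage, 20, 255, cv::THRESH_BINARY);     // any change above 20 becomes pure white
cv::blur(thresholdImage, thresholdImage, cv::Size(10, 10));                // averaging blur smears lone noise pixels into gray
cv::threshold(thresholdImage, thresholdImage, 20, 255, cv::THRESH_BINARY); // re-binarize; the smeared noise drops out
```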
And there we go! A nice, simple binary image that clearly shows motion.
Extracting useful coordinates we could send to a turret
Now that we have our binary image showing the areas of motion, we need a way to accurately track individual objects. Our current implementation of object tracking is contained in the collectSamples() function. This function identifies contours in the image and stores them in a vector called contour_moments. We then collect the centroids of the contours into a vector of points called mass_centers. The x and y position coordinates are then averaged together to determine where the object we are tracking is.
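A rough reconstruction of what collectSamples() does, based on the description above (this is a sketch, not our exact code):

```cpp
std::vector<std::vector<cv::Point>> contours;
cv::findContours(thresholdImage.clone(), contours,            // clone: findContours modifies its input
                 cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

std::vector<cv::Point2f> mass_centers;
for (const auto& contour : contours) {
    cv::Moments m = cv::moments(contour);
    if (m.m00 == 0) continue;                                 // skip degenerate contours
    mass_centers.emplace_back(m.m10 / m.m00, m.m01 / m.m00);  // centroid = first moments / area
}

cv::Point2f target(0, 0);                                     // average all centroids into one point
for (const auto& c : mass_centers) target += c;
if (!mass_centers.empty()) target *= 1.0f / mass_centers.size();
```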
This method works when there is only one object being tracked. When something else is added to the mix, the target our program draws settles in the middle of all the moving objects. This problem can hopefully be fixed once we finish implementing a clustering algorithm.
One clustering algorithm we might implement is hierarchical clustering. This algorithm is generally implemented in one of two ways. The first is the agglomerative type, where each data point starts as its own cluster and, moving up the hierarchy, pairs of clusters join together. The second is the divisive method, where all of the data points start in one cluster and are split apart as the algorithm works down through the hierarchy.
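To make the agglomerative idea concrete, here is an untested sketch on 2D points (the centroid-distance linkage and the maxDist cutoff are our illustrative choices; this is not code from our tracker):

```cpp
#include <cmath>
#include <limits>
#include <vector>
#include <opencv2/core/core.hpp>

// Each point starts as its own cluster; the two clusters with the closest
// centroids merge, repeating until no pair is within maxDist of each other.
std::vector<std::vector<cv::Point2f>> agglomerate(
        const std::vector<cv::Point2f>& points, float maxDist) {
    std::vector<std::vector<cv::Point2f>> clusters;
    for (const auto& p : points) clusters.push_back({p});

    auto centroid = [](const std::vector<cv::Point2f>& c) {
        cv::Point2f sum(0, 0);
        for (const auto& p : c) sum += p;
        return sum * (1.0f / c.size());
    };

    while (clusters.size() > 1) {
        size_t bestA = 0, bestB = 1;
        double best = std::numeric_limits<double>::max();
        for (size_t a = 0; a < clusters.size(); ++a)         // find the closest pair of clusters
            for (size_t b = a + 1; b < clusters.size(); ++b) {
                cv::Point2f d = centroid(clusters[a]) - centroid(clusters[b]);
                double dist = std::hypot(d.x, d.y);
                if (dist < best) { best = dist; bestA = a; bestB = b; }
            }
        if (best > maxDist) break;                           // nothing close enough to merge

        clusters[bestA].insert(clusters[bestA].end(),        // merge B into A, then drop B
                               clusters[bestB].begin(), clusters[bestB].end());
        clusters.erase(clusters.begin() + bestB);
    }
    return clusters;
}
```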
There is also the k-means clustering algorithm. This method takes a set of n observations and groups them into k clusters. The cluster an observation is placed in depends on its distance from one of the computed means. There is a kmeans() function included with the OpenCV library. The kmeans() function takes in an InputArray of data points, separates them into K clusters, and returns the cluster centers in an OutputArray.
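A sketch of how that might look with our mass centers (K and the termination criteria below are arbitrary choices for illustration):

```cpp
// Group the detected mass centers into K clusters with cv::kmeans().
std::vector<cv::Point2f> clusterCenters(const std::vector<cv::Point2f>& mass_centers, int K) {
    cv::Mat samples(mass_centers, true);  // N x 1 matrix of 2-channel floats
    samples = samples.reshape(1);         // reinterpret as N x 2 single-channel, as kmeans expects

    cv::Mat labels, centers;
    cv::kmeans(samples, K, labels,
               cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 10, 1.0),
               3,                         // attempts: rerun 3 times, keep the best labeling
               cv::KMEANS_PP_CENTERS,     // k-means++ initialization
               centers);                  // K x 2 matrix of cluster centers

    std::vector<cv::Point2f> result;
    for (int i = 0; i < centers.rows; ++i)
        result.emplace_back(centers.at<float>(i, 0), centers.at<float>(i, 1));
    return result;
}
```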
How do we deal with the laser?
This is something that came up during our presentation. Since we plan to mount a laser pointer to a turret in our next project, that laser would also be moving, and the program would track it. This would throw off the target coordinates significantly, especially for fast motions.
To potentially work around this issue, we added the speedGovernor() function. This function’s purpose is to simulate the actual motion of the turret, so that we can make a prediction about where the laser will be in the video feed and black it out of the threshold image.
This function works using basic trigonometry. We find the distance between the target’s current position and its destination, as well as the angle between that distance (the hypotenuse) and the x axis. If the distance is greater than the constant we set, we can reduce the distance to that of the constant and find new x and y coordinates using the angle we found and the new distance.
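Here is a sketch of that logic (MAX_STEP and the exact signature are illustrative; the real speedGovernor() lives in our source):

```cpp
#include <cmath>
#include <opencv2/core/core.hpp>

const float MAX_STEP = 15.0f;  // max pixels the simulated turret may travel per frame (illustrative)

cv::Point2f speedGovernor(cv::Point2f current, cv::Point2f destination) {
    float dx = destination.x - current.x;
    float dy = destination.y - current.y;
    float distance = std::hypot(dx, dy);           // length of the hypotenuse
    if (distance <= MAX_STEP) return destination;  // close enough: jump straight there

    float angle = std::atan2(dy, dx);              // angle between the hypotenuse and the x axis
    return cv::Point2f(current.x + MAX_STEP * std::cos(angle),  // travel capped at MAX_STEP
                       current.y + MAX_STEP * std::sin(angle));
}
```

The predicted point can then be masked out of the threshold image with something like cv::circle(thresholdImage, predicted, radius, cv::Scalar(0), -1), i.e. a filled black circle, where the radius would have to be tuned to the laser dot’s apparent size.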
There are numerous problems with this approach. First, we do not account for the rotational motion of the turret. Because of this, the turret’s apparent rate of motion will be different depending on where it is pointing. This could be accounted for fairly easily with a bit of calculus, but we have not implemented it at this stage of the project.
Secondly, this function is tied to the framerate of the video feed. This would be fine if framerate was constant, but as we all know, that is almost never the case. This problem will cause accruing errors while the program is running, until the laser is no longer blacked out and we lose accurate tracking. This is, again, an easy fix. We will simply have to tie the function to a clock instead of the frames.
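A sketch of that fix, assuming a maximum turret speed in pixels per second (both the constant and the names are illustrative):

```cpp
#include <chrono>

const float MAX_SPEED = 450.0f;  // pixels per second the simulated turret may travel (illustrative)

auto lastTick = std::chrono::steady_clock::now();

// Returns how far the simulated turret may move this frame, based on real
// elapsed time rather than a fixed per-frame constant.
float stepForThisFrame() {
    auto now = std::chrono::steady_clock::now();
    std::chrono::duration<float> elapsed = now - lastTick;  // seconds since the last call
    lastTick = now;
    return MAX_SPEED * elapsed.count();  // replaces the fixed MAX_STEP above
}
```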
Thirdly, and most importantly, are all of the other issues that come with robotics. There is inherent unpredictability when dealing with real world objects. Some of these will cause accruing errors, like the delay between sending a command to the turret and it executing the command. Others are wholly unpredictable, like the turret becoming obstructed. We will have to tackle this problem in our next project, and possibly come up with a new solution.
To see our full code: