Computer Vision: Motion Tracking with OpenCV and C++

Introduction

For our class project, we decided to explore motion tracking with a webcam. We settled on C++ as the language, with the fantastic OpenCV library handling most of the heavy lifting. Our goal for the motion tracker was to create something that would return smooth, simplified target data that would work well with a laser pointer turret (our next project). Overall, we think we were successful, but there is still a lot of work to do to make this program truly robust.

This blog post will go over what specific problems we were trying to solve, what other people have done with motion tracking, what we tried to do, and what our results were.

The Problems and Their Solutions:

Getting data from a webcam and manipulating it in C++

The first problem we had to consider is, of course, how we would even talk to a webcam. Luckily, others have solved this particular issue and made their work publicly available. OpenCV is open source and free to download at opencv.org.

Detecting areas of motion

Once we have the webcam feed, how do we figure out if something is moving on the screen? How do we figure out where its center of mass is? How can we reduce issues like noise and poor contrast? Once again, OpenCV had us covered for most of this, as it has robust tools for image processing.

The first area to consider is how to detect motion. There are several posted tutorials on this subject. It seems that the most common (and simplest) solution is to detect colors. If one assumes that the target object is red, for example, it is fairly trivial to find and track red objects. The inverse is also possible: if one assumes that the background is red, then objects of interest will be whatever is not red.
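With OpenCV, that kind of color tracking can be as little as one cv::inRange() call. The bounds below are purely illustrative (a real color tracker would usually convert to HSV first rather than thresholding raw BGR values):

    // Keep only "reddish" pixels of a BGR frame; everything else goes black.
    cv::Mat mask;
    cv::inRange(frame, cv::Scalar(0, 0, 150), cv::Scalar(80, 80, 255), mask);
    // mask is now white wherever the pixel was sufficiently red.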

However, we wanted a much more general solution. Instead of looking for color, could we detect any changes between frames? This tutorial gave us a wonderful introduction to general motion detection, and we even ended up using some of the author’s code in our project.

The idea is to first grab two frames from the video feed, convert those to grayscale, and then use the function cv::absdiff() to get the absolute difference between those two frames. Simple as that, we now have a video feed showing only areas where pixels have changed between the two frames, aka where motion has occurred.

[Image: output of cv::absdiff(), showing only the pixels that changed between frames]
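Roughly, that step looks like the sketch below. The variable names and the way we grab frames here are illustrative, not necessarily what our project's code looks like:

    #include <opencv2/opencv.hpp>

    cv::VideoCapture cap(0);                          // default webcam
    cv::Mat frame1, frame2, gray1, gray2, diff;

    cap.read(frame1);                                 // grab two consecutive frames
    cap.read(frame2);
    cv::cvtColor(frame1, gray1, cv::COLOR_BGR2GRAY);  // convert both to grayscale
    cv::cvtColor(frame2, gray2, cv::COLOR_BGR2GRAY);
    cv::absdiff(gray1, gray2, diff);                  // pixels that changed show up bright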

Now that we have this, we need to convert it into data that a computer can easily understand. The first step in this process is to use cv::threshold() to convert the absolute difference into a binary output (pure black and pure white).

[Image: output after the first cv::threshold()]

Now we have a solid binary image of the areas of motion, but we can also see a lot of noise (small solitary white pixels and areas of minimal motion, like my glasses). We don't want all of that, just areas where there has been some sort of significant motion, like my hand. To get rid of this noise, we use cv::blur() and another cv::threshold() to further simplify the image.

[Image: output after cv::blur() and the second cv::threshold()]

And there we go! A nice, simple binary image that clearly shows motion.
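In rough code, the cleanup stage looks something like this (the threshold value and blur size are placeholders; the constants our program actually uses may differ):

    cv::Mat thresh;
    cv::threshold(diff, thresh, 20, 255, cv::THRESH_BINARY);    // first pass: any change becomes pure white
    cv::blur(thresh, thresh, cv::Size(10, 10));                 // smear lone noise pixels into faint gray
    cv::threshold(thresh, thresh, 20, 255, cv::THRESH_BINARY);  // second pass: drop the faint stuff, keep big blobs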

Extracting useful coordinates we could send to a turret

Now that we have our binary image showing the areas of motion, we need a way to accurately track individual objects. Our current implementation of object tracking is contained in the collectSamples() function. This function identifies contours in the image and stores them in a vector called contour_moments. We then collect the centroids of the contours into a vector of points called mass_centers. The x and y coordinates are then averaged together to determine where the object we are tracking is.

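The gist of collectSamples() is something like the following sketch. This is our approximation for the post, not the exact project code:

    // Find the contours of the white motion blobs in the thresholded image.
    cv::Mat contour_input = thresh.clone();           // findContours may modify its input
    std::vector<std::vector<cv::Point> > contour_moments;
    cv::findContours(contour_input, contour_moments,
                     cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    // Compute each contour's centroid from its image moments.
    std::vector<cv::Point2f> mass_centers;
    for (size_t i = 0; i < contour_moments.size(); i++) {
        cv::Moments m = cv::moments(contour_moments[i]);
        if (m.m00 > 0)                                // skip degenerate contours
            mass_centers.push_back(cv::Point2f(m.m10 / m.m00, m.m01 / m.m00));
    }

    // Average the centroids to get a single target point.
    cv::Point2f target(0, 0);
    if (!mass_centers.empty()) {
        float sum_x = 0, sum_y = 0;
        for (size_t i = 0; i < mass_centers.size(); i++) {
            sum_x += mass_centers[i].x;
            sum_y += mass_centers[i].y;
        }
        target = cv::Point2f(sum_x / mass_centers.size(), sum_y / mass_centers.size());
    }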

This method works when there is only one object being tracked. When something else is added to the mix, the target our program draws settles on a spot in the middle of all the moving objects. This problem can hopefully be fixed when we finish implementing a clustering algorithm.

One clustering algorithm we might implement is hierarchical clustering. This algorithm is generally implemented in one of two ways. The first is the agglomerative type, where each data point starts as its own cluster and pairs of clusters are merged as one proceeds up the hierarchy. The second is the divisive method, where all of the data points start in one cluster and are split apart as the algorithm works its way down the hierarchy.

There is also the k-means clustering algorithm. This method takes a set of n observations and groups them into k clusters, placing each observation in the cluster whose mean it is closest to. There is a kmeans() function included with the OpenCV library. It takes an InputArray of data, partitions the samples into K clusters, and can output the cluster centers along with a label for each sample.
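A rough idea of how that could plug into the existing code, using the mass_centers vector from above (the value of K here is just a guess at how many moving objects there are, not something our program knows yet):

    int K = 2;                                       // assumed number of moving objects
    cv::Mat labels, centers;
    if ((int)mass_centers.size() >= K) {
        cv::kmeans(mass_centers, K, labels,
                   cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 10, 1.0),
                   3, cv::KMEANS_PP_CENTERS, centers);
        // centers is a K x 2 float matrix; each row is one cluster's (x, y) center,
        // so each cluster could get its own target instead of one averaged point.
    }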

How do we deal with the laser?

This is something that came up during our presentation. Since we plan to mount a laser pointer to a turret in our next project, that laser would also be moving, and the program would track it. This would throw off the target coordinates significantly, especially for fast motions.

To potentially work around this issue, we added the speedGovernor() function. This function’s purpose is to simulate the actual motion of the turret, so that we can make a prediction about where the laser will be in the video feed and black it out of the threshold image.


This function works using basic trigonometry. We find the distance between the target's current position and its destination, as well as the angle between that line (the hypotenuse) and the x axis. If the distance is greater than the constant we set, we clamp it to that constant and find new x and y coordinates using the angle we found and the clamped distance.
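The core of it looks roughly like this. MAX_STEP stands in for whatever constant the real function uses, and current/destination are cv::Point2f values:

    #include <cmath>

    double dx = destination.x - current.x;
    double dy = destination.y - current.y;
    double dist  = std::sqrt(dx * dx + dy * dy);     // hypotenuse
    double angle = std::atan2(dy, dx);               // angle from the x axis

    const double MAX_STEP = 15.0;                    // assumed per-frame limit, in pixels
    if (dist > MAX_STEP) {                           // too far to reach this frame:
        current.x += MAX_STEP * std::cos(angle);     // move along the same heading,
        current.y += MAX_STEP * std::sin(angle);     // but only MAX_STEP of the way
    } else {
        current = destination;                       // close enough, snap to the target
    }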

There are numerous problems with this approach. First, we do not account for the rotational motion of the turret. Because of this, the turret's apparent rate of motion will be different depending on where it is pointing. This could be accounted for fairly easily with a bit of calculus, but we have not implemented it at this stage of the project.

Secondly, this function is tied to the framerate of the video feed. This would be fine if the framerate were constant, but as we all know, that is almost never the case. This problem will cause errors to accrue while the program is running, until the laser is no longer blacked out and we lose accurate tracking. This is, again, an easy fix: we will simply have to tie the function to a clock instead of to the frames.
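For example, the fixed per-frame constant could be replaced with a speed scaled by elapsed time, along the lines of this sketch (SPEED is an assumed value, not something from our code):

    #include <chrono>

    static auto last = std::chrono::steady_clock::now();
    auto now = std::chrono::steady_clock::now();
    double elapsed = std::chrono::duration<double>(now - last).count(); // seconds since last update
    last = now;

    const double SPEED = 200.0;                      // assumed turret speed, in pixels per second
    double max_step = SPEED * elapsed;               // replaces the fixed per-frame constant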

Thirdly, and most importantly, there are all of the other issues that come with robotics. There is inherent unpredictability when dealing with real-world objects. Some of these issues will cause accruing errors, like the delay between sending a command to the turret and the turret executing it. Others are wholly unpredictable, like the turret becoming obstructed. We will have to tackle this problem in our next project, and possibly come up with a new solution.

To see our full code: 

GitHub

Motion Tracking Robot

by Matthew, Laurin, Lucas

Summary: Last time we started out writing code utilizing OpenCV to track moving objects through image processing. This time we move our project into the physical world by making a robot that uses the code we wrote to track people with a laser.

What we did

Designing the turret

Our original idea in Project 1 was to make a turret in the style of the video game Portal. However, we quickly found that this was a non-trivial problem that many hobbyists and professional engineers have been working on for years. Since none of us have much in the way of engineering experience, we instead went with a utilitarian design.

turretpic

This was our OpenSCAD design. The box has enough space to comfortably house the webcam, Arduino Mega, and the breadboard we use for basic circuitry. Instead of making the box a single object, we made each section removable for rapid prototyping. The webcam is held securely in place with a back plate that is screwed into the front, and there are mounting holes and standoffs for the Arduino.

The actual turret is a very simple two-axis servo mount. The bottom servo controls horizontal rotation, and the top servo controls vertical. The laser itself is housed in the cylinder you see in the picture above. Additionally, the turret was designed so that the emitting end of the laser experiences no translation during normal operation. This was to simplify some of the math you’ll read about later.

Overall, the turret can be made very cheaply (around 30-40 dollars), which would make it a good DIY project. We plan to upload the files to Thingiverse once we have made a statically linked executable for the control software.

A picture of the finished turret (under repair):


And here is a video of the turret in action:

Finding Turret Position

In the end, the answer of where to point the laser was not to use calculus, but just to use more trig. We ended up saying that when the laser is pointed straight ahead, with both servos (up/down and left/right) at 90 degrees, the laser should be pointed at the very center of the image on our webcam. (It turned out that wasn't the case; to get the base calculations, though, we assumed we were correct and dealt with fixing it later.)

With our assumption, the calculations became fairly simple. We looked at the image from the webcam, and ignored everything in it but the center of the target we draw on it. The target already does all the calculations to put itself into the center of the object, so all we needed to do was figure out how to make the laser point at the center of the target.

What we ended up doing is treating the image as a Cartesian grid, with the center of the image as our (0,0), the point where both servos sit at 90 degrees. From there, we looked at where the target was on that grid and calculated the angle to which each of our servos needed to move to line up the laser. We did this through two arctangents. One servo needed to know how far left or right to go, so we could ignore the target's height, project the target onto the axis we wanted, and solve the arctangent to get that servo's degree offset from 90. We did the same for the other axis, ran both servos simultaneously, and effectively turned two single-axis motions into one two-axis motion that tracked actual movement in the real world with a laser pointer!

Using arctan in this way did have some problems, though. For one, the arctangent needs to know how far away the target is, and we had no way of calculating that, so we assume the object is a fixed distance away from the webcam. Also, we still had that off-by-a-little-bit error mentioned earlier: the laser, at the default 90 degrees on both servos, did not point at the center of the webcam's image. The laser was mounted a few inches above the webcam, so we did have some offset.

How we ended up dealing with that problem was just more trig! In fact, just another arctan. Instead of using the location of the target, though, we used the already fixed distance to the object being tracked and the distance between the laser and the webcam to calculate a fairly small angle offset that the up/down servo would need to take into account in order to actually keep the laser on the target.

The math worked out to be this:

int x_command = -atan2((destination.x - configuration["CAM_RES_X"] / 2) * parallax_unit, configuration["PARALLAX"]) * 180 / CV_PI + 90;

int y_command = atan2((destination.y - configuration["CAM_RES_Y"] / 2) * parallax_unit, configuration["PARALLAX"]) * 180 / CV_PI + 90 + vertical_offset;

The commands, which tell the servos where to rotate to, are pretty much identical, save that the Y command also has the vertical offset added in, which is calculated elsewhere. The calculations are fairly simple: the first parameter of atan2 is the vertical or horizontal distance our target is from the center, and the second is the hard-coded distance from the camera to the target it's tracking. The result is then multiplied by 180/pi to convert from radians to degrees, and then the base 90 degrees is added in, so that if the target is in the center of the screen, the servos return to their default positions.
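For completeness, here is one way the vertical_offset could be computed from the idea described above. This is our guess at the shape of that calculation for the post, not the project's exact code, and LASER_GAP is a made-up name for the laser-to-webcam distance:

    // The laser sits LASER_GAP units above the camera, and the target is assumed to be
    // configuration["PARALLAX"] units away, so the up/down servo needs a small extra angle.
    double LASER_GAP = 3.0;   // assumed laser-to-webcam distance, in the same units as PARALLAX
    int vertical_offset = atan2(LASER_GAP, configuration["PARALLAX"]) * 180 / CV_PI;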

Sending the Data

So now that we have the calculations of where to move the servos, how do we get that data across to the Arduino? We had to figure out how to get our C++ code to send data through the serial port into the Arduino, so the Arduino could extract it and do meaningful things to the data.

We found that you can write data to a serial port by treating it as a file. Write your data to a stream, then send it along through the serial port. Really straightforward.
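As a sketch of that idea (the device path here is just an example and differs by machine, and setting up the baud rate is left out):

    #include <fstream>

    std::ofstream serial("/dev/ttyACM0");            // the Arduino shows up as a serial device file
    if (serial.is_open()) {
        serial << 'A' << ' ' << x_command << ' ' << y_command << '\n';
        serial.flush();                              // push the command out immediately
    }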

So the next problem was: how do we format the data being sent?

We decided to go with a single character, then two integer values. The single character could be an 'A', to indicate that the turret is moving automatically and that the following integers are the location of the target, so the turret would move there. Or the character could be an 'M', to indicate that we have manual control and should treat the following stream differently. At the time of writing, manual mode wasn't fully working, though all of the architecture is in place to implement it.

The Arduino breaks apart the data, depending on the first character. If the first character is an ‘A’, it breaks apart the following two integers, and does the math on them as was discussed in the previous section, before telling the servos where to go.
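On the Arduino side, the parsing could look roughly like this sketch (the pin numbers are placeholders, and any extra math would go where the comment indicates):

    #include <Servo.h>

    Servo pan, tilt;                     // left/right and up/down servos

    void setup() {
        Serial.begin(9600);              // must match the baud rate used by the C++ program
        pan.attach(9);                   // example pins
        tilt.attach(10);
    }

    void loop() {
        if (Serial.available() > 0) {
            char mode = Serial.read();
            if (mode == 'A') {                       // automatic mode: two angles follow
                int x_command = Serial.parseInt();
                int y_command = Serial.parseInt();
                // any remaining math from the previous section would happen here
                pan.write(x_command);
                tilt.write(y_command);
            }
            // an 'M' would switch to manual handling once that mode is implemented
        }
    }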

GUI

To give our project a smoother user interface, we worked on providing a GUI for our robot. Ultimately we used OpenGL with OpenCV, since they seemed to work together fairly well. First things first: we needed to get OpenCV working with OpenGL, since we had not used OpenGL in our last project.

(We also utilized GLUT and GLEW; old, I know. I got constant reminders from Xcode that my code was deprecated.)

To do this, we included our global variables at the top of the file and put our OpenCV motion-tracking loop code (previously in main.cpp) into a function that I called opencvLoop(). OpenGL needed main for itself, so it could call glutMainLoop() and update the window. I ended up calling opencvLoop() in my main window display function, because I was taking our images and binding them to texture objects.
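The texture upload inside the display function looked conceptually like this. It is a simplified sketch assuming a BGR cv::Mat called frame coming back from opencvLoop(); in real code the texture would be created once at startup, not every frame:

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);           // don't assume 4-byte row alignment
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB,
                 frame.cols, frame.rows, 0,
                 GL_BGR, GL_UNSIGNED_BYTE, frame.data); // OpenCV stores pixels as BGR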

The rest was simpler and more straightforward because I used the code my graphics teacher provided us for educational purposes, to draw text to the screen and to have an easy-to-use shader program. Thank you Dr. Chappel!

All in all, I added keyboard arrow functionality, 3 debugging sub windows, our video, and instructions for our user in the GUI.


See our code here:

GitHub