I’m writing a little post here about my final project for the Computer Vision and Image Processing class I took last semester. The project was inspired by Hollywood-style computer interfaces (think Minority Report or, more recently, Avatar) in which the computer is controlled by the movement of your hand. Rather than having someone wear a glove equipped with a sensor, I set out to see how well I could track my hand with a camera.
This implementation uses OpenCV and Python. I used these bindings for Python, which I now realize are somewhat out of date. Still, I’m pretty satisfied with the results, given that it was developed over the course of a few weeks.
It works by first selecting “good features to track”: points in the image where pixel intensity changes sharply in more than one direction, such as corners. Because of this, it works best in front of a background that contrasts well with flesh tones. It then tracks the location of those points as they move from frame to frame of video. Together, these two techniques are known as KLT (Kanade-Lucas-Tomasi) tracking. You can read more about it, as well as the original papers describing it, here.
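To make the two steps a little more concrete, here is a minimal pure-Python sketch of the math behind them. It is not the project's code, which relied on OpenCV's implementations (in the modern `cv2` module these are `cv2.goodFeaturesToTrack` and `cv2.calcOpticalFlowPyrLK`). The names `good_features` and `lucas_kanade` are hypothetical. Images are assumed to be lists of grayscale rows. The sketch also leaves out what a real tracker adds: image pyramids, iterative refinement, and a minimum distance between features. Features are scored with the Shi-Tomasi criterion, which is the smaller eigenvalue of the 2×2 structure tensor. Tracking then solves the Lucas-Kanade least-squares system for a window's shift.

```python
import math


def gradients(img):
    """Central-difference gradients (Ix, Iy); border pixels are left at 0."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return ix, iy


def min_eigen_score(ix, iy, x, y, r):
    """Shi-Tomasi score: smaller eigenvalue of the structure tensor in a (2r+1)^2 window."""
    a = b = c = 0.0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
            a += gx * gx
            b += gx * gy
            c += gy * gy
    # Eigenvalues of [[a, b], [b, c]] are (a+c)/2 +/- sqrt(((a-c)/2)^2 + b^2).
    return (a + c) / 2.0 - math.sqrt(((a - c) / 2.0) ** 2 + b * b)


def good_features(img, count, r=1):
    """Return up to `count` (x, y) points with the highest positive Shi-Tomasi score."""
    ix, iy = gradients(img)
    h, w = len(img), len(img[0])
    # Only consider points whose whole window lies where gradients were computed.
    scored = [
        (min_eigen_score(ix, iy, x, y, r), (x, y))
        for y in range(r + 1, h - r - 1)
        for x in range(r + 1, w - r - 1)
    ]
    scored.sort(reverse=True)
    return [pt for score, pt in scored[:count] if score > 0]


def lucas_kanade(prev, curr, x, y, r=2):
    """One Lucas-Kanade step: least-squares (u, v) shift of the window around (x, y)."""
    ix, iy = gradients(prev)  # recomputed per call for simplicity, not speed
    a = b = c = p = q = 0.0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
            gt = curr[y + dy][x + dx] - prev[y + dy][x + dx]  # temporal difference
            a += gx * gx
            b += gx * gy
            c += gy * gy
            p += gx * gt
            q += gy * gt
    det = a * c - b * b
    if abs(det) < 1e-9:
        return None  # flat or edge-only window (aperture problem): no unique shift
    # Solve [[a, b], [b, c]] @ [u, v] = [-p, -q] by Cramer's rule.
    return (-c * p + b * q) / det, (b * p - a * q) / det
```

On a bright square against a dark background, the four highest-scoring points are the square's corners. Pixels along the edges score zero because their gradient points in only one direction. For the image `x * y` shifted one pixel to the right (`(x - 1) * y`), `lucas_kanade` recovers a shift of `(1.0, 0.0)` at an interior point. This is also why contrast with the background matters: without it, the hand's outline produces few windows with two strong gradient directions.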
I put my code up on GitHub, though it is under-commented and could use a great deal of optimization. Like a lot of my class projects, though, I think it represents a solid start. You’ll need the OpenCV libraries I linked to earlier and Pygame in order to run it.
Here’s a video of me using it:
The left window shows the tracking. Here you can see me press “f” to find the “good features”, which are highlighted in green. Then I press “r” to tell the program that I intend to move the tracked object to the right. I move my hand to the right (my right, anyway), and the most relevant points are highlighted in red. As I move, the tracker updates which points it considers most relevant, discarding points whose motion goes against the grain of my hand’s average movement and picking up new ones. The gray circle marks where the program thinks the center of my hand is.
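The exact relevance rule isn't spelled out here, so the following is only a guess at the idea just described, not the project's code. `select_relevant_points` is a hypothetical helper. It keeps a tracked point only if it moves in the direction chosen with “r” and roughly along the group's average motion. "Roughly along" means a cosine similarity of at least an assumed threshold of 0.5. The mean of the surviving points then serves as the estimated hand center (the gray circle).

```python
import math


def select_relevant_points(old_pts, new_pts, direction, min_agreement=0.5):
    """Hypothetical sketch: keep points moving with the group and the intended direction.

    old_pts/new_pts are matching lists of (x, y); direction is an intended (dx, dy),
    e.g. (1.0, 0.0) after pressing "r". Returns (kept_points, center or None).
    """
    moves = [(nx - ox, ny - oy) for (ox, oy), (nx, ny) in zip(old_pts, new_pts)]
    if not moves:
        return [], None
    mean_dx = sum(dx for dx, _ in moves) / len(moves)
    mean_dy = sum(dy for _, dy in moves) / len(moves)
    kept = []
    for (dx, dy), pt in zip(moves, new_pts):
        # Cosine similarity between this point's motion and the average motion;
        # if either is zero the point cannot be said to agree, so it is dropped.
        norm = math.hypot(dx, dy) * math.hypot(mean_dx, mean_dy)
        with_group = norm > 0 and (dx * mean_dx + dy * mean_dy) / norm >= min_agreement
        with_intent = dx * direction[0] + dy * direction[1] > 0
        if with_group and with_intent:
            kept.append(pt)
    if not kept:
        return [], None
    # Estimated hand center: the centroid of the points that survived.
    cx = sum(x for x, _ in kept) / len(kept)
    cy = sum(y for _, y in kept) / len(kept)
    return kept, (cx, cy)
```

For example, if three points move about two pixels right and one moves two pixels left, the left-moving point is dropped. The center is then the average of the other three. Filtering against the group average is what lets stray points (a background corner, a shifting shadow) fall away while new ones are picked up.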
The right window is a simple “game”. The object is to move the hollow blue circle onto the green one while avoiding the red one. I think it does a good job of demonstrating the system’s precision, although with more time I would have built something more interesting.
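The game logic can be a plain circle-overlap test. The sketch below assumes the blue circle follows the tracker's estimated hand center, and it assumes that touching the red circle loses while touching the green one wins. `circles_overlap` and `game_step` are hypothetical names, not taken from the project or from Pygame.

```python
import math


def circles_overlap(c1, r1, c2, r2):
    """True if two circles, given as (center, radius), touch or overlap."""
    return math.dist(c1, c2) <= r1 + r2


def game_step(cursor, cursor_r, goal, goal_r, hazard, hazard_r):
    """Hypothetical rule: hitting the red hazard loses, reaching the green goal wins."""
    if circles_overlap(cursor, cursor_r, hazard, hazard_r):
        return "lose"
    if circles_overlap(cursor, cursor_r, goal, goal_r):
        return "win"
    return "playing"
```

Checking the hazard first means a cursor touching both circles counts as a loss, which rewards precise movement.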
The video also shows some of the limitations of this approach. For one, you can only track one object per camera. That isn’t really a problem for what I was trying to do, since a final version should have two cameras, each pointed at one hand. That way, moving your head or upper arms wouldn’t interfere with the tracking. You can see both of those movements confuse the tracker at points in this video.
Another limitation is that although this tracks position well, it does nothing to recognize gestures. Gesture recognition could be built on top of point tracking as it exists now, although I know more specialized algorithms exist for that purpose. Either way, that’s beyond the scope of this project.
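As a rough illustration of building on top of the tracker, here is a hypothetical `classify_swipe` helper. It is not one of the specialized algorithms mentioned above. It looks only at the net displacement of the estimated hand center over recent frames. The 50-pixel threshold is an arbitrary assumption, and y is assumed to grow downward as in image coordinates.

```python
def classify_swipe(centers, min_distance=50.0):
    """Naive, hypothetical swipe detector over a history of (x, y) hand centers.

    Returns "left", "right", "up", "down", or None if the net motion is too small.
    """
    if len(centers) < 2:
        return None
    dx = centers[-1][0] - centers[0][0]
    dy = centers[-1][1] - centers[0][1]
    if max(abs(dx), abs(dy)) < min_distance:
        return None
    # Classify by the dominant axis of motion.
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"
```

Depending on whether the camera image is mirrored, “right” in image coordinates may be the user's left. Anything beyond simple swipes (shapes, finger poses) would need more than the hand's center.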