Kinect Experiments: compositing and 3d

28 November, 2010

So, I've gone a little Kinect crazy. Ever since the open source community was able to figure out the USB protocol and start working on a set of drivers and libraries for it (libfreenect), I've been following the progress closely and scheming about making my own projects with it.

Thankfully, the progress in the open source community has been absolutely astounding, and within a week of the Kinect's launch there were already libraries available for working with it in Processing, openFrameworks, and Cinder, amongst other platforms.

A couple of weeks ago I hit the point where I could no longer restrain myself, so I ran out to Best Buy and bought myself a Kinect. Pretty much immediately, I got the Processing hello world example working. Here's the required hello world screenshot:

But what do I really want to do with the Kinect? The two main areas I'm interested in exploring are live compositing (combining two images to create the illusion of a coherent scene) and 3d printing (capturing data about the 3d shape of objects in order to print small versions of them and to use them in 3d animations). Towards those ends, I wrote two basic Processing sketches that use the Kinect to begin to explore these areas.

The first one uses the depth image that comes from the Kinect and allows you to select a particular depth within the image by clicking. It then replaces all the parts of the image at that depth with a pre-loaded static image (in this case, a random picture of a shoe from the internet). In this example video I demonstrate the possibilities for using this technique for both background replacement (replacing the wall behind me with the shoe) and in-scene compositing (replacing the box on the table in front of me with the shoe).

Background Replacement with Kinect and Processing from Greg Borenstein on Vimeo.
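To make the depth-selection idea concrete, here is a minimal sketch of the masking step in plain Python. It is not the Processing sketch from the video. It assumes the depth image and the replacement image are same-sized grids of grayscale values, and the names `composite_at_depth` and `tolerance` are made up for illustration.

```python
def composite_at_depth(depth, replacement, click_x, click_y, tolerance=5):
    """Swap in replacement pixels wherever the depth is close to the clicked depth.

    depth and replacement are assumed to be equal-sized lists of rows of
    grayscale values (0 to 255). tolerance is the hypothetical threshold
    that widens or narrows the band of depths being replaced.
    """
    target = depth[click_y][click_x]  # depth value under the mouse click
    out = []
    for y, row in enumerate(depth):
        out_row = []
        for x, d in enumerate(row):
            if abs(d - target) <= tolerance:
                out_row.append(replacement[y][x])  # inside the band: use the new image
            else:
                out_row.append(d)  # outside the band: keep the depth pixel
        out.append(out_row)
    return out


# Tiny made-up scene: a far "wall" (200) with a nearer "box" (about 80) in front of it.
depth = [
    [200, 200, 200, 200],
    [200, 80, 82, 200],
    [200, 79, 81, 200],
]
replacement = [[9] * 4 for _ in range(3)]  # stand-in for the shoe picture

box_replaced = composite_at_depth(depth, replacement, 1, 1, tolerance=3)
wall_replaced = composite_at_depth(depth, replacement, 0, 0, tolerance=3)
```

Clicking on the box replaces only the box, and clicking on the wall replaces only the wall, which corresponds to the in-scene and background cases in the video. Raising `tolerance` grabs a thicker slice of the scene.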

If you watch the logging below the image you can also see that I'm calculating the exact distance to the object targeted for replacement in inches and increasing and decreasing a threshold for the pixels we want to replace. The full code for this example is here:

Obviously, it will eventually be desirable to do this with live video instead of just a static image, to smooth out the mask created from the noisy depth image the Kinect provides, and to use the Kinect's RGB image as part of the composite rather than just the grayscale depth image. These are all features I'm working towards, but they're held up by various factors, ranging from my need to learn Cinder in order to rewrite this example in C++ for performance, to waiting for the authors of the OpenKinect libraries to implement the math for aligning the Kinect's RGB pixels with its depth image.

Creating 3d models from the Kinect depth data is a greater challenge than basic compositing. It involves translating the 256 levels (0 to 255) of depth data that the Kinect provides into accurately scaled physical measurements and then using those to translate the positions of pixels in the two-dimensional plane into an arrangement in 3d space.
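For reference, one common way to do the second half of that translation is a pinhole-camera back-projection. The sketch below is plain Python, not my Processing code, and it rests on some loud assumptions: the depth values have already been converted to physical units, the optical center sits in the middle of the image, and `focal_px` (a focal length in pixels) is a placeholder value rather than a calibrated one.

```python
def depth_to_points(depth, focal_px=580.0):
    """Back-project a grid of depth values into (x, y, z) points.

    depth is assumed to be a list of rows of distances in some physical
    unit, with None marking pixels that have no reading. focal_px is an
    assumed, uncalibrated focal length in pixels.
    """
    height = len(depth)
    width = len(depth[0])
    cx = (width - 1) / 2.0  # assume the optical center is the image center
    cy = (height - 1) / 2.0
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0:
                continue  # skip pixels with no usable depth
            # A pixel's offset from the center covers more physical
            # distance the farther away the surface is, so scale by z.
            x = (u - cx) * z / focal_px
            y = (v - cy) * z / focal_px
            points.append((x, y, z))
    return points


# Tiny made-up example: a flat surface 2 units away, with one missing reading.
flat = [
    [2.0, 2.0, 2.0],
    [2.0, None, 2.0],
    [2.0, 2.0, 2.0],
]
cloud = depth_to_points(flat, focal_px=2.0)
```

The scaling by `z` is the part that matters: using the raw pixel coordinates as x and y with depth as z ignores the fact that far-away pixels are physically farther apart than near ones.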

Having never worked with 3d in Processing before, and not fully understanding the math involved in these kinds of transformations, it took me a while to come up with code that generated a 3d point cloud at all, and even then I was never able to overcome some of the distortions that come from the naive math I applied. Hence, my 3d examples are not as impressive as some of the more sophisticated things people are doing out there.

Here was my first attempt:

Live 3D Point Cloud with Kinect in Processing from Greg Borenstein on Vimeo.

Obviously, in that example, the 3d space is somewhat flattened. After some work, I managed to produce a 3d rendering that, while still wildly distorted at depth, was at least significantly more 3d:

Kinect 3D Point Cloud in Processing with improved projection from Greg Borenstein on Vimeo.

The code for this is here:

More than a week has passed since I produced these two demos, and in that time I've made some progress working with Cinder and done some experiments using the Kinect for projection mapping. I'll write those up soon, once they're a little more fleshed out.