In a recent post on the BERG blog, Gardens and Zoos, Matt Jones explored a series of ideas for designing personality and life into technology products. One of the most compelling of these takes advantage of pareidolia, the natural human inclination to see faces everywhere around us.
Jones's slide introducing pareidolia.
Jones advocates designing faces into new technology products as a way of making them more approachable, using pareidolia to give products personality and humanize them without climbing all the way down into the Uncanny Valley. He even runs a Flickr group collecting images of pareidolia-inducing objects: Hello Little Fella!
Lately I've been thinking a lot about faces. I've had mine scanned and turned into a digital puppet. I've been working extensively with face tracking, building a series of experiments and prototypes with Kyle McDonald's ofxFaceTracker, an openFrameworks frontend to Jason Saragih's excellent FaceTracker project. Most publicly so far, I demonstrated that FaceTracker can track hand-drawn faces.
Facial recognition techniques give computers their own flavor of pareidolia. In addition to responding to actual human faces, facial recognition systems, just like the human vision system, sometimes produce false positives, latching onto some set of features in the image as matching their model of a face. Rather than the millions of years of evolution that shaped human vision, their pareidolia is based on the details of their algorithms and the vicissitudes of the training data they've been exposed to.
Their pareidolia is different from ours. Different things trigger it.
Face in the Window. FaceTracker seeing a face in a window at CMU's Studio for Creative Inquiry during Art && Code.
After reading Jones's post, I came up with an experiment designed to explore this difference. I decided to run all of the images from the Hello Little Fella Flickr group through FaceTracker and record the result. These images induce pareidolia in us, but would they do the same to the machine?
Using the Flickr API, I pulled down 681 images from the group. I whipped up an openFrameworks app that loaded each image and passed it to FaceTracker for detection, saving an image of the resulting face whenever one was detected. FaceTracker detected a face in 50 of the images, or about 7%.
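The batch loop itself is simple enough to sketch. The original app was written in openFrameworks; the Python below is only an illustration of the same shape, and `detect` is a hypothetical stand-in for the call into FaceTracker, not its actual API. The names `run_batch` and `detection_rate` are likewise made up for this sketch.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# A detected face as a bounding box: (x, y, width, height).
Box = Tuple[int, int, int, int]


def run_batch(
    image_ids: Iterable[str],
    detect: Callable[[str], Optional[Box]],
) -> Tuple[List[Tuple[str, Box]], int]:
    """Run a detector over every image and keep only the hits.

    `detect` is a hypothetical callable that returns a bounding box
    when it finds a face and None when it does not.
    """
    hits: List[Tuple[str, Box]] = []
    total = 0
    for image_id in image_ids:
        total += 1
        box = detect(image_id)
        if box is not None:
            # In the real experiment this is where the annotated
            # image would be saved to disk.
            hits.append((image_id, box))
    return hits, total


def detection_rate(hits: int, total: int) -> float:
    """Fraction of images in which a face was detected."""
    return hits / total if total else 0.0
```

With the numbers from this experiment, `detection_rate(50, 681)` comes to roughly 0.073, which is the "about 7%" reported above.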
When I looked through the results I found that they broke down into three different categories in terms of how the face detected by the software related to the face that a person would see in the photo: agreement, near agreement, and totally other. Each of these categories reveals a different possible relationship between the human vision system and the software vision system. Significantly, I also found that I had a different emotional reaction to each of these types of results. I think the spectrum of possibilities outlined by these three categories is one we're going to see a lot as we find ourselves surrounded by more and more designed objects that are embedded with computer vision. At the end of this post I'll share some ideas about the repercussions this might have for the design of the Robot-Readable World, both for the robots themselves and the things we create for them to look at.
But first a little more about each of the categories.
Agreement happens when the face tracking system detects exactly the part of the scene that originally induced pareidolia in the photographer, inspiring them to take the photo in the first place. In many ways these are the most satisfying results. They give you the confirming feeling that YES it saw just what I saw. Here are some results that show Agreement:
This one is rather good. I hadn't really even been able to see the face in this cookie until the app showed it to me.
I think this one is especially exciting because there's an inductive implication that it could see all of these:
One major ingredient of Agreement seems to be a clearly defined boundary around the prospective face's features. I discovered something similar when experimenting with getting FaceTracker to see hand-drawn faces.
The next category is Near Agreement. Near Agreement takes place when some, but not all, of the facial features the algorithm picks out match those a human eye would see.
For example, here's a case where it sees the same eyes as I do, but we disagree about the nose and mouth.
I see the black hole there as the mouth of the little fella. The algorithm sees that as his nose and the shift in the reflection below that as the mouth.
When these kinds of Near Agreements occur I find myself going through a quick series of emotions. Excitement: it sees it! Let down: oh, but that's not quite it. Empathy: you were so close; just a little to the left, I see where you went wrong...
Got the mouth right, but the eyes were just a little too far out of reach:
The back of this truck I actually find quite compelling. I think the original photographer was thinking of arrows at the top as the eyes and the circular extrusion as the border of the face. But now, having seen the face that the algorithm detected, I can actually see that face more clearly than the one I think the photographer intended.
This last category is the one I find the most fascinating. Sometimes FaceTracker would detect a face in a part of the image totally separate from the face the image was intended to capture. Something in that portion of the image, which frequently looked like an undifferentiated portion of some surface, or a bit of seemingly meaningless detail, triggered the system's pattern for a face.
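For anyone who wants to repeat the sorting, the three categories reduce to a simple rule once you have judged, feature by feature, whether the software's face lines up with the one a person sees. I made those judgments by eye; the sketch below assumes they have been written down as a hypothetical dictionary of per-feature matches, and the function name `categorize` is invented for illustration.

```python
from typing import Dict


def categorize(feature_matches: Dict[str, bool]) -> str:
    """Sort one result into the three categories described in this post.

    `feature_matches` is a hypothetical hand annotation mapping a
    feature name (e.g. "eyes", "nose", "mouth") to whether the
    detected feature landed where a human viewer sees it.
    """
    matched = sum(feature_matches.values())
    if matched > 0 and matched == len(feature_matches):
        return "agreement"        # every feature lines up
    if matched > 0:
        return "near agreement"   # some features line up, not all
    return "totally other"        # the machine's face is somewhere else
```

The example where the software shared my eyes but disagreed about the nose and mouth would be `categorize({"eyes": True, "nose": False, "mouth": False})`, which lands in near agreement.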
These elicit the most complex emotional response of all. It starts off with "huh?", a sense of mystification about what the algorithm could be responding to. Then there's a kind of aesthetic of the glitch: "Oh, it's a screw-up, how funny and slightly troubling." But then finally, the more of these I saw, the more the effect started to feel truly other: like a coherent, but alien idea of what faces were. It made me wonder what I was missing. "What is it seeing there?" It's a feeling akin to having a conversation with someone who's gradually losing interest in what you're saying and starting to scan the room over your shoulder.
You can see the rest of the 50 photos in my Machine Pareidolia set on Flickr.
So what can we learn from these results? Let's return to Mr. Jones for a moment. He explained his interest in human pareidolia thusly:
One of the prime materials we work with as interaction designers is human perception. We try to design things that work to take advantage of its particular capabilities and peculiarities.
As designers of the Robot-Readable World we need to have a similar sense of the capabilities and peculiarities of this new computational perception. Hopefully this experiment can give us some sense of the texture of that perception, an idea of how much of its circle overlaps with ours in the Venn diagram of vision systems and how the non-overlapping parts look and behave.