Google's computer vision software was attempting to evaluate my facial expressions, and to then infer my emotional state. In order for the machine to have a fighting chance, the menu of emotions was limited to surprise, sorrow, anger, or joy.
The Googlers presiding over the demonstration urged a fellow attendee and me to be surprised, sad, angry, and happy for the camera. The images of our grimaces and grins were then processed by the Cloud Vision API in an effort to identify our expressions. Were it a casting call, neither of us would have landed the role.
The results were OK. The algorithm correctly detected the joy we were faking, but it mistook my attempt to feign surprise for more glee. My sad and angry faces left the algorithm uncertain and unable to render a decisive verdict on my theatrics.

So why such slow progress, for such a long time? The short answer is that this problem is really, really hard. A more subtle answer is that we don't really understand what intelligence is, at least not well enough to define it with any specificity, and that makes it very hard to program.
In some ways we are living in the future. This Intertubes thingie is outstanding, and videoconferencing and Telepresence are cool. But no AI or flying cars. Or Moon cities. And the article ends with some good advice:
As a development tool, the Cloud Vision API is a marvel. Being able to make a few API calls and identify an image has many potential applications. Some of them may even be useful. But the suggestion put forth by the Emotobooth minders, that companies could use the technology to evaluate customer sentiment, might not be the best way to engage with people.

Go home, AI. You're drunk.
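For the curious, "a few API calls" really is about all it takes. Below is a minimal sketch of what an emotion query against Cloud Vision's REST endpoint looks like: building the `images:annotate` request body for a FACE_DETECTION feature, then picking the dominant emotion from the likelihood fields the API returns. The endpoint, feature type, and likelihood enum values come from the public Cloud Vision REST API; the helper function names and the sample response here are my own illustration, not Google's code, and the actual HTTP call (with authentication) is omitted.

```python
import base64

# Public REST endpoint for batch image annotation (auth omitted here).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

# Cloud Vision reports each emotion as a likelihood enum, not a score.
LIKELIHOOD_SCORE = {
    "VERY_UNLIKELY": 0, "UNLIKELY": 1, "POSSIBLE": 2,
    "LIKELY": 3, "VERY_LIKELY": 4, "UNKNOWN": 0,
}

def build_face_request(image_bytes: bytes) -> dict:
    """Build the JSON body for a FACE_DETECTION annotate call."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "FACE_DETECTION", "maxResults": 1}],
        }]
    }

def dominant_emotion(face_annotation: dict) -> str:
    """Pick whichever of the four likelihood fields ranks highest."""
    fields = {
        "joy": face_annotation.get("joyLikelihood", "UNKNOWN"),
        "sorrow": face_annotation.get("sorrowLikelihood", "UNKNOWN"),
        "anger": face_annotation.get("angerLikelihood", "UNKNOWN"),
        "surprise": face_annotation.get("surpriseLikelihood", "UNKNOWN"),
    }
    return max(fields, key=lambda k: LIKELIHOOD_SCORE[fields[k]])

# A made-up faceAnnotation like the one my grimace might have produced:
# attempted surprise, read as glee.
sample = {"joyLikelihood": "VERY_LIKELY", "surpriseLikelihood": "POSSIBLE"}
print(dominant_emotion(sample))  # joy
```

Note that the API never says "this face is angry"; it hands back a likelihood per emotion and leaves the verdict to you, which is exactly where my sad and angry faces fell into the gap.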