Intelligence: Man vs. Machine

I was interviewed for this article:

Intelligence: Man vs. Machine

Here’s the full text of the email interview I submitted for this article:

Could you tell me more about your personal experience of the Prize? (your grades, your relations to the other competitors, the discussions with the judges, and any other anecdote you think is interesting)

Chip Vivant, my chatbot, was a finalist in both the 2009 and 2011 Loebner Prize Competitions. (I haven’t updated my website to reflect my participation in the 2011 contest yet.) In 2009, I finished in third (last) place due to a glitch in my program – it was the first time I had ever qualified. In 2011, in Exeter, I finished in 3rd place, but did so after having beaten nine other entries in the prequalifying rounds.

For most of the contest’s history, the contest has consisted of two phases: a prequalifying round, then the actual competition. Screening questions are used in the prequalifying round to select the four “best” entries, and these four entries go on to compete in the final competition. The interesting aspect of this is that the questions asked in the prequalifying round can be very different from those asked during the actual competition. The questions asked in the prequalifying round are very purposeful and many are designed to weed out the chatbots that rely solely on having a large database of canned responses. An example question from this round would be “Which is larger, an automobile or a grape?”
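As a rough illustration of why a question like this defeats a purely canned-response bot, here is a minimal Python sketch of one way such a question could be handled. It is not how Chip Vivant or any other entry works: the `TYPICAL_SIZE_M` table, its order-of-magnitude values, the `QUESTION` pattern and the `answer` function are all assumptions made up for this example. The point it makes is that the program only "knows" what someone has typed into the table by hand.

```python
import re
from typing import Optional

# Hypothetical, hand-entered rough sizes in meters (order-of-magnitude
# guesses for illustration only, not measured data).
TYPICAL_SIZE_M = {
    "grape": 0.02,
    "apple": 0.08,
    "automobile": 4.5,
    "moon": 3.5e6,
}

# Assumed question shape: "Which is larger, an X or a Y?"
QUESTION = re.compile(
    r"which is (larger|smaller), (?:an?|the) (\w+) or (?:an?|the) (\w+)\?",
    re.IGNORECASE,
)


def answer(question: str) -> Optional[str]:
    """Return the name of the larger/smaller object, or None if unknown."""
    match = QUESTION.search(question)
    if match is None:
        return None
    relation, first, second = (group.lower() for group in match.groups())
    # Without a hand-coded entry for both objects, the bot cannot answer.
    if first not in TYPICAL_SIZE_M or second not in TYPICAL_SIZE_M:
        return None
    pick = max if relation == "larger" else min
    return pick((first, second), key=TYPICAL_SIZE_M.__getitem__)


print(answer("Which is larger, an automobile or a grape?"))
```

Any object missing from the table makes `answer` return `None`, which is exactly the gap a screening question is designed to expose.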

The actual competition questions are more random to me, more along the lines of an actual freeform conversation, but this of course varies from one judge to another.

From the very beginning, I decided that I wasn’t going to have a large database of canned responses and would try to explore areas that require real reasoning. My thesis is that it’s ridiculously easy to spot the bot – I can train anyone to do this in fifteen minutes – and therefore, I wanted to use this competition as a springboard for trying out new ideas, knowing that my chances of actually winning the contest with this mentality are slim-to-none.

At the first competition I participated in, which was in England in 2009, I met Rollo Carpenter, creator of Cleverbot and Jabberwacky, whom I found very nice and personable. I also met David Hamill, who thinks the same way as I do about chatbots and with whom I had enjoyed talking on the Robitron Yahoo Groups forum. I also met David Levy, my childhood hero, who was the author of the Intelligent Two-Player Games series in Creative Computing Magazine in the early 80s, and who inspired my love of Artificial Intelligence. It was an honor to meet these people.

I’ve also enjoyed my encounters with Hugh Loebner, whom I’ve visited twice at Crown Enterprises in New Jersey in order to test Chip Vivant and twice again at the actual competitions. He’s quite a character and I like him a lot.

Developing Chip Vivant is a hobby of mine and I’m not currently deriving income from this line of work, although I hope that will eventually change. The things I’ve accomplished with Chip, the people I’ve met, and the experiences I’ve had participating in these contests have been well worth it, though.

Did your participation in the Prize change something in the way you perceive artificial intelligence and its use?

Yes and no. On the one hand, I’ve become keenly aware of the futility of creating a program which comes anywhere close to fooling someone who knows what they’re doing. I’ve also become keenly aware of the overwhelming amount of unwritten, undocumented common knowledge which we’ll somehow need to codify if we’re ever to win the Loebner Prize or any other Turing Test. Case in point: during my development of Chip, I came to realizations like there’s no non-human knowledge source anywhere that an extraterrestrial landing on Earth could use to find out whether an apple is smaller or larger than the moon. Try Googling for this answer. Try finding this answer in any book or library, anywhere. You’ll come up dry. And yet some laypeople think that a Turing-Test-winning program is just around the corner. The chasm between chatbot writers who are acutely aware of this and laypeople who aren’t is very large.

That said, I’m equally astonished by the unexploited low-hanging-fruit-type opportunities which abound in this field. In the 1960s, the ELIZA program with its simple keyword-spotting parlor tricks managed to dupe people and give them comfort. Imagine how much more we could accomplish along these lines with the technology at our disposal today.
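For readers who have never seen keyword spotting, the following Python sketch shows the general idea in a heavily simplified form. It is not ELIZA's actual script: the `RULES` list, the reply templates and the `respond` function are invented for this illustration, and the real program did more (for example, ranking keywords and swapping pronouns).

```python
import re

# Hypothetical keyword rules: (pattern to spot, reply template).
# These are made up for illustration and are not taken from ELIZA.
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE),
     "Tell me more about your {0}."),
    (re.compile(r"\bbecause\b", re.IGNORECASE), "Is that the real reason?"),
]

# Fallback used when no keyword is spotted.
DEFAULT_REPLY = "Please go on."


def respond(text: str) -> str:
    """Reply using the first rule whose keyword pattern appears in the text."""
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Drop trailing punctuation from captured fragments before reuse.
            fragments = [group.rstrip(".!?") for group in match.groups()]
            return template.format(*fragments)
    return DEFAULT_REPLY


print(respond("I feel tired today."))
```

There is no understanding anywhere in this loop; the reply is just the user's own words echoed back inside a template, which is why it is fair to call it a parlor trick.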

When do you think the Grand Prize will be won by a program? In 5, 10, 15 years?

I can say categorically that this will not happen in my lifetime, and I estimate I’ve got a good 35-40 years left if I don’t get hit by a bus. The fact that four years after I revealed this, you still can’t Google for or find codified knowledge for things like whether an apple is larger or smaller than the moon means that we’ve got light years to go before we crack this code.

Also, ironically, I don’t think the Loebner Prize Contest rewards the innovation that would be needed to achieve this result, but this isn’t Hugh’s fault. Here is the analysis that I posted on the Robitron discussion group but that hasn’t made its way to my website yet. (Again, note that my opinions changed after the 2012 LPC.)

– Marvin Minsky et al. and many others think the LPC [Loebner Prize Contest] is a joke. Stuart Shieber wrote an essay on why. (This is all on Hugh’s contest homepage.)

– Hugh wrote an eloquent rebuttal to this and I agree with everything he says in that rebuttal.

– Going into the contests for the last three years, I’ve naively believed that a SHRDLU-like entry would win the LPC because it would represent breathtaking technological prowess and not just a bunch of canned responses. I hyperfocused on Chip’s winning fourth conversation in Brighton [in 2009] where despite outing himself, he was ranked higher than his competitors because the conversation was of higher quality.

– My LPC 2011 experiences confirm my naivety and that this is definitely not the case. Many judges came out guns-ablazin’ and despite my being like a dog who submits immediately to avoid being attacked by the alpha dog, they attacked anyway, peppering Chip with questions he couldn’t possibly answer because he had no fake backstory and refusing to answer most of Chip’s questions. The few that did answer Chip’s questions and had higher-quality conversations failed to be moved sufficiently.

– How do I reconcile this with my belief that Hugh was correct about the theoretical validity of the LPC to reward new technology by rewarding the best bot? After a lot of soul-searching I realized that just like the content of religious scriptures can have absolutely no bearing on the societal manifestation of that religion at any given moment in time, the societal manifestation of the LPC at this moment in time is nothing more than a glorified creative writing contest once the initial “hazing” (to use Bruce Wilcox’s term) of the preselection round has been completed. Comparing Zoe’s and Chip’s prequalifier transcripts is one indication of this: Chip got more answers correct, but Zoe’s were way more human.

Let me preemptively say before people come out of the woodwork and start posting examples about how Chip sucks that I’m not saying that Chip was better than the other bots or succeeded in the stated goals I had for him – I’m just saying that my convictions about the societal manifestation of the LPC in the present day were solidified during this year’s contest. I’m also not saying that the outcome was undeserved: the power and expressiveness of ChatScript in pattern matching lend themselves very well to authoring the sort of Choose Your Own Adventures needed to win the LPC, especially the ability to author multiple levels of patterns and responses, and such technology deserves to be rewarded IMO. (Non-U.S. readers, see: I also hadn’t realized Zoe attempted to handle non-trivial things like size relationships – the prequalifier transcripts for Zoe (and also for Chip) didn’t reveal this. In the end, however, any prequalifier-like abilities paled in comparison to the ability to generate high-quality human-like canned responses – I saw it with my own eyes.

Finally, I should add that the whole notion of whether the Turing Test is the Gold Standard of AI testing is highly debatable. The contest will always favor real humans because their sensory inputs and bodies are “more human”, but this has nothing to do with intelligence. The Wikipedia article on the Turing Test has a nice section on criticisms of this test.

That said, if a computer does eventually win this, it will rightfully qualify as “intelligent” in my book.

What is your definition of human intelligence? And what is the difference with artificial intelligence?

The only extra stuff I would add to “human intelligence” on top of any general definition of intelligence is that humans have sensory inputs that are “human”. I can’t even begin to fathom the man-years of coding and simulation that would be needed to model the feeling I have in my stomach and mind when I’ve got a hunger pang. Computers will need this if they’re ever to pass the Turing Test with a semi-intelligent interrogator, but hyperfocusing on stuff like this seems like a lesson in futility compared to all the other neat stuff we could accomplish AI-wise with that same effort.

When I think of Artificial Intelligence, I think of a good utilitarian definition I heard ages ago which has stuck with me to this day: a program can be said to be artificially intelligent if it can solve problems that would require intelligence if solved by a human. Pure and simple. Note that there’s nothing in that definition that says how the program must come up with the solution.

On an emotional level, the first thing that comes to my mind is my experience with the triangular peg solitaire puzzle. It’s the triangle puzzle with pegs where you have to jump one peg over another, removing one peg each time, until you have only one peg left. When I was 14, I couldn’t solve that puzzle on my own, so I wrote a computer program to solve it. The program was able to take its inputs, churn through them in ways that I wasn’t able to, then come up with a solution. It was a thrilling moment for me: even though I wasn’t able to ask that program what its favorite sport was or how it felt when it was hungry, it still was capable of “thinking through” a complex problem and giving me an answer.
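The original program from back then is not reproduced here, but the "churning" described above can be sketched as a brute-force backtracking search. The Python version below is a modern reconstruction under stated assumptions: a 15-hole board numbered 0 to 14 from the top row down, with the top hole empty at the start. The names `INDEX`, `MOVES` and `solve` are invented for this example.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

ROWS = 5  # assumed board: 5 rows, 15 holes, numbered 0..14 top to bottom

# Map each (row, column) position to its hole number.
INDEX = {}
for r in range(ROWS):
    for c in range(r + 1):
        INDEX[(r, c)] = len(INDEX)

# The six directions in which a peg can jump on a triangular grid.
DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (-1, -1)]

# Every legal jump on the board as (from, jumped-over, landing) holes.
MOVES: List[Tuple[int, int, int]] = []
for (r, c), src in INDEX.items():
    for dr, dc in DIRECTIONS:
        over = INDEX.get((r + dr, c + dc))
        dst = INDEX.get((r + 2 * dr, c + 2 * dc))
        if over is not None and dst is not None:
            MOVES.append((src, over, dst))


def solve(pegs: FrozenSet[int],
          dead_ends: Set[FrozenSet[int]]) -> Optional[List[Tuple[int, int, int]]]:
    """Depth-first search for a jump sequence that leaves exactly one peg."""
    if len(pegs) == 1:
        return []
    if pegs in dead_ends:  # position already known to be unsolvable
        return None
    for src, over, dst in MOVES:
        if src in pegs and over in pegs and dst not in pegs:
            rest = solve((pegs - {src, over}) | {dst}, dead_ends)
            if rest is not None:
                return [(src, over, dst)] + rest
    dead_ends.add(pegs)
    return None


start = frozenset(range(15)) - {0}  # top hole starts empty
for src, over, dst in solve(start, set()):
    print(f"jump {src} over {over} to {dst}")
```

The search simply tries every legal jump, backs up when it gets stuck, and remembers dead-end positions so it never explores them twice. Nothing in it resembles how a person would reason about the puzzle, which fits the utilitarian definition above.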
