My Loebner Prize Contest 2011 Reflections

Posted by Mohan Embar on October 23, 2011

(This post is written a year after the fact, but better late than never.)

After the disappointment of being disqualified from the 2010 Loebner Prize Competition due to a bug in my implementation of the Loebner Prize Protocol, I took solace in the fact that I had extra time to develop Chip and iron out the horrible bug which caused him to choke in three of the four conversations in Brighton. I also had time to flesh out Chip’s ability to model the user and thereby ask relevant questions about the user.

The competition was to be held in Exeter, UK, at the University of Exeter. I was excited that unlike 2009, where Chip was in the top “four” of three entries, this year Chip came in the top four of 13 entries. I had debated whether I wanted to make a trip to the UK, and couldn’t see myself incurring that expense. But then I called my mom and asked her whether or not she wanted to go sightseeing / attend the competition in exchange for my promising to ogle all of the museums and monuments she wanted. She graciously accepted the offer.

The plus side of the trip is that I got to spend a lot of quality time with my mom. I left home to attend university when I was sixteen, having skipped a grade, and have since never worked close to home, so this was the first time that we spent so much time together in a long while. I also made it a point to try not to work on Chip until the very last minute (which I could afford to do since I was bringing my own machine), but only succeeded partially at that.

Before the trip, I was contacted by Miranda Yousef (Twitter: @smartbotfilm), an independent filmmaker doing a documentary on the Loebner Prize Competition. She had wanted to interview me here in Wisconsin, but wasn’t able to make it. She did want me to make time for her during my stay in Exeter, though.

The day before the competition, I met up with Ed Keedwell as well as Miranda and her sister Odette. Ed and the other contest organizers were nice and hospitable and walked me through their configuration. Hugh was also there. In addition to setting up, I was filmed for a couple of hours by Miranda and her crew: doing an hourlong interview outside the university cafeteria. The university grounds and surroundings were beautiful.

During the interview, I reiterated my position that I wanted Chip to win this contest by outing himself right away and not doing the tiresome charade of having a fake backstory and a ton of canned responses, but rather by listening to and engaging the user and learning about him or her.

On the day of the contest, I again met David Levy, my childhood hero, with whom I had last gone toe-to-toe in Brighton in 2009, as well as Rollo Carpenter. It was nice to see them again and introduce them to my mom. It was also nice to see Hugh again: I like him a lot.

The competition was another disappointment, however. Chip didn’t come in dead last, but didn’t fare very well either. There was an annoying bug where Chip would say “I didn’t hear you” every time a judge pressed [Return] without typing anything. More annoyingly, whenever a judge asked a question which Chip couldn’t understand, Chip would respond with his own itinerary of questions, which irritated the judges. Most annoyingly, however, some of the judges refused to answer most of Chip’s questions, insisting that they were the ones who should be asking the questions. And of course, there were several occasions where the judge did answer one of Chip’s questions, but Chip failed to understand the answer.

I’m not sure whether this had anything to do with it, but I realized later that several of the questions in Chip’s opening salvo (How old are you? Do you have children?) are off-putting to Brits, which may or may not have set the tone for the remainder of the conversation.

The defeat stung, despite my having had a nice interview with Miranda and Odette and my mom reassuring me that being able to see and participate in this part of my world made everything more than worth it. Which was true.

Later, back home, Robby Garner asked this question on Robitron:

Preparing for a Loebner contest, there are lots of things to watch out for. But the judges only have one shot at getting along with your program. How do you manage simulating that over and over as you prepare for the big show?

This opened the floodgates. Here was my response. (Note that my opinions changed after the 2012 contest results.)

Chip fails at it because he insists on outing himself right away and having no fake backstory, hoping in vain that the judges will indulge him afterwards and exercise the functionality that Chip would like to showcase.

I’ve been repeatedly warned that this is not the way to go with the LPC, but I did it again anyway. That said, there was a string of disappointing moments where Chip failed to handle things he should have been able to handle. And for some reason, I had another LPP issue where I kept getting what I thought were blank lines and saying “I didn’t hear you” constantly. I’m pretty sure the absence of those deficiencies wouldn’t have tipped the scales, though.

I’m going to follow through on posting an essay on my reflections about LPC 2011 (I never did this), but the summary of that essay will be this:

– Marvin Minsky et al. and many others think the LPC is a joke. Stuart Shieber wrote an essay on why. (This is all on Hugh’s contest homepage.)

– Hugh wrote an eloquent rebuttal to this and I agree with everything he says in that rebuttal.

– Going into the contests for the last three years, I’ve naively believed that a SHRDLU-like entry would win the LPC because it would represent breathtaking technological prowess and not just a bunch of canned responses. I hyperfocused on Chip’s winning fourth conversation in Brighton where despite outing himself, he was ranked higher than his competitors because the conversation was of higher quality.

– My LPC 2011 experiences confirm my naivety and that this is definitely not the case. Many judges came out guns-ablazin’ and despite my being like a dog who submits immediately to avoid being attacked by the alpha dog, they attacked anyway, peppering Chip with questions he couldn’t possibly answer because he had no fake backstory and refusing to answer most of Chip’s questions. The few that did answer Chip’s questions and had higher quality conversations failed to be moved sufficiently.

– How do I reconcile this with my belief that Hugh was correct about the theoretical validity of the LPC to reward new technology by rewarding the best bot? After a lot of soul-searching I realized that just like the content of religious scriptures can have absolutely no bearing on the societal manifestation of that religion at any given moment in time, the societal manifestation of the LPC at this moment in time is nothing more than a glorified creative writing contest once the initial “hazing” (to use Bruce Wilcox’s term) of the preselection round has been completed. Comparing Zoe’s and Chip’s prequalifier transcripts is one indication of this: Chip got more answers correct, but Zoe’s were way more human.

Let me preemptively say before someone comes out of the woodwork and starts posting examples about how Chip sucks that I’m not saying that Chip was better than the other bots or succeeded in the stated goals I had for him – I’m just saying that my convictions about the societal manifestation of the LPC in the present day were solidified during this year’s contest. I’m also not saying that the outcome was undeserved: the power and expressiveness of ChatScript in pattern matching lend themselves very well to authoring the sort of Choose Your Own Adventures needed to win the LPC, especially the ability to author multiple levels of patterns and responses, and such technology deserves to be rewarded IMO. (Non-U.S. readers, see: http://en.wikipedia.org/wiki/Choose_Your_Own_Adventure.) I also hadn’t realized Zoe attempted to handle non-trivial things like size relationships – the prequalifier transcripts for Zoe (and also for Chip) didn’t reveal this. In the end, however, any prequalifier-like abilities paled in comparison to the ability to generate high-quality human-like canned responses – I saw it with my own eyes.

If I ever do enter the LPC again, it’ll be either to make a quick $250 because I think Chip can easily score high in the prequalifier rounds (unless Bruce makes his ontology so easily exploitable that anyone can build a Loebner-prequalifying bot as easily as an AIML bot), or else to play the game like it’s intended to be played according to my modified belief system: by authoring a bajillion canned responses. (Chip’s scripting language can also handle the multiple levels that ChatScript can.) I already have a name for such a bot: TLCBot (where TLC stands for Take Loebner’s Cash). I find the task of authoring so many canned responses abhorrent, though, so I’m trying to think of ways to delegate this (maybe some sort of Wiki?).

Anyway, apologies if any of this comes across as disparaging. That’s really not the intention. And I do believe that Hugh’s heart and the contest organizers’ hearts are all in the right place and all of my competitors were worthy. Also, sorry for hijacking this thread, Robby – you opened the floodgates and I think that if you combine Bruce’s infrastructure with your creative writing abilities, you’ll have an unbeatable LPC bot.