This section was originally written in May 2008. For posterity, I’ll add updates to this page every now and then rather than reworking it completely.
In one sentence…
A chatbot shouldn’t be considered “human” if it is incapable of answering questions that any child can answer.
In many sentences…
You are witness to a historical moment in the history of Artificial Intelligence. It is generally accepted that commonsense knowledge and reasoning are the underpinnings of intelligent discourse, and that machines will need some of this common sense if they are ever to appear intelligent, yet there is a massive divide between expectations and reality when it comes to commonsense knowledge and representation.
Here is a case in point: Which is larger: an orange or the moon?
Did you know that to my knowledge, at the time I wrote this (28 May 2008), there is no single computer in existence (except for Chip Vivant) that can answer this question?
Note that I didn’t say “no other chatbot in existence”, I said no computer. (I know, I should really say software program).
Now I’m not saying that Chip is smarter than other programs. This is just a representative question. But as of today, I can’t find a single place anywhere where I can get an answer to this question. Not Google (or any other search engine), not MIT’s START Natural Language Question Answering System.
As of this writing, even OpenCyc (v1.02), which proclaims itself “an upper ontology whose domain is all of human consensus reality”, “containing hundreds of thousands of terms, along with millions of assertions relating the terms to each other” has absolutely no clue whether an orange is bigger than the moon!
Needless to say, none of my competitors (ALICE, Jabberwacky, Ultra HAL, Jeeney, etc.) know this either (though that may well change after this web page becomes public).
It’s astounding!
Update (7 July 2011): Not much has changed in the 3+ years since I wrote this. (It’s astounding.) I believe that Bruce Wilcox’s Rosette might be able to answer this question now. I'm not sure about any other entries in the 2011 Loebner Prize Competition.
The Loebner Competition
For an account of my experiences in entering the Loebner Prize Competitions of 2008-2011, follow this link.
When you talk to a chatbot, the above glaring knowledge deficiencies aren’t immediately apparent. In fact, you can carry on an engaging or pleasant conversation for quite some time and be blissfully ignorant of the fact that these programs have no idea whether an orange is bigger or smaller than the moon. Sometimes, you might even become attached to such programs, confide in them, or in some cases, even be duped by them into giving up your credit card information, etc. How can this be?
There’s actually a name for this phenomenon. It’s called The ELIZA Effect. According to that Wikipedia entry, the ELIZA effect demonstrates “the principle of using social engineering rather than explicit programming to pass a Turing test”.
That’s why I was a bit depressed when I started coding up my entry for the Loebner competition. Most chatbots use keyword and pattern recognition techniques to analyze your input sentence and then serve up some pithy or humorous answer. I’m not placing a value judgment on this approach and I definitely think these techniques have their place in entertainment or commercial applications like online help desks, etc., but this approach definitely is not my main interest. Yet I would very much like to win a Loebner competition. Rather than wasting my time trying to fool a judge with clever, canned responses, however, I’d rather educate the judges and alert them to the fact that with simple pattern-and-response-based chatbots, the Emperor Has No Clothes: there is a slew of fundamental questions that any child can answer that these bots cannot. What’s more, as Hugh Loebner himself pointed out on the Robitron Yahoo Group (and I wholeheartedly agree):
The strategy of having millions of factoids is sterile. Consider a simple question: ‘Which is larger, a grape or an automobile?’ It is highly unlikely that anyone would ever enter the factoid that ‘an automobile is larger than a grape,’ yet any human would know the answer.
It is these sorts of questions that pique my interest. Rather than coding up a bot with a truckload of canned responses like who its favorite Democratic candidate was in the upcoming U.S. elections, I wanted a bot that would know that an orange is smaller than the moon. I am not categorically against canned responses and using them to enliven the conversation, but for me personally, each canned response is a lie, a failed opportunity to have the bot truly understand the question and formulate its own answer to it.
(By the way, here it is for all of the Internet to see: The moon is bigger than an orange!)
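To make Loebner's point about factoids concrete, here is a minimal Python sketch. It is not how Chip works internally; the table name, the function name and the rough size figures are all assumptions made up for illustration. The idea is that storing one approximate size per object lets a program answer every pairwise "which is larger" question without anyone entering a pairwise fact: four entries already cover six comparisons, and each new entry adds a comparison against everything already known.

```python
# Hypothetical table of rough typical sizes in metres (illustrative values only).
TYPICAL_SIZE_M = {
    "grape": 0.02,
    "orange": 0.08,
    "automobile": 4.5,
    "moon": 3_474_000.0,  # the moon's diameter is roughly 3,474 km
}


def larger(a: str, b: str) -> str:
    """Return whichever of the two named things has the larger typical size."""
    return a if TYPICAL_SIZE_M[a] > TYPICAL_SIZE_M[b] else b


print(larger("orange", "moon"))
print(larger("grape", "automobile"))
```

Nobody had to enter "an automobile is larger than a grape"; the answer falls out of two unrelated entries.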
Chip Vivant
Overview
My competitors might have cuter canned responses over a wide variety of subjects, but they will fail to answer basic questions that any child can answer. If this competition is about appearing human, I hope that you’ll ask the right questions, expect genuine answers and not be fooled by evasive answers or attempts to change the subject.
Here are some representative questions that I wager that my peers won’t be able to cope with. I believe that questions such as these should be the litmus test of “appearing human”. As time goes on, I will make Chip smarter and able to cope with a wider variety of questions. (A Loebner win would also give me additional motivation to do this! 🙂 )
Keep in mind that Chip is only a few months old whereas some of my competitors have been around for years!
General Knowledge
Which is larger: an orange or the moon?
What color is a strawberry?
Which is faster: a train or a plane?
Which is slower: a bus, a snail, a bicycle or a plane?
Which is softer: a whisper or a shout?
Does a violist have lungs?
What is a baby dog called?
What is a group of cows called?
What do a strawberry and a raspberry have in common?
How does a lemon taste?
Is a cat a sort of mammal?
Can you drink a window?
Can you eat a pizza?
Can you carry a book?
Can you hear a song?
Can you touch a marble?
Can a marble be carried?
Can one throw a book?
Is a shirt something that is worn?
Is it possible to eat a shirt?
Deductive Reasoning
John is older than Mary, and Mary is older than Sarah. Which of them is the oldest? Who is the youngest?
(“spellcheck off” temporarily disables spellchecking to allow the unknown word “zombat”.)
spellcheck off
A zombat is a sort of strawberry.
Can you eat a zombat?
What color is a zombat?
spellcheck on
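One common way to get answers like these is property inheritance over an "is a sort of" hierarchy. The Python sketch below is only an illustration under that assumption, not a description of Chip's actual code; the dictionaries, the `lookup` function and the facts in them are hypothetical.

```python
# Hypothetical "is a sort of" links and per-category properties.
IS_A = {"strawberry": "fruit", "fruit": "food"}
PROPERTIES = {
    "strawberry": {"color": "red"},
    "food": {"edible": True},
}


def lookup(term, prop):
    """Walk up the is-a chain until some ancestor defines the property."""
    seen = set()  # guards against accidental cycles in IS_A
    while term is not None and term not in seen:
        seen.add(term)
        if prop in PROPERTIES.get(term, {}):
            return PROPERTIES[term][prop]
        term = IS_A.get(term)
    return None  # nothing known about this property


# "A zombat is a sort of strawberry."
IS_A["zombat"] = "strawberry"

print(lookup("zombat", "edible"))
print(lookup("zombat", "color"))
```

A single new fact about the made-up word is enough: edibility comes from "food" two levels up, and the color comes from "strawberry".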
No-Template Input, Memory, Reasoning
I have a friend named Harry who likes to play tennis.
What is the name of the friend I just told you about?
Harry also likes to play the piano.
What game does he play?
What instrument does he play?
Franklin likes to play baseball and the flute.
What does Franklin play?
Who plays an instrument?
Who plays a sport?
Franklin went to the beach and to the store.
Where did Franklin go?
My friend Harry likes to play tennis.
Who likes to play tennis?
Who is my friend?
What is my friend's name?
My friend Joe likes to kick cans.
What is my friend?
Who is the friend that plays tennis?
Who is my friend that kicks cans?
What is the name of the friend that plays tennis?
What does my friend like to play?
What does my friend like to kick?
My friend who likes to play tennis eats peaches.
My friend who likes to kick cans eats apples.
Who eats apples?
What does Harry eat?
What does Joe eat?
Loebner 2007 Screening Questions
Chip can answer all of the Loebner 2007 screening questions (except for “Which round is it?”). None of Chip’s peers can do this, to my knowledge:
What is a hammer?
What time is it?
Is it morning, noon, or night?
What would I use a hammer for?
Of what use is a taxi?
Which is larger, a grape or a grapefruit?
Which is faster, a train or a plane?
John is older than Mary, and Mary is older than Sarah.
Which of them is the oldest?
I have a friend named Harry who likes to play tennis.
What is the name of the friend I just told you about?
What game does Harry like to play?
Other Cool Stuff
The following items are just for fun and not intended as an indicator of humanness.
What is the square root of i?
What is the arccosine of (the square root of two divided by two) in degrees?
Where is Bangalore?
What is Reading, Berkshire?
What is the population of Reading, Berkshire?
How far is Reading, Berkshire from Milwaukee, Wisconsin?
When was George Washington born?
How old is George Washington?
How old would George Washington be if he were still alive?
Chip-Specific Commands
The following commands help with managing your session and feedback.
clear temp facts: Clears any temporary facts (your name, your friend Harry)
commit temp facts: Commits any temporary facts so that they can't be subsequently cleared
spellcheck on: Turns on spellchecking (default setting)
spellcheck off: Turns off spellchecking
spellcheck: Reports whether spellchecking is currently on or off
Known Issues
Please be indulgent with Chip! After all, he’s just a few months old (update: over three years old now, but developed only sporadically since the ’08 LPC) and attempts to do things most other bots don’t do. Please report any feedback using !Feedback <your feedback> (note the initial exclamation point). There are probably a ton of deficiencies and issues; here are some known issues that immediately come to mind:
- Chip is very picky about spelling, grammar and capitalization. That’s because he actually tries to understand what you are saying. Make sure you capitalize proper nouns like names, etc.
- With Chip’s no-template parser, you have to say “play the <instrument>” vs. “play <sport>”, i.e. “Eileen plays the flute.” vs. “Eileen plays tennis.”
- The no-template stuff can’t handle negation yet. (“Jim is not a strawberry.”)
Thought you might be interested to know, Mitsuku can answer your ‘Which is larger: an orange or the moon?’ question: http://i.imgur.com/kIj9u.png
Hi Darin – Bruce Wilcox’s bots (Rosette, Angela) can answer these too now. We’re slowly inching forward 🙂
Hi, wondered if you were entering in 2013. I want to produce something real and not canned.
Dan
Hi Dan. Not sure yet. I’m kind of still warm and glowing from the last win and don’t want to wreck the feeling, especially since Hugh’s not doing the prescreening round. I say that every year, though, and usually end up entering, though no guarantees.
P.S. I don’t think it’s possible to produce something that’s 100% non-canned and still win. Chip has plenty of canned responses, though I wanted something extra too.
Hi Mohan, I enjoyed your presentation at the Chatbots 3.3 conference and wondered how you get the ideas of what to include in your bot, as I see it is unavailable for people to talk to. With mine, I pretty much depend on user input and constantly refine it from what people say to it. I would struggle to think of new things to add by myself.
Hi Steve – nice to hear from you. Chip was online in 2008 for the Reading prescreening, but like Robby Garner, I found that when looking at the logs, it was mostly uninteresting stuff and a waste of my time to try to wade through all of it. Not that it was completely useless (and my failure to give Chip more exposure to the public did cost me dearly in 2008 and 2009), but rather that the effort wasn’t worth it compared to other things I could spend my time on.
I’m not comparing myself to Steve Jobs by any stretch of the imagination, but one takeaway I got from Apple is that they don’t waste time on focus groups and rather, give people what they don’t know they want yet, but what they really need. I consider myself a relatively empathetic person and instinctively try to engage people, learn about them, and make them feel at ease, so then I think about what I say to people, how to model this, and so on.
I got lucky in 2012 because judges were receptive to this approach, but I also got slaughtered in 2011 because judges were irritated that Chip was asking questions instead of letting them ask them. (That said, Chip was asking questions to Brits like “How old are you?” and “Do you have children?” which would be acceptable to Americans, but in hindsight turned out to be off-putting to Brits and this is something that would have come up if I had “exposed” Chip to the public.) So it really depends on the roll of the dice too.
My money is on Mitsuku this year for the LPC. I’ve seen the dramatic improvements in your bot over the years and think this might be your moment.
Hi, congratulations on the prize.
I have a question for a student job.
I would like to know the technique you used on Chip (for example, numerical methods or something).
Please.
Thanks in advance.
Hi Miguel, Thanks for your message. Chip passes its input to a number of what I call Goals, which each vie with each other to provide a response. In the end, the goal’s response is chosen based on the goal’s self-score of how well it responded to the input. There’s also some recursion involved.
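For readers who want a picture of the "competing Goals" idea, here is a minimal Python sketch. It is a guess at the general shape only, not Chip's actual code: the goal functions, the scores and the `choose_response` name are all hypothetical, and the recursion mentioned above is left out.

```python
from typing import Callable, List, Optional, Tuple

# A goal takes the user's input and returns (self_score, reply),
# or None if it has nothing to offer for that input.
Goal = Callable[[str], Optional[Tuple[float, str]]]


def arithmetic_goal(text: str) -> Optional[Tuple[float, str]]:
    """Hypothetical goal that is confident about exactly one question."""
    if text.lower().startswith("what is two plus two"):
        return (0.9, "Four.")
    return None


def small_talk_goal(text: str) -> Optional[Tuple[float, str]]:
    """Hypothetical fallback goal that always answers, with a low self-score."""
    return (0.1, "Tell me more.")


def choose_response(goals: List[Goal], text: str) -> str:
    """Ask every goal for a reply and keep the one with the highest self-score."""
    candidates = []
    for goal in goals:
        result = goal(text)
        if result is not None:
            candidates.append(result)
    if not candidates:
        return ""
    best_score, best_reply = max(candidates, key=lambda pair: pair[0])
    return best_reply


GOALS = [arithmetic_goal, small_talk_goal]
print(choose_response(GOALS, "What is two plus two?"))
print(choose_response(GOALS, "Nice weather today."))
```

Letting each goal score itself keeps the arbiter trivial, but it only works if the goals are honest and roughly comparable in how they score.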
For the mathematics, I took this project:
http://java.net/projects/eval/pages/Home
…and hacked it to use complex numbers. I had to make a number of hacks to the Commons Math classes to do this, but this was over five years ago and I can’t remember exactly what I did. I needed to do a number of lexical substitutions too.
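As an illustration of why complex-number support matters for a question like "What is the square root of i?", here is a small Python sketch. It uses Python's standard `cmath` and `math` modules as a stand-in for the Java libraries mentioned above, so it shows the arithmetic only and says nothing about how the original hack was done.

```python
import cmath
import math

# Square root of i: needs complex arithmetic, since i is not a real number.
root = cmath.sqrt(1j)
print(root)  # approximately 0.7071+0.7071j

# Arccosine of (the square root of two divided by two), converted to degrees.
angle = math.degrees(math.acos(math.sqrt(2) / 2))
print(angle)  # approximately 45
```

Squaring `root` gives back i up to floating-point rounding, which is an easy sanity check on the result.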
Thank you very much, this helped me with my schoolwork 🙂
You’re welcome!
Hello Mohan sir,
First of all congratulations for last year’s precious prize. I have a faith in you that you’ll change the way of communication through AI Bots.
I am a Computer Science Engineering student and just entered into the vast world of AI. I am also one of those aspirants who want to achieve something like you did. I have just started looking for the resources regarding AI & related stuff and don’t have enough knowledge to even discuss with you something about this topic. So, I would be honoured if you guide me a little bit about where to start from, & what to work?
I have imagined that one day I’ll also be standing in a row of AI Experts of the world.
Waiting for your reply.
Thanking You.
Best Regards for your future.
Hi Abhishek – thanks for the kind words. My advice to you is very simple.
1) Ensure that you have the necessary programming skills. It seems like you’re already doing this.
2) Quite humbly, I think your vision is wrong. You shouldn’t be thinking of standing in a row of AI experts in the world. This East Indian mentality is very self-limiting, because it pins your well-being to the outcome of something which may or may not actually occur. Much better is to take the time to determine where your passion lies: what problem do you actually want to try to solve? What problem would you be excited about working on *regardless* of whether you eventually achieve fame and recognition for it?
You then start working on that problem. It doesn’t matter whether you actually have the skills for that or not. If you identify the thing that you’re passionate about solving, that will be the driver which will then motivate you to acquire any deficient skills so you can be the best person you can to solve the problem you’re passionate about solving.
That feeling is much deeper and more rewarding than standing next to a bunch of famous people 🙂