spilling their lives. Entire industries grew by understanding what people were saying and predicting what they might want to do, where they might want to go, and what they were eager to buy. Google was already mining and indexing words on the Web, using them to build a media and advertising empire. Only months earlier, Google had debuted as a publicly traded company, and the new stock was skyrocketing.
IBM wasn't about to mix it up with Google in the commercial Web. But Big Blue needed state-of-the-art tools to provide its corporate customers with the fastest and most insightful read of the words cascading through their networks. To keep a grip on its gold-plated consulting business, IBM required the very smartest, language-savvy technology, and it needed its customers to know and trust that it had it. It was central to IBM's brand.
So in mid-2005 Horn took up the challenge with a number of his top researchers, including Ferrucci. A twelve-year veteran at the company, Ferrucci managed a handful of research teams, including the five people who were teaching machines to answer simple questions in English. Their discipline was called question-answering. Ferrucci knew the challenges all too well. The machines stumbled in understanding English and appeared to plateau, in competitions sponsored by the U.S. government, at a success rate of about 35 percent.
Ferrucci wasn't a big Jeopardy fan, but he was familiar enough with it to appreciate the obstacles involved.
Jeopardy tested a combination of knowledge, speed, and accuracy, along with game strategy. The show featured three contestants, each with a buzzer. In the course of about twenty minutes, they raced to respond to sixty clues representing a combined value of $54,000. Each one (and this was a Jeopardy quirk) was in fact an answer, some far more complex than others. The contestant had to provide the missing question. For example, in an unusual Tournament of Champions game that aired in November 1994, contestants were presented with this $500 clue under the category Furniture: "French term for a what-not, a stand of tiered shelves with slender supports used to display curios." The host, Alex Trebek, read the clue from the big game board. The moment he finished, a panel around the question lit up, setting off the race to buzz. On average, contestants had about four seconds to read and consider the clue before buzzing. The first to buzz was, in effect, placing a bet. The right response, "What is an étagère?", was worth $500 and gave the contestant the chance to pick again. ("Let's try European Capitals for $200.") A botched response wiped the same amount from a contestant's score and gave the other two a chance to try. (In this example, no one dared to buzz. Such a clue, uncommon in Jeopardy, is known as a "triple-stumper.")
To compete in Jeopardy, a machine not only would need to come up with the answer, posed as a question, within four seconds, but it would also have to gauge its confidence in its response. It would have to know what it knew. "Humans know what they know like that," Ferrucci said later, snapping his fingers. Replicating such confidence in a computer would be tricky. What's more, the computer would have to calculate the risk according to where it stood in the game. If it was far ahead and had only middling confidence on "étagère," it might make more sense not to buzz. In addition to piling up knowledge, a computer would have to learn to play the game.
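The buzz-or-pass decision Ferrucci describes amounts to weighing confidence against game position. Below is a minimal sketch of that kind of rule in Python; the function name, the expected-value calculation, the lead threshold, and every parameter are illustrative assumptions, not IBM's actual strategy logic.

```python
# Illustrative sketch only -- NOT IBM's algorithm. Buzz when the expected
# value of answering beats staying silent: a right answer wins the clue's
# dollar value, a wrong one loses the same amount.
def should_buzz(confidence: float, clue_value: int, my_score: int,
                opponent_best: int, risk_margin: float = 0.1) -> bool:
    """Decide whether to buzz on a clue.

    confidence    -- estimated probability (0 to 1) that our answer is right
    clue_value    -- dollar value of the clue
    opponent_best -- the nearest rival's score
    risk_margin   -- extra confidence demanded when protecting a big lead
    (All names and thresholds here are assumptions for illustration.)
    """
    # Expected gain: win clue_value with probability p, lose it with (1 - p).
    expected_gain = confidence * clue_value - (1 - confidence) * clue_value

    # With a comfortable lead, demand more than break-even confidence,
    # mirroring the "far ahead with middling confidence" case in the text.
    if my_score > 2 * opponent_best:
        return confidence > 0.5 + risk_margin
    return expected_gain > 0


# A $500 clue at 60 percent confidence in a close game: expected gain is
# +$100, so buzz. Far ahead with only 55 percent confidence: stay silent.
print(should_buzz(0.60, 500, my_score=3000, opponent_best=2500))   # True
print(should_buzz(0.55, 500, my_score=12000, opponent_best=4000))  # False
```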
Complicating the game strategy were four wild cards. Three of the game's sixty hidden clues were so-called Daily Doubles. In that 1994 game, a contestant named Rachael Schwartz, an attorney from Bedminster, New Jersey, asked for the $400 clue in the Furniture category. Up popped a Daily Double giving her the chance to bet some or all of her money on a furniture-related clue she had