
Reading into the common responses on Jeopardy!

·2730 words·13 mins
Jules Johnson
The code for this project can be found at its GitHub repository.

This article was spun off from a larger, more technical article on the subject of how this data was collected. I wanted to focus on some fun statistical analysis for the answers on Jeopardy!, so feel free to skip the main article if you’re only interested in the stats.

What are the most common correct responses

In the previous article, I created a massive dataframe containing the categories, hints, answers, dates, and point values of every question aired on Jeopardy!. The first question that I’m curious about is simple: what are the most common correct responses on Jeopardy!? What people, places, and things are most valued by the Jeopardy! writers?

Simply performing a count on the number of times a word or phrase appears as a correct response gives some surprising results:

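| rank | response | count |
| --- | --- | --- |
| 1 | Australia | 511 |
| 2 | China | 484 |
| 3 | Japan | 472 |
| 4 | Chicago | 471 |
| 5 | France | 462 |
| 6 | India | 434 |
| 7 | Spain | 427 |
| 8 | California | 420 |
| 9 | Mexico | 390 |
| 10 | Alaska | 380 |
| 11 | Canada | 376 |
| 12 | India | 354 |
| 13 | Hawaii | 354 |
| 14 | Florida | 351 |
| 15 | Texas | 340 |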
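As a rough illustration of the counting step, here is a minimal sketch using only Python's standard library. The `responses` list and the `most_common_responses` helper are hypothetical stand-ins for the real dataframe column, which has one entry per clue.

```python
from collections import Counter

# Hypothetical sample of correct responses (the real data has one per clue).
responses = ["Australia", "China", "Australia", "Napoleon", "China", "Australia"]


def most_common_responses(responses, n):
    """Return the n most frequent responses as (response, count) pairs."""
    return Counter(responses).most_common(n)


top = most_common_responses(responses, 2)
print(top)  # [('Australia', 3), ('China', 2)]
```

Because this compares strings exactly, any spelling variation of a response is counted as a separate entry.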

Geographical locations are repeated far more frequently than famous people or works of art! The most frequent correct response is “What is Australia?”, appearing 511 times over the show’s run. After that come China, then Japan, Chicago, France, and so on. As a matter of fact, all of the top 37 responses are geographic locations! (The trend is broken by Napoleon, appearing 243 times.)

Geography certainly isn’t a vastly more common subject than history, so what could cause this pattern? Part of the answer is most likely that answers to geography questions tend to focus on a narrower range of answers than other categories. There are fewer cities than people, after all. It’s also possible that locations are more common as responses, while historical figures appear more in the questions themselves. I’ve asked a few friends what they think the most popular responses would be, and they usually intuit that the top responses would be geographical.

However, there is a second answer at play here, which is that people’s names can appear a number of different ways. Let’s return to Napoleon as an example. In addition to 243 appearances of “Napoleon”, the table also contains:

| variation | count |
| --- | --- |
| Napoleon Bonaparte | 47 |
| Napoléon | 14 |
| Napoleon (Bonaparte) | 4 |
| Napoléon Bonaparte | 3 |
| Napoleon (I) | 3 |
| Napoléon (Bonaparte) | 1 |

One might be curious why so many variations on a single name are possible. While contestants may say the full name while answering, the judges will often (but not always) allow a contestant to say only the surname, especially when referring to politicians. To eliminate ambiguity, the people maintaining the archive will often (but not always) add the given name to a response in parentheses. Ironically, Napoleon is a counterexample, as his first name is much more recognizable than his last name.

Regardless of these details, this does pose a serious problem to our understanding of this data. A first instinct at a solution might be to simply sum up the instances of each cell containing the word “Napoleon”, but this approach is fraught. Such an approach would include over twenty instances of variations on “Napoleon III”, as well as 6 instances of “Napoleon Dynamite”. This problem is even worse for less distinctive names. Here’s a breakdown of answers containing “Ford” as the last part of a person’s name, appearing three or more times:

On a more personal note, the corresponding chart for people named Johnson is even more disastrous:

This is a massive problem! How could we possibly handle this? As I see it, there are a handful of strategies that could be used to collect different versions of one person together, some of which might be used simultaneously:

  1. Find all variations of responses that end with the same surname, and sum up their counts into a single number.
  2. Take the counts of answers that consist of a single surname, and distribute that number proportionally across the different names with that surname.
  3. Ask an LLM to consider the question as a whole, and determine the exact identity of each human included.
  4. Group together variations that differ only by a pair of parentheses or diacritical marks. For example, replace "(Gerald) Ford" with "Gerald Ford".

Option 1 can be disregarded immediately. This would require that Gerald Ford and Harrison Ford be counted as one person, which is unacceptable.

Option 2 seems like a better idea at first, but falls apart the more one considers it. It would be nice to split the 119 instances of “Ford” and add those to “Gerald Ford”, “Henry Ford”, “Betty Ford” and “Harrison Ford”. However, it isn’t safe to assume that referring to someone by one name is equally common for presidents as it is for actors. In fact, it’s not fair to assume that all 119 instances of “Ford” refer to a person at all. Surely, many refer to the Ford Motor Company.

Option 3 is tempting, but LLMs are always prone to error. This concern is easy to overstate; LLMs are getting more accurate all the time, especially as pertains to simple informational questions. However, determining the exact level of inaccuracy would take some testing and comparison that is currently outside the scope of this project.

That leaves option 4, which really does seem reasonable. This is a small change overall: adding the 13 instances of “(Gerald) Ford” to the 128 instances of “Gerald Ford” is not likely to be hugely impactful. However, it’s also very unlikely to have negative side effects. This will be implemented moving forward.
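One way option 4 might look in practice is sketched below. This is not necessarily how the project implements it: `normalize_response` and `merged_counts` are hypothetical helper names, and the sketch assumes that dropping the parenthesis characters (while keeping the name inside them) and stripping accents is all the cleanup that is wanted. The counts are the Napoleon figures quoted earlier.

```python
import re
import unicodedata
from collections import Counter


def normalize_response(text):
    """Drop parentheses (keeping their contents) and strip diacritical marks."""
    # Decompose accented characters, then discard the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove the parenthesis characters themselves but keep the enclosed name.
    without_parens = without_marks.replace("(", "").replace(")", "")
    # Collapse any doubled whitespace left behind.
    return re.sub(r"\s+", " ", without_parens).strip()


def merged_counts(counts):
    """Sum the counts of responses that normalize to the same string."""
    merged = Counter()
    for response, count in counts.items():
        merged[normalize_response(response)] += count
    return merged


napoleon = {
    "Napoleon": 243,
    "Napoleon Bonaparte": 47,
    "Napoléon": 14,
    "Napoleon (Bonaparte)": 4,
    "Napoléon Bonaparte": 3,
    "Napoleon (I)": 3,
    "Napoléon (Bonaparte)": 1,
}
merged = merged_counts(napoleon)
print(merged["Napoleon"])            # 257
print(merged["Napoleon Bonaparte"])  # 55
print(normalize_response("(Gerald) Ford"))  # Gerald Ford
```

Note that “Napoleon” and “Napoleon Bonaparte” stay separate under this rule, since they differ by more than parentheses or accents, and “Napoleon (I)” becomes its own entry, “Napoleon I”.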

Finally, it’s probably a good idea to restrict the timespan the program searches. What’s considered important knowledge has changed over the years, as have the people on Jeopardy!’s writing staff. Viewing just the last 5 years seems like a decent compromise between quantity and relevance of data. We can finally note the most common responses in the last few years. Here’s the top 75:

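| response | count |
| --- | --- |
| Chicago | 58 |
| Australia | 48 |
| Florida | 43 |
| Philadelphia | 40 |
| California | 39 |
| Brazil | 37 |
| India | 37 |
| Georgia | 34 |
| Alaska | 34 |
| Jupiter | 34 |
| Ireland | 33 |
| China | 33 |
| Greece | 33 |
| Mars | 33 |
| Texas | 32 |
| Poland | 31 |
| Japan | 31 |
| Spain | 30 |
| Boston | 30 |
| San Francisco | 30 |
| Mexico | 29 |
| New Orleans | 29 |
| Cuba | 29 |
| the Philippines | 29 |
| France | 28 |
| Switzerland | 27 |
| Norway | 27 |
| Hawaii | 27 |
| Virginia | 27 |
| Venice | 26 |
| Egypt | 26 |
| Paris | 26 |
| Iceland | 25 |
| Michigan | 25 |
| Portugal | 25 |
| Iran | 25 |
| Atlanta | 25 |
| Canada | 25 |
| Argentina | 24 |
| the Netherlands | 24 |
| Beethoven | 24 |
| Germany | 23 |
| Italy | 23 |
| Florence | 23 |
| Dublin | 23 |
| Sweden | 23 |
| New Zealand | 23 |
| South Africa | 23 |
| Amsterdam | 22 |
| Denmark | 22 |
| Scotland | 22 |
| Massachusetts | 22 |
| mercury | 21 |
| London | 21 |
| Ethiopia | 21 |
| Maine | 21 |
| New Mexico | 21 |
| Puerto Rico | 21 |
| Venus | 21 |
| Yellowstone | 20 |
| the Mississippi | 20 |
| Peru | 20 |
| Morocco | 20 |
| Colombia | 20 |
| Antarctica | 20 |
| New Jersey | 20 |
| Chile | 20 |
| Vienna | 20 |
| Napoleon | 20 |
| St. Louis | 19 |
| Pennsylvania | 19 |
| the Thames | 19 |
| Madagascar | 19 |
| Joan of Arc | 19 |
| Seattle | 19 |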
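The five-year restriction can be sketched as a simple date filter applied before counting. The records, the `years_before` and `recent_counts` helpers, and the reference date are all hypothetical; the real project works on its own dataframe, and this version uses only the standard library.

```python
from collections import Counter
from datetime import date

# Hypothetical (air_date, response) records standing in for the full dataset.
clues = [
    (date(2010, 5, 1), "Australia"),
    (date(2022, 3, 14), "Chicago"),
    (date(2023, 11, 2), "Chicago"),
    (date(2024, 1, 9), "Jupiter"),
]


def years_before(day, years):
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:  # Feb 29 when the target year is not a leap year
        return day.replace(year=day.year - years, day=28)


def recent_counts(clues, today, years=5):
    """Count responses only for clues aired within the last `years` years."""
    cutoff = years_before(today, years)
    return Counter(response for aired, response in clues if aired >= cutoff)


counts = recent_counts(clues, today=date(2024, 6, 1))
print(counts.most_common(2))  # [('Chicago', 2), ('Jupiter', 1)]
```

Passing `today` explicitly, rather than reading the system clock inside the function, keeps the result reproducible when the analysis is rerun later.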

It’s also fairly easy to sift through this table by hand and remove all answers that are geographic locations:

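| response | count |
| --- | --- |
| Jupiter | 34 |
| Mars | 33 |
| Beethoven | 24 |
| mercury | 21 |
| Venus | 21 |
| Joan of Arc | 19 |
| Napoleon | 19 |
| Hamlet | 19 |
| Saturn | 19 |
| Picasso | 18 |
| iron | 18 |
| Macbeth | 18 |
| Mercury | 18 |
| David | 18 |
| Tesla | 18 |
| Galileo | 17 |
| Jordan | 17 |
| Richard III | 17 |
| Neptune | 17 |
| Wilson | 17 |
| Julius Caesar | 17 |
| Cleopatra | 16 |
| the Moon | 16 |
| Cinderella | 16 |
| Hamilton | 16 |
| lead | 16 |
| carbon dioxide | 15 |
| Mozart | 15 |
| Moses | 15 |
| hydrogen | 15 |
| Churchill | 15 |
| Lady Gaga | 15 |
| tea | 15 |
| soccer | 15 |
| Solomon | 15 |
| baseball | 15 |
| Lincoln | 15 |
| Harvard | 15 |
| the liver | 15 |
| smallpox | 14 |
| John Quincy Adams | 14 |
| Henry VIII | 14 |
| a horse | 14 |
| Copernicus | 14 |
| Catherine the Great | 14 |
| Buddhism | 14 |
| the Amazon | 15 |
| Alexander the Great | 14 |
| World War I | 14 |
| Exodus | 14 |
| Teddy Roosevelt | 14 |
| Eisenhower | 14 |
| the heart | 14 |
| gold | 14 |
| Wagner | 14 |
| 12 | 13 |
| Robinson Crusoe | 13 |
| Marie Antoinette | 13 |
| the Titanic | 13 |
| cricket | 13 |
| Grey’s Anatomy | 13 |
| The Phantom of the Opera | 13 |
| Jaws | 13 |
| John Quincy Adams | 13 |
| Twelfth Night | 13 |
| Nero | 13 |
| Carmen | 13 |
| bamboo | 13 |
| King Lear | 13 |
| the Statue of Liberty | 13 |
| Marshall | 13 |
| John | 13 |
| Nixon | 13 |
| Madonna | 13 |
| teeth | 13 |
| the Louvre | 13 |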

After this point, this article is a work in progress

I’m also curious whether these results change for different point values. Chicago is a very well-known location for Americans, so it’s possible that Chicago only appears so often because it’s an easy “gimme” question. In general, clues with higher point values are much harder; maybe common responses are only common for easy questions. Would Chicago still be the most common response if we limit our search to expensive questions? Let’s break down answers by difficulty, and see if the results change. In order to account for splitting the data into ten parts, I’ll extend the search up to 25 years. Here are the 20 most common responses for each clue value in Single Jeopardy:

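| $200 answer | count | $400 answer | count | $600 answer | count | $800 answer | count | $1000 answer | count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| China | 108 | Australia | 64 | California | 57 | Chicago | 41 | Australia | 39 |
| Hawaii | 106 | Alaska | 62 | Chicago | 54 | France | 36 | Maine | 35 |
| Japan | 104 | Chicago | 58 | Australia | 45 | New York | 36 | Brazil | 35 |
| California | 74 | California | 54 | China | 44 | California | 34 | Chicago | 34 |
| Alaska | 73 | France | 54 | Texas | 44 | Australia | 34 | Greece | 32 |
| Chicago | 72 | China | 53 | Spain | 42 | Spain | 34 | South Africa | 31 |
| Australia | 67 | Japan | 53 | India | 39 | China | 33 | France | 29 |
| Mexico | 66 | Spain | 53 | France | 38 | India | 33 | Sweden | 29 |
| Florida | 65 | Canada | 52 | Florida | 38 | Alaska | 32 | Spain | 26 |
| France | 64 | Mexico | 51 | Japan | 37 | Greece | 32 | Japan | 26 |
| India | 59 | India | 50 | London | 35 | Minnesota | 32 | Oklahoma | 25 |
| George Washington | 58 | Florida | 49 | Hawaii | 35 | Pennsylvania | 31 | Belgium | 25 |
| Ireland | 58 | Boston | 44 | Germany | 35 | Mexico | 30 | Utah | 25 |
| Boston | 56 | New York | 44 | Sweden | 35 | Maine | 30 | Texas | 24 |
| Canada | 55 | Texas | 44 | New Orleans | 34 | Canada | 29 | Ireland | 24 |
| Russia | 52 | Egypt | 40 | Italy | 34 | New Mexico | 29 | Wyoming | 24 |
| Egypt | 50 | San Francisco | 38 | Alaska | 33 | Texas | 28 | Maryland | 24 |
| New Orleans | 50 | London | 37 | Mars | 32 | Italy | 28 | Norway | 24 |
| Paris | 50 | Switzerland | 37 | Greece | 31 | Israel | 28 | Portugal | 24 |
| New York | 49 | Hawaii | 36 | South Africa | 31 | Montana | 28 | Thailand | 24 |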

And for Double Jeopardy:

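| $400 answer | count | $800 answer | count | $1200 answer | count | $1600 answer | count | $2000 answer | count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| China | 111 | Australia | 57 | Japan | 46 | Australia | 48 | Brazil | 33 |
| Japan | 87 | Chicago | 57 | Australia | 45 | Sweden | 40 | Denmark | 33 |
| France | 85 | India | 55 | Sweden | 45 | Georgia | 37 | Portugal | 32 |
| Australia | 81 | Spain | 54 | France | 42 | Italy | 36 | India | 30 |
| Paris | 77 | France | 47 | Spain | 40 | Brazil | 34 | Andrew Jackson | 30 |
| California | 74 | China | 46 | Canada | 40 | Florida | 34 | Sweden | 28 |
| Mexico | 73 | Mexico | 46 | India | 39 | France | 32 | Indonesia | 28 |
| Cleopatra | 70 | Japan | 43 | Italy | 37 | Spain | 32 | Georgia | 27 |
| Spain | 68 | Paris | 43 | Portugal | 36 | South Africa | 32 | Norway | 26 |
| London | 67 | Egypt | 43 | Chicago | 35 | Maine | 32 | Poland | 26 |
| Alaska | 67 | California | 42 | Denmark | 35 | India | 31 | the Netherlands | 26 |
| Ireland | 67 | Ireland | 41 | Greece | 34 | Mexico | 31 | Spain | 25 |
| Italy | 66 | Italy | 41 | China | 33 | Switzerland | 31 | North Carolina | 25 |
| India | 64 | South Africa | 41 | Brazil | 33 | Norway | 30 | Finland | 25 |
| Chicago | 62 | Canada | 39 | Paris | 32 | Portugal | 29 | South Africa | 24 |
| Canada | 57 | Venus | 39 | South Africa | 32 | Chicago | 29 | Chicago | 24 |
| George Washington | 56 | Napoleon | 37 | Texas | 32 | Denmark | 29 | New Hampshire | 24 |
| Hawaii | 55 | Rome | 37 | Germany | 31 | Andrew Jackson | 28 | France | 23 |
| Florida | 52 | Hamlet | 37 | New York | 31 | China | 27 | Egypt | 22 |
| Egypt | 52 | Texas | 36 | Napoleon | 30 | Greece | 26 | the Philippines | 22 |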
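A per-value breakdown like the two tables above can be sketched by bucketing clues on their round and dollar value before counting. The record layout and the `top_by_value` helper are hypothetical, chosen only to keep the example self-contained; the real project presumably groups its dataframe instead.

```python
from collections import Counter, defaultdict

# Hypothetical (round, value, response) records standing in for the dataset.
clues = [
    ("Single", 200, "China"),
    ("Single", 200, "China"),
    ("Single", 200, "Hawaii"),
    ("Single", 1000, "Australia"),
    ("Double", 2000, "Brazil"),
    ("Double", 2000, "Brazil"),
    ("Double", 2000, "Aeschylus"),
]


def top_by_value(clues, n):
    """Return the n most common responses for each (round, value) bucket."""
    buckets = defaultdict(Counter)
    for round_name, value, response in clues:
        buckets[(round_name, value)][response] += 1
    return {key: counter.most_common(n) for key, counter in buckets.items()}


tables = top_by_value(clues, 1)
print(tables[("Single", 200)])   # [('China', 2)]
print(tables[("Double", 2000)])  # [('Brazil', 2)]
```

Keying on the round as well as the value matters because $400 and $800 exist in both rounds, and the tables above treat them as separate columns.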

Here’s the same data, with all the geographic locations removed:

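| $200 answer | count | $400 answer | count | $600 answer | count | $800 answer | count | $1000 answer | count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| George Washington | 58 | Ronald Reagan | 36 | Mars | 32 | 4 | 21 | Andrew Jackson | 17 |
| red | 49 | 2 | 30 | 3 | 25 | Eisenhower | 21 | Grover Cleveland | 17 |
| Abraham Lincoln | 47 | red | 29 | white | 24 | Mars | 18 | golf | 16 |
| McDonald’s | 47 | gold | 29 | basketball | 23 | Venus | 18 | Theodore Roosevelt | 16 |
| Napoleon | 46 | Wisconsin | 28 | Ronald Reagan | 22 | golf | 18 | 4 | 15 |
| gold | 43 | George Washington | 27 | Venus | 22 | Eleanor Roosevelt | 18 | Calvin Coolidge | 15 |
| Julius Caesar | 42 | tea | 27 | Richard Nixon | 22 | Theodore Roosevelt | 18 | white | 15 |
| Lincoln | 41 | 3 | 26 | Napoleon | 21 | Andrew Jackson | 17 | 12 | 15 |
| Madonna | 41 | Maine | 26 | Thomas Jefferson | 20 | blue | 17 | Henry VIII | 14 |
| Elvis Presley | 39 | Sweden | 25 | baseball | 20 | Jacob | 17 | Julius Caesar | 14 |
| water | 38 | Mars | 24 | Andrew Jackson | 20 | 3 | 16 | iron | 14 |
| milk | 36 | Thomas Jefferson | 24 | George Washington | 19 | Richard Nixon | 16 | Uranus | 14 |
| Cleopatra | 36 | coffee | 24 | Abraham Lincoln | 19 | Mark Twain | 16 | Solomon | 13 |
| Babe Ruth | 35 | rice | 24 | Julius Caesar | 19 | Henry VIII | 16 | Jupiter | 13 |
| white | 34 | World War I | 24 | 4 | 18 | Solomon | 15 | Saturn | 13 |
| Moses | 34 | Elvis Presley | 23 | green | 17 | Pocahontas | 15 | Neptune | 13 |
| 2 | 34 | Venus | 23 | Jupiter | 17 | nitrogen | 15 | Woodrow Wilson | 13 |
| Coca-Cola | 33 | Pennsylvania | 23 | blue | 17 | basketball | 14 | Job | 12 |
| golf | 32 | New Jersey | 23 | Hamlet | 17 | Jupiter | 14 | Othello | 12 |
| Richard Nixon | 32 | oil | 23 | Buddhism | 17 | 7 | 14 | John Adams | 12 |

| $400 answer | count | $800 answer | count | $1200 answer | count | $1600 answer | count | $2000 answer | count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cleopatra | 70 | Venus | 39 | Napoleon | 30 | Andrew Jackson | 28 | Andrew Jackson | 30 |
| George Washington | 56 | Napoleon | 37 | Mozart | 30 | Woodrow Wilson | 22 | Woodrow Wilson | 21 |
| Napoleon | 51 | Hamlet | 37 | Thomas Jefferson | 28 | Henry VIII | 21 | Eugene O’Neill | 20 |
| Julius Caesar | 49 | Macbeth | 34 | David | 26 | Jupiter | 20 | William Faulkner | 19 |
| Michelangelo | 47 | Julius Caesar | 32 | Galileo | 25 | Thomas Jefferson | 19 | Virginia Woolf | 18 |
| Mars | 43 | Abraham Lincoln | 31 | Michelangelo | 24 | Theodore Roosevelt | 19 | Henry Moore | 18 |
| Joan of Arc | 43 | Picasso | 31 | Hamlet | 23 | Richard III | 18 | Richard III | 17 |
| Mark Twain | 38 | Thomas Jefferson | 30 | Mars | 22 | Eleanor Roosevelt | 18 | John Quincy Adams | 17 |
| Abraham Lincoln | 38 | Cleopatra | 28 | Beethoven | 22 | King Lear | 17 | Maria Theresa | 17 |
| Hamlet | 36 | Ronald Reagan | 28 | Theodore Roosevelt | 22 | Charlemagne | 17 | Claudius | 16 |
| Alexander the Great | 36 | Mars | 27 | King Lear | 21 | A Midsummer Night’s Dream | 17 | Aeschylus | 16 |
| Ronald Reagan | 35 | Mozart | 27 | Richard III | 20 | Rembrandt | 17 | Twelfth Night | 15 |
| Romeo and Juliet | 35 | Lincoln | 26 | Henry VIII | 20 | George Eliot | 17 | Orpheus | 15 |
| Columbus | 35 | Queen Victoria | 26 | Picasso | 19 | Herbert Hoover | 17 | John Adams | 15 |
| Venus | 34 | George Washington | 25 | 3 | 19 | Archimedes | 17 | Raphael | 15 |
| Agatha Christie | 34 | World War I | 25 | Sylvia Plath | 19 | Galileo | 16 | Nathaniel Hawthorne | 15 |
| gold | 33 | David | 25 | Gerald Ford | 18 | Georgia O’Keeffe | 16 | Aristophanes | 15 |
| Shakespeare | 33 | Michelangelo | 24 | John Adams | 18 | English | 16 | 7 | 15 |
| water | 33 | Benjamin Franklin | 24 | Thomas Hardy | 18 | Venus | 15 | Sir Walter Scott | 14 |
| Beethoven | 32 | Galileo | 24 | Venus | 17 | Dylan Thomas | 15 | Zachary Taylor | 14 |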

Reading through this data is endlessly fascinating to me. Of course, I care more than the average bear about this show, so it’s hard for me to tell which peculiarities of this show are interesting to the average person. Here are just a few of the odd patterns present when breaking this data down by value:

  • Although the Beatles are notorious as a common subject on Jeopardy!, their prevalence as a response drops off rapidly after the $200 clue.
  • The same is true for Shakespeare, although the titles of his plays see reasonable representation across different clue values.
  • In general, some answers seem to favor certain values. "Barcelona" is three times as likely to be the answer to the $800 clue as to any other value in Single Jeopardy.
  • Aeschylus is an even more extreme example. He has appeared as an answer only three times in Single Jeopardy, twice in a $1600 clue, and sixteen times under the $2000 clue.

It’s worth noting that none of these numbers are incredibly large. I don’t want to overstate the significance of Aeschylus’ 16 appearances, especially since there have been about 36,000 questions worth $2000 in the past 25 years. Nonetheless, we’ve at least gained some insight into our question! It does seem that the responses to expensive questions tend to be more spread out than the responses to cheap questions.
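To put the Aeschylus figure in perspective, the arithmetic can be worked through directly. The 36,000 total is the approximate figure quoted above, so the result is only a rough estimate; the variable names are just for illustration.

```python
# Figures quoted in the text; the clue total is an approximation.
aeschylus_appearances = 16
total_2000_clues = 36_000

share = aeschylus_appearances / total_2000_clues
one_in = total_2000_clues / aeschylus_appearances

print(f"{share:.4%}")            # 0.0444%
print(f"about 1 in {one_in:.0f}")  # about 1 in 2250
```

In other words, even the most lopsided response in the table accounts for well under a tenth of a percent of $2000 clues.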