This article was spun off from a larger, more technical article on the subject of how this data was collected. I wanted to focus on some fun statistical analysis for the answers on Jeopardy!, so feel free to skip the main article if you’re only interested in the stats.
What are the most common correct responses?
In the previous article, I created a massive dataframe containing the categories, hints, answers, dates, and point values of every question aired on Jeopardy!. The first question that I’m curious about is simple: what are the most common correct responses on Jeopardy!? What people, places, and things are most valued by the Jeopardy! writers?
Simply performing a count on the number of times a word or phrase appears as a correct response gives some surprising results:
rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
response: | Australia | China | Japan | Chicago | France | India | Spain | California | Mexico | Alaska | Canada | India | Hawaii | Florida | Texas |
count | 511 | 484 | 472 | 471 | 462 | 434 | 427 | 420 | 390 | 380 | 376 | 354 | 354 | 351 | 340 |
Geographical locations are repeated far more frequently than famous people or works of art! The most frequent correct response is “What is Australia?”, appearing 511 times over the show’s run. After that comes China, then Japan, Chicago, France, and so on. As a matter of fact, all of the top 37 responses are geographic locations! (The trend is broken by Napoleon, appearing 243 times.)
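As a rough illustration of the counting step, here is a minimal sketch using only the Python standard library. The `responses` list and the `most_common_responses` helper are hypothetical stand-ins; the real counts come from the dataframe built in the previous article, which may well use a different tool.

```python
from collections import Counter

# Hypothetical sample of correct responses, for illustration only.
# The real list would be the response column of the full dataset.
responses = ["Australia", "China", "Australia", "Napoleon", "China", "Australia"]


def most_common_responses(responses, n=15):
    """Return the n most frequent responses as (response, count) pairs."""
    return Counter(responses).most_common(n)


top = most_common_responses(responses, n=3)
print(top)
```

Note that this counts exact strings only, so two spellings of the same person are treated as two different responses.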
Geography certainly isn’t a vastly more common subject than history, so what could cause this pattern? Part of the answer is most likely that geography questions draw on a narrower range of answers than other categories. There are fewer cities than people, after all. It’s also possible that locations are more common as responses, while historical figures appear more in the questions themselves. I’ve asked a few friends what they think the most popular responses would be, and they usually intuit that the top responses would be geographical.
However, there is a second answer at play here, which is that people’s names can appear a number of different ways. Let’s return to Napoleon as an example. In addition to 243 appearances of “Napoleon”, the table also contains:
Napoleon Bonaparte | Napoléon | Napoleon (Bonaparte) | Napoléon Bonaparte | Napoleon (I) | Napoléon (Bonaparte) |
---|---|---|---|---|---|
47 times | 14 times | 4 times | 3 times | 3 times | 1 time |
One might be curious why so many variations on a single name are possible. While contestants may say the full name while answering, the judges will often (but not always) allow a contestant to say only the surname, especially when referring to politicians. To eliminate ambiguity, the people maintaining the archive will often (but not always) add the given name to a response in parentheses. Ironically, Napoleon is a counterexample, as his first name is much more recognizable than his last name.
Regardless of these details, this does pose a serious problem to our understanding of this data. A first instinct at a solution might be to simply sum up the instances of each cell containing the word “Napoleon”, but this approach is fraught. Such an approach would include over twenty instances of variations on “Napoleon III”, as well as 6 instances of “Napoleon Dynamite”. This problem is even worse for less distinctive names. Here’s a breakdown of answers containing “Ford” as the last part of a person’s name, appearing three or more times:
On a more personal note, the corresponding chart for people named Johnson is even more disastrous:
This is a massive problem! How could we possibly handle this? As I see it, there are a handful of strategies that could be used to collect different versions of one person together, some of which might be used simultaneously:
1. Find all variations of responses that end with the same surname, and sum up their counts into a single number.
2. Take the counts of answers that consist of a single surname, and distribute that number proportionally across the different names with that surname.
3. Ask an LLM to consider the question as a whole, and determine the exact identity of each human included.
4. Group together variations that differ only by a pair of parentheses or diacritical marks. For example, replace "(Gerald) Ford" with "Gerald Ford".
Option 1 can be disregarded immediately. This would require that Gerald Ford and Harrison Ford be counted as one person, which is unacceptable.
Option 2 seems like a better idea at first, but falls apart the more one considers it. It would be nice to split the 119 instances of “Ford” and add those to “Gerald Ford”, “Henry Ford”, “Betty Ford” and “Harrison Ford”. However, it’s not safe to assume that referring to someone by one name is equally common for presidents as it is for actors. In fact, it’s not fair to assume that all 119 instances of “Ford” refer to a person at all. Surely, many refer to the Ford Motor Company.
Option 3 is tempting, but LLMs are always prone to error. This concern is easy to overstate; LLMs are getting more accurate all the time, especially as pertains to simple informational questions. However, determining the exact level of inaccuracy would take some testing and comparison that is currently outside the scope of this project.
That leaves option 4, which really does seem reasonable. This is a small change overall: Adding the 13 instances of “(Gerald) Ford” to the 128 instances of “Gerald Ford” is not likely to be hugely impactful. However, it’s also very unlikely to have negative side effects. This will be implemented moving forward.
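To make option 4 concrete, here is a minimal sketch of one way it could be done with the Python standard library. The function names `normalize_response` and `merge_counts` are my own for illustration, and this is not necessarily how the final implementation works. The counts are the Napoleon figures from the table above.

```python
import unicodedata
from collections import Counter


def normalize_response(text):
    """Drop parentheses (keeping their contents) and strip diacritics."""
    # Decompose accented characters, e.g. "é" becomes "e" + a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Discard the combining marks, leaving the base letters.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove the parenthesis characters themselves, not what they enclose.
    without_parens = without_marks.replace("(", "").replace(")", "")
    # Collapse any stray whitespace left behind.
    return " ".join(without_parens.split())


def merge_counts(counts):
    """Sum the counts of responses that normalize to the same string."""
    merged = Counter()
    for response, count in counts.items():
        merged[normalize_response(response)] += count
    return merged


# Counts taken from the Napoleon table earlier in this article.
napoleon_counts = {
    "Napoleon": 243,
    "Napoleon Bonaparte": 47,
    "Napoléon": 14,
    "Napoleon (Bonaparte)": 4,
    "Napoléon Bonaparte": 3,
    "Napoleon (I)": 3,
    "Napoléon (Bonaparte)": 1,
}

merged = merge_counts(napoleon_counts)
print(merged.most_common())
```

With this sketch the seven spellings collapse into three: “Napoleon” (257), “Napoleon Bonaparte” (55), and “Napoleon I” (3). “Napoleon” and “Napoleon Bonaparte” stay separate, which is the deliberately conservative behavior of option 4.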
Finally, it’s probably a good idea to restrict the timespan the program searches. What’s considered important knowledge has changed over the years, as have the people on Jeopardy!’s writing staff. Viewing just the last 5 years seems like a decent compromise between quantity and relevance of data. We can finally note the most common responses in the last few years. Here’s the top 75:
It’s also fairly easy to sift through this table by hand and remove all answers that are geographic locations:
After this point, this article is a work in progress
I’m also curious whether these results change for different point values. Chicago is a very well known location for Americans, so it’s possible that Chicago only appears so often because it’s an easy “gimme” question. In general, clues with higher point values are much harder; maybe common responses are only common for easy questions. Would Chicago still be the most common response if we limit our search to expensive questions? Let’s break down answers by difficulty, and see if the results change. In order to account for splitting the data into ten parts, I’ll extend the search up to 25 years. Here are the 20 most common responses for each clue value in Single Jeopardy:
$200 answer | count | $400 answer | count | $600 answer | count | $800 answer | count | $1000 answer | count |
---|---|---|---|---|---|---|---|---|---|
China | 108 | Australia | 64 | California | 57 | Chicago | 41 | Australia | 39 |
Hawaii | 106 | Alaska | 62 | Chicago | 54 | France | 36 | Maine | 35 |
Japan | 104 | Chicago | 58 | Australia | 45 | New York | 36 | Brazil | 35 |
California | 74 | California | 54 | China | 44 | California | 34 | Chicago | 34 |
Alaska | 73 | France | 54 | Texas | 44 | Australia | 34 | Greece | 32 |
Chicago | 72 | China | 53 | Spain | 42 | Spain | 34 | South Africa | 31 |
Australia | 67 | Japan | 53 | India | 39 | China | 33 | France | 29 |
Mexico | 66 | Spain | 53 | France | 38 | India | 33 | Sweden | 29 |
Florida | 65 | Canada | 52 | Florida | 38 | Alaska | 32 | Spain | 26 |
France | 64 | Mexico | 51 | Japan | 37 | Greece | 32 | Japan | 26 |
India | 59 | India | 50 | London | 35 | Minnesota | 32 | Oklahoma | 25 |
George Washington | 58 | Florida | 49 | Hawaii | 35 | Pennsylvania | 31 | Belgium | 25 |
Ireland | 58 | Boston | 44 | Germany | 35 | Mexico | 30 | Utah | 25 |
Boston | 56 | New York | 44 | Sweden | 35 | Maine | 30 | Texas | 24 |
Canada | 55 | Texas | 44 | New Orleans | 34 | Canada | 29 | Ireland | 24 |
Russia | 52 | Egypt | 40 | Italy | 34 | New Mexico | 29 | Wyoming | 24 |
Egypt | 50 | San Francisco | 38 | Alaska | 33 | Texas | 28 | Maryland | 24 |
New Orleans | 50 | London | 37 | Mars | 32 | Italy | 28 | Norway | 24 |
Paris | 50 | Switzerland | 37 | Greece | 31 | Israel | 28 | Portugal | 24 |
New York | 49 | Hawaii | 36 | South Africa | 31 | Montana | 28 | Thailand | 24 |
And for Double Jeopardy:
$400 answer | count | $800 answer | count | $1200 answer | count | $1600 answer | count | $2000 answer | count |
---|---|---|---|---|---|---|---|---|---|
China | 111 | Australia | 57 | Japan | 46 | Australia | 48 | Brazil | 33 |
Japan | 87 | Chicago | 57 | Australia | 45 | Sweden | 40 | Denmark | 33 |
France | 85 | India | 55 | Sweden | 45 | Georgia | 37 | Portugal | 32 |
Australia | 81 | Spain | 54 | France | 42 | Italy | 36 | India | 30 |
Paris | 77 | France | 47 | Spain | 40 | Brazil | 34 | Andrew Jackson | 30 |
California | 74 | China | 46 | Canada | 40 | Florida | 34 | Sweden | 28 |
Mexico | 73 | Mexico | 46 | India | 39 | France | 32 | Indonesia | 28 |
Cleopatra | 70 | Japan | 43 | Italy | 37 | Spain | 32 | Georgia | 27 |
Spain | 68 | Paris | 43 | Portugal | 36 | South Africa | 32 | Norway | 26 |
London | 67 | Egypt | 43 | Chicago | 35 | Maine | 32 | Poland | 26 |
Alaska | 67 | California | 42 | Denmark | 35 | India | 31 | the Netherlands | 26 |
Ireland | 67 | Ireland | 41 | Greece | 34 | Mexico | 31 | Spain | 25 |
Italy | 66 | Italy | 41 | China | 33 | Switzerland | 31 | North Carolina | 25 |
India | 64 | South Africa | 41 | Brazil | 33 | Norway | 30 | Finland | 25 |
Chicago | 62 | Canada | 39 | Paris | 32 | Portugal | 29 | South Africa | 24 |
Canada | 57 | Venus | 39 | South Africa | 32 | Chicago | 29 | Chicago | 24 |
George Washington | 56 | Napoleon | 37 | Texas | 32 | Denmark | 29 | New Hampshire | 24 |
Hawaii | 55 | Rome | 37 | Germany | 31 | Andrew Jackson | 28 | France | 23 |
Florida | 52 | Hamlet | 37 | New York | 31 | China | 27 | Egypt | 22 |
Egypt | 52 | Texas | 36 | Napoleon | 30 | Greece | 26 | the Philippines | 22 |
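For readers who want to reproduce a breakdown like the two tables above, here is a minimal sketch of the filter-then-group step in plain Python. The record layout (the `round`, `value`, `air_date`, and `response` keys), the `top_by_value` helper, and the sample rows are all hypothetical; the real data has its own column names and far more rows.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical clue records, for illustration only.
clues = [
    {"round": "Jeopardy", "value": 200, "air_date": date(2020, 1, 6), "response": "China"},
    {"round": "Jeopardy", "value": 200, "air_date": date(2021, 3, 2), "response": "China"},
    {"round": "Jeopardy", "value": 200, "air_date": date(2022, 5, 9), "response": "Hawaii"},
    {"round": "Jeopardy", "value": 200, "air_date": date(1990, 4, 1), "response": "Japan"},
    {"round": "Double Jeopardy", "value": 2000, "air_date": date(2019, 7, 4), "response": "Brazil"},
]


def top_by_value(clues, since, n=20):
    """Top-n responses per (round, value) for clues aired on or after `since`."""
    counters = defaultdict(Counter)
    for clue in clues:
        # Skip anything older than the chosen cutoff date.
        if clue["air_date"] >= since:
            counters[(clue["round"], clue["value"])][clue["response"]] += 1
    return {key: counter.most_common(n) for key, counter in counters.items()}


breakdown = top_by_value(clues, since=date(2000, 1, 1))
for (round_name, value), top in breakdown.items():
    print(round_name, value, top)
```

Keying the counters on both the round and the dollar value keeps the $400 clues of Single Jeopardy apart from the $400 clues of Double Jeopardy, which matters because the two share some values.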
Here’s the same data, with all the geographic locations removed:
$200 answer | count | $400 answer | count | $600 answer | count | $800 answer | count | $1000 answer | count |
---|---|---|---|---|---|---|---|---|---|
George Washington | 58 | Ronald Reagan | 36 | Mars | 32 | 4 | 21 | Andrew Jackson | 17 |
red | 49 | 2 | 30 | 3 | 25 | Eisenhower | 21 | Grover Cleveland | 17 |
Abraham Lincoln | 47 | red | 29 | white | 24 | Mars | 18 | golf | 16 |
McDonald’s | 47 | gold | 29 | basketball | 23 | Venus | 18 | Theodore Roosevelt | 16 |
Napoleon | 46 | Wisconsin | 28 | Ronald Reagan | 22 | golf | 18 | 4 | 15 |
gold | 43 | George Washington | 27 | Venus | 22 | Eleanor Roosevelt | 18 | Calvin Coolidge | 15 |
Julius Caesar | 42 | tea | 27 | Richard Nixon | 22 | Theodore Roosevelt | 18 | white | 15 |
Lincoln | 41 | 3 | 26 | Napoleon | 21 | Andrew Jackson | 17 | 12 | 15 |
Madonna | 41 | Maine | 26 | Thomas Jefferson | 20 | blue | 17 | Henry VIII | 14 |
Elvis Presley | 39 | Sweden | 25 | baseball | 20 | Jacob | 17 | Julius Caesar | 14 |
water | 38 | Mars | 24 | Andrew Jackson | 20 | 3 | 16 | iron | 14 |
milk | 36 | Thomas Jefferson | 24 | George Washington | 19 | Richard Nixon | 16 | Uranus | 14 |
Cleopatra | 36 | coffee | 24 | Abraham Lincoln | 19 | Mark Twain | 16 | Solomon | 13 |
Babe Ruth | 35 | rice | 24 | Julius Caesar | 19 | Henry VIII | 16 | Jupiter | 13 |
white | 34 | World War I | 24 | 4 | 18 | Solomon | 15 | Saturn | 13 |
Moses | 34 | Elvis Presley | 23 | green | 17 | Pocahontas | 15 | Neptune | 13 |
2 | 34 | Venus | 23 | Jupiter | 17 | nitrogen | 15 | Woodrow Wilson | 13 |
Coca-Cola | 33 | Pennsylvania | 23 | blue | 17 | basketball | 14 | Job | 12 |
golf | 32 | New Jersey | 23 | Hamlet | 17 | Jupiter | 14 | Othello | 12 |
Richard Nixon | 32 | oil | 23 | Buddhism | 17 | 7 | 14 | John Adams | 12 |
And for Double Jeopardy:
$400 answer | count | $800 answer | count | $1200 answer | count | $1600 answer | count | $2000 answer | count |
---|---|---|---|---|---|---|---|---|---|
Cleopatra | 70 | Venus | 39 | Napoleon | 30 | Andrew Jackson | 28 | Andrew Jackson | 30 |
George Washington | 56 | Napoleon | 37 | Mozart | 30 | Woodrow Wilson | 22 | Woodrow Wilson | 21 |
Napoleon | 51 | Hamlet | 37 | Thomas Jefferson | 28 | Henry VIII | 21 | Eugene O’Neill | 20 |
Julius Caesar | 49 | Macbeth | 34 | David | 26 | Jupiter | 20 | William Faulkner | 19 |
Michelangelo | 47 | Julius Caesar | 32 | Galileo | 25 | Thomas Jefferson | 19 | Virginia Woolf | 18 |
Mars | 43 | Abraham Lincoln | 31 | Michelangelo | 24 | Theodore Roosevelt | 19 | Henry Moore | 18 |
Joan of Arc | 43 | Picasso | 31 | Hamlet | 23 | Richard III | 18 | Richard III | 17 |
Mark Twain | 38 | Thomas Jefferson | 30 | Mars | 22 | Eleanor Roosevelt | 18 | John Quincy Adams | 17 |
Abraham Lincoln | 38 | Cleopatra | 28 | Beethoven | 22 | King Lear | 17 | Maria Theresa | 17 |
Hamlet | 36 | Ronald Reagan | 28 | Theodore Roosevelt | 22 | Charlemagne | 17 | Claudius | 16 |
Alexander the Great | 36 | Mars | 27 | King Lear | 21 | A Midsummer Night’s Dream | 17 | Aeschylus | 16 |
Ronald Reagan | 35 | Mozart | 27 | Richard III | 20 | Rembrandt | 17 | Twelfth Night | 15 |
Romeo and Juliet | 35 | Lincoln | 26 | Henry VIII | 20 | George Eliot | 17 | Orpheus | 15 |
Columbus | 35 | Queen Victoria | 26 | Picasso | 19 | Herbert Hoover | 17 | John Adams | 15 |
Venus | 34 | George Washington | 25 | 3 | 19 | Archimedes | 17 | Raphael | 15 |
Agatha Christie | 34 | World War I | 25 | Sylvia Plath | 19 | Galileo | 16 | Nathaniel Hawthorne | 15 |
gold | 33 | David | 25 | Gerald Ford | 18 | Georgia O’Keeffe | 16 | Aristophanes | 15 |
Shakespeare | 33 | Michelangelo | 24 | John Adams | 18 | English | 16 | 7 | 15 |
water | 33 | Benjamin Franklin | 24 | Thomas Hardy | 18 | Venus | 15 | Sir Walter Scott | 14 |
Beethoven | 32 | Galileo | 24 | Venus | 17 | Dylan Thomas | 15 | Zachary Taylor | 14 |
Reading through this data is endlessly fascinating to me. Of course, I care more than the average bear about this show, so it’s hard for me to tell which peculiarities of this show are interesting to the average person. Here are just a few of the odd patterns present when breaking this data down by value:
- Although the Beatles are notorious as a common subject on Jeopardy!, their prevalence as a response drops off rapidly after the $200 clue
- The same is true for Shakespeare, although the titles of his plays see reasonable representation across different clue values
- In general, some answers seem to favor certain values. "Barcelona" is three times as likely to be the answer to the $800 clue as to any other value in Single Jeopardy
- Aeschylus is an even more extreme example. He has appeared as an answer only three times in Single Jeopardy and twice in a $1600 clue, but sixteen times in a $2000 clue.
It’s worth noting that none of these numbers are especially large. I don’t want to overstate the significance of Aeschylus’ 16 appearances, especially since there have been about 36,000 questions worth $2000 in the past 25 years. Nonetheless, we’ve at least gained some insight into our question! It does seem that responses to expensive questions tend to be more spread out than responses to cheap questions.