Evaluation of Wolfram Alpha
June 1, 2009
This is long overdue, but here are some initial tests that I've performed on Wolfram|Alpha. The tests are broken down into six rounds, written one after another: I only wrote the round 2 tests after writing and running the round 1 tests, and so on.
Round 1
  | 7/8: PASS: 0.875
  | 7 / 8: PASS: 0.875
  | what time is it?: FAIL
  | the current time: PASS: 1:55:29 pm
  | what is the current date?: FAIL
  | the current date: FAIL
  | what is the current temperature in Toronto?: PASS: 15 C
  | what is the capital city of Canada?: PASS: Ottawa
  | what is the capital city of Ontario?: FAIL
  | what is the current temperature in the capital city of Canada?: PASS: Ottawa
  | convert 0 degrees Celsius to Fahrenheit: PASS: 32 F
  | 0 Celsius = ? Fahrenheit: PASS: 32 F
  | 0 C = ? F: PASS: 32 F
  | what is President Clinton's first name?: FAIL
  | what is Bill Clinton's birthday?: PASS: 19-08-1946
  | how old is Bill Clinton?: PASS: 62 years
SCORE: 11 / 16
Round 2
  | what is 7 plus 8?: PASS: 15
  | what is the third prime number?: FAIL: Expected 5 |
  | what is the first day of the week?: FAIL: Expected Sunday or Monday |
  | when did World War II end?: FAIL: Expected 02-09-1945 |
  | how many miles are in a kilometer?: PASS: 0.6214 |
  | how many miles are in a marathon?: PASS: 26.22 |
  | how far is Waterloo Ontario from Ottawa?: FAIL |
SCORE: 3 / 7
Round 3
  | how many calories are in an apple?: PASS: 91 calories
  | how many grams of fat are in a BigMac?: FAIL |
  | how many calories are in 10 apples?: FAIL |
  | how many calories are in a cubic meter of cheese?: FAIL |
  | what is Bill Clinton's first name?: FAIL |
  | how far is it from the moon to Earth?: FAIL |
SCORE: 1 / 6
Round 4
  | how many prime numbers are less than 100?: FAIL
  | how old is Canada?: FAIL |
  | who was the first Prime Minister of Canada?: FAIL |
  | what is the square root of one hundred forty four?: PASS |
  | what is four fifths times five?: PASS |
  | what is the population of Canada / USA?: PASS |
SCORE: 3 / 6
Round 5
  | x = y^2: PASS
  | x = y^2 where y = 4: FAIL |
  | how many nautical miles is it from Toronto to Ottawa?: FAIL |
  | how long would it take sound to travel from Toronto to Ottawa?: FAIL |
  | what is the wavelength of red light?: FAIL |
  | is water denser than lead?: FAIL |
SCORE: 1 / 6
Round 6
  | how many vowels are in the word "Test"?: FAIL
  | how many letters are in the alphabet?: FAIL |
  | what is $300 + 10%?: PASS |
  | how many water atoms are in a liter of water?: FAIL |
SCORE: 1 / 4
FINAL SCORE: 20 / 45
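For anyone who wants to double-check the arithmetic, here is a minimal sketch that re-adds the per-round scores listed above and turns them into pass rates. The names `ROUND_SCORES` and `tally` are made up for this illustration; only the numbers come from the SCORE lines.

```python
# Per-round (passed, total) counts, copied from the SCORE lines above.
# The dictionary and function names are illustrative, not part of any tool.
ROUND_SCORES = {
    "Round 1": (11, 16),
    "Round 2": (3, 7),
    "Round 3": (1, 6),
    "Round 4": (3, 6),
    "Round 5": (1, 6),
    "Round 6": (1, 4),
}


def tally(scores):
    """Return (passed, total, pass_rate) summed over all rounds."""
    passed = sum(p for p, _ in scores.values())
    total = sum(t for _, t in scores.values())
    return passed, total, passed / total


if __name__ == "__main__":
    # Print each round's pass rate, then the overall figure.
    for name, (p, t) in ROUND_SCORES.items():
        print(f"{name}: {p} / {t} = {p / t:.0%}")
    passed, total, rate = tally(ROUND_SCORES)
    print(f"FINAL: {passed} / {total} = {rate:.1%}")
```

The totals match the final score of 20 / 45, which works out to a pass rate of a little over 44%.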
Overall, I'm quite impressed. As far as I know, this has never been done before.
Still, deep down, I think everyone knows that systems can be built to do much better than this. The question is whether Wolfram|Alpha will evolve into a system that scores 90% on a test like this, rather than the roughly 44% (20 / 45) it scored here.
Anyway, for the time being Wolfram|Alpha takes the crown as the most impressive search system that accepts natural-language(ish) queries and returns specific results.
Well done!
Ford SYNC
May 30, 2009
Good job Ford.
Nice to see North American car makers pushing the envelope.
Gizmodo on Wolfram Alpha
May 5, 2009
Article
This is the first time I've stumbled across coverage of Wolfram Alpha on one of the main news sites that I read. I'm pretty impressed by Wolfram Alpha's responses to the examples they gave.
One of the curious aspects to me is the presentation layer, which figures out what information to present or compute for a given thing, and then how exactly to present it. One example is the graphic that illustrates the great-circle path between two cities. I wonder how they've made that work in such a broad way?