Question: How does Wolfram Alpha work?
Stephen Wolfram: Well, the first thing is that we’ve collected a huge amount of data and we’ve curated this data. Typically we’ve gone to primary sources of data and made arrangements to get either the static data or a feed of data that comes in every minute, every second, every hour, whatever. And we’ve organized this data so that it is set up to be clean and computable. So the first component is all this data that exists in the system, whether it’s about chemicals or countries or foods or nuclear isotopes or whatever else, or financial data about companies, or whatever. So, all this data.
Then the second piece is: given this data, how do we figure things out? How do we compute things? It’s like if you ask a suitable scientist, “Can you figure out for me where the sun will be at a particular time of day at a particular place on the earth?” If they can do their physics correctly, they’ll eventually be able to figure out the answer. Or if you have a particular level of some substance in your blood, what percentile of the distribution does that correspond to, or what does that mean for the probability of this or that thing?
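As an illustrative sketch of the blood-level question, the percentile calculation takes only a few lines of Python once you assume a reference distribution. This assumes the measurement is normally distributed; the mean and standard deviation below are hypothetical placeholders, not real clinical reference values (many lab measurements are skewed rather than normal), and the function name percentile_of is invented for illustration. It is not meant to represent Wolfram Alpha’s own method.

```python
from statistics import NormalDist

# ASSUMPTION: hypothetical reference distribution for some blood measurement.
# mu and sigma are illustrative placeholders, not clinical reference values.
REFERENCE = NormalDist(mu=100.0, sigma=15.0)


def percentile_of(level: float, dist: NormalDist = REFERENCE) -> float:
    """Return the percentile (0 to 100) at which `level` falls in `dist`.

    The cumulative distribution function gives the fraction of the
    population at or below `level`; scaling by 100 makes it a percentile.
    """
    return 100.0 * dist.cdf(level)


if __name__ == "__main__":
    for level in (85.0, 100.0, 130.0):
        print(f"level {level:g}: {percentile_of(level):.1f}th percentile")
```

The expert knowledge lives almost entirely in choosing the right distribution and its parameters; once those are curated, the computation itself is a single CDF evaluation.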
So these are things which in principle can be computed if you find the right expert. What we’ve tried to do is to actually accumulate all of the algorithms necessary to do those computations. Typically that is done by talking to the world’s experts in these areas and encapsulating the knowledge we get from them in the form of algorithms. Then another aspect of this is: “Okay, so we know all the stuff. We can compute all these things. When a typical person walks up to Wolfram Alpha, how do they communicate with it?” So another big challenge is to take natural language questions, the first way that people would think of asking such and such a question, and automatically understand them. It wasn’t clear whether that was going to be possible.
People have been trying to do natural language processing with computers for decades, and in general there has been only slow progress. It turned out the problem we had to solve is the reverse of the problem people usually have to solve. Usually the problem is: you’re given thousands or millions of pages of text; go have the computer understand them. Our problem was: we can compute a certain set of things, then we’re given a very short utterance that the human has fed us, and we have to ask whether we can understand that utterance and map it onto the things we can compute. And it’s turned out, somewhat to my surprise actually, that using a bunch of ideas from computational linguistics and so on, we’ve been able to make really great progress in letting one type in things in whatever form one first thinks of them and having the computer understand them.
And then finally one has to figure this out: there are all these things we can compute, and we’ve understood a question that has been asked. There are all sorts of things that we could give as the answer. Which parts of those should we actually present? How do we present the right graphical, tabular, and other results to communicate effectively with the humans who are using the system?
So we’ve managed to pull all the pieces together, and it all has to be connected to a big supercomputer-type system, with the big, crunchy software engineering necessary to actually deliver results quickly to people on the Web and so on. But the main objective here is to collect all this knowledge from our civilization, make it as computable as possible, and get it to the point where, if there is a question that could be answered by some expert, our system will be able to answer that question automatically. So one might ask a very specific question, like “What will be the value of this particular… I don’t know… annuity after this amount of time, with these particular dollar values, based on this thing about interest rates, or whatever else it is?” That is something very specific. It’s not “Give me a general essay about such and such a topic.” It’s “I’ve got a specific question. Give me a specific answer.”
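The annuity question is a good example of a specific question with a specific answer. A minimal sketch, assuming an ordinary annuity (equal payments made at the end of each period) and a fixed interest rate per period, uses the standard future-value formula FV = P((1 + r)^n - 1) / r. The function name annuity_future_value and the sample inputs are hypothetical, chosen only for illustration.

```python
def annuity_future_value(payment: float, rate_per_period: float, periods: int) -> float:
    """Future value of an ordinary annuity (payment at the end of each period).

    ASSUMPTION: constant payment and constant interest rate per period.
    Uses FV = P * ((1 + r)**n - 1) / r, falling back to P * n when r == 0
    (no interest, so the value is just the sum of the payments).
    """
    if rate_per_period == 0:
        return payment * periods
    growth = (1.0 + rate_per_period) ** periods
    return payment * (growth - 1.0) / rate_per_period


if __name__ == "__main__":
    # Hypothetical inputs: $1,000 at the end of each year for 10 years at 5% per year.
    fv = annuity_future_value(1000.0, 0.05, 10)
    print(f"Future value: ${fv:,.2f}")
```

With those sample inputs the result is about $12,577.89. The zero-rate branch is there because the closed-form formula divides by r.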
Recorded July 26, 2010
Interviewed by Max Miller