Eric Horvitz, distinguished scientist and co-director at Microsoft Research, and Kira Radinsky, a PhD researcher at the Technion-Israel Institute of Technology, say they have developed software that can predict future events.
The prototype uses a mix of archival material from the New York Times and data from several websites, including Wikipedia. During its setup phase, the system used 22 years of New York Times archives, from 1986 to 2007.
“One source we found useful was DBpedia, which is a structured form of the information inside Wikipedia constructed using crowdsourcing,” Radinsky told MIT Technology Review. “We can understand, or see, the location of the places in the news articles, how much money people earn there, and even information about politics.” Other sources included WordNet, which helps software understand the meaning of words, and OpenCyc, a database of common knowledge.
The system could someday enable aid organizations to be more proactive in tackling disease outbreaks, Horvitz said. “I truly view this as a foreshadowing of what’s to come,” he added. “Eventually, this kind of work will start to have an influence on how things go for people.”
When tested on historical data, the system reportedly produced impressive results. Reports of droughts in Angola in 2006 triggered a warning about possible cholera outbreaks in the country, because earlier events had taught the system that cholera outbreaks were more likely in years following droughts.
A second warning about cholera in Angola was triggered by news reports of large storms in Africa in early 2007. Less than a week later, reports appeared that cholera had begun to spread. In similar tests involving forecasts of disease, violence, and high numbers of deaths, the system’s warnings were correct between 70 and 90 percent of the time.
According to Horvitz, the system performs well enough that a more refined version could be used in real settings, assisting experts at aid agencies who plan humanitarian response and readiness. “We’ve done some reaching out and plan to do some follow-up work with such people,” says Horvitz.
Horvitz and Radinsky are not the first to consider using online news and other data to forecast future events, but they say they draw on more data sources (more than 90 in total), which makes their system more general-purpose.
Microsoft has no plans yet to commercialize Horvitz and Radinsky’s research, but the project will continue, says Horvitz, who wants to mine more newspaper archives as well as digitized books.