Mar 10, 2011
By Pete Shooner and Colleen Carow
Razvan Bunescu, an assistant professor of computer science at Ohio University's Fritz J. and Dolores H. Russ College of Engineering and Technology, has been awarded a $224,000 grant from the National Science Foundation (NSF) to create tools that will automatically extract world knowledge from Wikipedia.
Collaborating with graduate students from Ohio University, and faculty and graduate students from the University of North Texas, Bunescu plans to build computer programs that can process text documents and extract information about the concepts and entities mentioned in text. The result will be a repository of world knowledge that can then be used to improve artificial intelligence applications that process natural language.
The team's first goal is to tackle one of the largest collections of world knowledge available: Wikipedia. Bunescu wants to create an ontology – in the form of a multilingual "graph" of world knowledge – that maps every element of the online encyclopedia's information, and organizes the data to illustrate relationships between those elements.
As an example, Bunescu noted that the Russ College is part of Ohio University, which is a public university, which in turn is a special type of academic institution. "These part-of, instance-of, and is-a types of ontological relationships occur very often between real world entities and concepts," he explained.
The computer programs the team is developing will create a network of such relationships by automatically sifting through the semi-structured information available in the top 10 language editions of Wikipedia. In the resulting semantic network, nodes will correspond to the hundreds of thousands of entities and concepts described in Wikipedia, while edges between nodes will capture their semantic relationships.
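To make the idea of a semantic network concrete, here is a minimal Python sketch of the Russ College example as a small labeled graph. It is only an illustration and not the team's software: the triple list, the relation labels as strings, and the helper names `build_graph` and `reachable` are all assumptions chosen for this example.

```python
from collections import defaultdict

# Hypothetical (source, relation, target) triples based on the example
# Bunescu gives; a real system would extract these from Wikipedia.
TRIPLES = [
    ("Russ College", "part-of", "Ohio University"),
    ("Ohio University", "instance-of", "public university"),
    ("public university", "is-a", "academic institution"),
]


def build_graph(triples):
    """Map each node to a list of (relation, target) outgoing edges."""
    graph = defaultdict(list)
    for source, relation, target in triples:
        graph[source].append((relation, target))
    return graph


def reachable(graph, start):
    """Return every node reachable from start by following edges of any type."""
    seen = []
    stack = [start]
    while stack:
        node = stack.pop()
        # graph.get avoids creating empty entries for nodes with no edges.
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.append(target)
                stack.append(target)
    return seen


graph = build_graph(TRIPLES)
print(reachable(graph, "Russ College"))
# ['Ohio University', 'public university', 'academic institution']
```

Following the edges from "Russ College" walks up the chain described above, which is the kind of lookup a question answering system could use to infer that the college belongs to an academic institution.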
One application Bunescu plans to explore would match new user questions to previously answered questions posted on popular and rapidly growing community-driven question answering sites such as Yahoo! Answers.
More generally, the graph may be used to provide the much-needed world knowledge in knowledge-intensive applications such as open domain question answering (QA). In open domain QA, users submit natural language questions and the QA system returns pinpointed answers, as opposed to a ranked list of documents.
"Google is able to do an intelligent keyword search, but it cannot yet do this kind of question answering," Bunescu explains.
Bunescu also foresees practical applications in the classroom, where teachers could provide a document with concepts or phrases that are automatically hyperlinked to corresponding pages from Wikipedia or other encyclopedias.
Although educators debate whether Wikipedia is a reputable academic resource, since anyone can freely contribute content, the application would enable students to easily clarify difficult concepts and begin their quest to answer questions arising from their studies.