By Roland Piquepaille
Today, if you want to know what's going on in the world, you can watch TV, read your newspaper or use the Internet to browse news sites. But imagine a day when you just have to type a few words on your computer, such as "Olympic Games," push a button, and read an automatic, and accurate, summary of what major sources are saying about this specific subject. This is the goal of a project that started at the University of Michigan and is explained by Technology Research News in "Summarizer ranks sentences." This new multi-document summarization technique, named LexRank, looks for similarities among sentences and ranks them with a 'prestige score' analogous to the one used by Google's PageRank. "In a sense, sentences vote for each other just by virtue of being similar to each other," said one of the researchers. This algorithm may also be applied to automatic translation and question answering in a year or two.
Let's start with a description of the project.
Researchers from the University of Michigan have developed a multi-document summarization technique that compares sentences and has the effect of sentences voting for the most important among them. The method, dubbed LexRank, combines the content-sorting concepts of prestige and lexical similarity to find the most important sentences in a group of documents on the same subject.
Algorithms that use prestige to sort information have been around since the '90s. It is possible to find the most prestigious, or popular, member of a network by analyzing the relationships among network members. In a social network, for example, the most prestigious individual can be identified by analyzing the social relations among all pairs of members of the group.
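As a toy illustration of the simplest kind of prestige, the Python sketch below ranks the members of a made-up group by how many others choose them (their in-degree). The member names, the ties and the indegree_prestige function are hypothetical; real prestige measures, including the PageRank-style one LexRank borrows, also weigh who is doing the choosing.

```python
from collections import Counter

# Hypothetical directed "chooses" relations: (from_member, to_member).
ties = [
    ("alice", "carol"), ("bob", "carol"), ("dave", "carol"),
    ("carol", "alice"), ("dave", "alice"), ("alice", "bob"),
]

def indegree_prestige(edges):
    """Count how many times each member is chosen by someone else."""
    members = {m for edge in edges for m in edge}
    counts = Counter(target for _, target in edges)
    # Members nobody chose still get an explicit score of zero.
    return {m: counts.get(m, 0) for m in members}

scores = indegree_prestige(ties)
most_prestigious = max(scores, key=scores.get)
print(most_prestigious, scores[most_prestigious])  # carol 3
```

Counting raw choices treats every vote as equal, which is exactly the limitation that recursive measures such as PageRank address.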
Now, let's look in more detail at how the LexRank algorithm uses similarities among sentences.
The researchers' lexical centrality algorithm compares the lexical similarity of sentences. "Lexical similarity can be thought of as a measure of the word overlap between two sentences," said Gunes Erkan [, one of the researchers.] "For example, 'Bush went to China' and 'George Bush visited China' are fairly similar in a lexical way [but] 'Bush visited China' and 'Blair is the prime minister of the United Kingdom' have no overlap at all," he said.
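To make the word-overlap idea concrete, here is a minimal Python sketch that scores Erkan's two example pairs with cosine similarity over word counts. It is an illustration under stated assumptions, not the researchers' code: the naive tokenizer and the function names are hypothetical, and the published LexRank work also weights words by how rare they are (inverse document frequency), which this sketch leaves out.

```python
import math
import re
from collections import Counter

def tokens(sentence):
    # Lowercase and keep alphabetic words only (a deliberately naive tokenizer).
    return re.findall(r"[a-z]+", sentence.lower())

def cosine_similarity(s1, s2):
    """Cosine of the angle between the word-count vectors of two sentences."""
    a, b = Counter(tokens(s1)), Counter(tokens(s2))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

print(cosine_similarity("Bush went to China", "George Bush visited China"))  # 0.5
print(cosine_similarity("Bush visited China",
                        "Blair is the prime minister of the United Kingdom"))  # 0.0
```

The first pair shares "bush" and "china", two of four words each, so it scores 0.5; the second pair shares nothing and scores 0.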
The researchers' system considers a sentence important if it is similar to many other sentences and if those other sentences are themselves important. "In a sense, sentences vote for each other just by virtue of being similar to each other," said Dragomir Radev [, an assistant professor at the University of Michigan.] "The sentences with the highest scores... are considered to contain the gist of the document and are presented as the multi-document summary," he said.
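The recursive idea Radev describes, that a sentence is important when it resembles other important sentences, can be sketched as a PageRank-style power iteration over a sentence-similarity graph. This is a hedged illustration, not the researchers' implementation: the Jaccard word overlap used as the similarity measure, the 0.1 similarity threshold, the 0.85 damping factor and all function names are assumptions chosen for the example.

```python
import re

def words(sentence):
    """Lowercased set of alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def overlap(s1, s2):
    """Jaccard word overlap, used here as a stand-in similarity measure."""
    a, b = words(s1), words(s2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def lexical_centrality(sentences, threshold=0.1, damping=0.85, iters=200, tol=1e-10):
    """PageRank-style power iteration over a thresholded sentence-similarity graph."""
    n = len(sentences)
    # Connect two distinct sentences when their similarity reaches the threshold.
    adj = [[i != j and overlap(sentences[i], sentences[j]) >= threshold
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Sentences with no neighbours spread their score evenly over everyone.
        dangling = sum(scores[j] for j in range(n) if degree[j] == 0)
        new = []
        for i in range(n):
            # Each neighbour j splits its current score evenly among its links.
            votes = sum(scores[j] / degree[j] for j in range(n) if adj[j][i])
            new.append((1 - damping) / n + damping * (votes + dangling / n))
        converged = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores

docs = [
    "Bush visited China to discuss trade.",
    "George Bush went to China.",
    "Trade talks in China ended on Friday.",
    "Blair is the prime minister of the United Kingdom.",
]
scores = lexical_centrality(docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])  # Bush visited China to discuss trade.
```

The first sentence wins because it is the only one similar to both of the other China sentences; the Blair sentence, linked to nothing, ends up last. The damping term keeps the iteration stable even when some sentences have no neighbours.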
This algorithm is already used by a Web-based news summarization site, NewsInEssence. Please note that this is an experimental site and that it is not always online. If you cannot reach it from the first link, try the alternate one.
LexRank could have other uses.
The researchers are also looking for other uses of the lexical centrality algorithm. Possibilities include automatic translation and question answering, said Radev. The method could potentially find sentences that are likeliest to contain the answer to a given natural language question, or, in the biomedical domain, sentences that are most likely to contain important facts like particular protein interactions, said Radev.
The research work was presented in July 2004 at the Empirical Methods in Natural Language Processing (EMNLP 2004) conference held in Barcelona, Spain. Please check the EMNLP 2004 Proceedings if you're interested in the subject.
And for more information, here are links to two technical documents about LexRank, "LexPageRank: Prestige in Multi-Document Text Summarization" (PDF format, 7 pages, 84 KB) and "LexRank: Graph-based Lexical Centrality as Salience in Text Summarization" (PDF format, 23 pages, 272 KB).
Will LexRank one day become as popular as PageRank is today? We should know in a year or two.
Sources: Kimberly Patch, Technology Research News, April 20/27, 2005; and various websites