A Sentimental Education -- For Software

By Roland Piquepaille

Imagine you work for a company that is introducing a new product. Obviously, you would want to know whether the public likes it. But how would you find out? You could search the Web and read every document that mentions your product, but this would be very time-consuming. Help is on the way, in the form of software that will scan the Web for you and separate the positive reviews from the negative ones. This software might be based on research done at Cornell University and described by Technology Research News in "Software sorts out subjectivity." The researchers are improving 'sentiment classification' by removing neutral sentences, so that their machine-learning method is applied only to the subjective portions of a document. But the following negative statement, which contains only positive words, shows how difficult it is to classify a sentence as positive or negative: "If you think this laptop is a great deal, I've got a nice bridge you might be interested in." It may take a decade before such a system is widely available.

Here is how Technology Research News introduces the problem of automatic sentiment classification.

One of the fundamental challenges in getting computers to sort and analyze text is finding ways to automatically classify information.
Applications like search engines that group similar documents do so using topic-based categories. Sentiment analysis techniques add another dimension by determining the author's attitude about a topic rather than just identifying a topic.
Existing techniques tend to concentrate on finding words, phrases and patterns that indicate sentiment. This has proven difficult, however. "This laptop is a great deal", for instance, shows strong sentiment, but contains the same words as the neutral sentence "The release of this new laptop drew a great deal of media attention."
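To make the laptop example concrete, here is a minimal Python sketch of a naive keyword-based scorer. The one-word lexicon and the function names are hypothetical, chosen only for illustration; they do not come from the research described here.

```python
import re

# Hypothetical one-word "positive" lexicon, for illustration only.
POSITIVE_WORDS = {"great"}

def words(sentence):
    """Return the set of lowercase words found in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))

def keyword_score(sentence):
    """Count how many lexicon words appear in the sentence."""
    return len(words(sentence) & POSITIVE_WORDS)

opinion = "This laptop is a great deal"
neutral = ("The release of this new laptop drew a great deal "
           "of media attention.")

# Words shared by the two sentences, including 'great' and 'deal'.
shared = words(opinion) & words(neutral)

# Both sentences get the same score, although only the first one
# expresses an opinion.
print(keyword_score(opinion), keyword_score(neutral))
```

A scorer like this cannot tell the two sentences apart, which is why word spotting alone is not enough.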

So how do you teach a computer to 'understand' the meaning of words?

Researchers from Cornell University have devised a way to improve sentiment classification that sidesteps having to deal with meaning by instead concentrating on context. Their method weeds out neutral sentences. "Getting rid of neutral sentences like 'The release of this new laptop drew a great deal of media attention' [makes] the overall sentiment more obvious," said Lillian Lee, an associate professor of computer science at Cornell University.
Polarity classification via subjectivity detection. This diagram shows how the software uses subjectivity detection to obtain a polarity classification (Credit: Bo Pang and Lillian Lee, Cornell University).

Here are more details about the method.

The researchers represented text as a network, or graph. "Imagine that each sentence is represented by a network point, or node," said Lee. To model contextual information between each pair of sentence nodes, the researchers added a link whose strength represented how much the two sentences deserved the same label -- objective or subjective -- based on criteria including how close the sentences are in the text, and whether they are separated by a paragraph boundary.
The model also took into consideration the evidence within a sentence that the sentence is subjective or objective. Possible evidence that a sentence is subjective, for example, includes the presence of a word like 'wonderful', or 'terrible', said Lee.
Each sentence was linked strongly or weakly to special subjective and objective nodes, depending on how much evidence there was within the sentence that it was subjective or objective.
The sentences are then clustered into subjective and objective camps based on the strength of the links. This is a graph partitioning problem known as finding the minimum cut, and it can be solved exactly by a quick, efficient algorithm, said Lee.
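The graph construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the scores and the link weights are invented for the example, and the maximum-flow routine is a textbook Edmonds-Karp, not necessarily the algorithm the Cornell researchers used. Each sentence gets a link to a "subjective" node and a link to an "objective" node, pairs of sentences get an association link, and the minimum cut separates the two camps.

```python
from collections import defaultdict, deque

EPS = 1e-9  # tolerance for floating-point capacities

def label_subjective(subj_scores, assoc):
    """Return the sorted indices of sentences labeled subjective.

    subj_scores[i]: assumed evidence in [0, 1] that sentence i is
                    subjective (1 - score counts as objective evidence).
    assoc[(i, j)]:  assumed strength of the link between sentences
                    i and j (how much they deserve the same label).
    """
    S, T = "subjective", "objective"  # the two special nodes
    cap = defaultdict(lambda: defaultdict(float))
    for i, p in enumerate(subj_scores):
        cap[S][i] += p          # link to the subjective node
        cap[i][T] += 1.0 - p    # link to the objective node
    for (i, j), w in assoc.items():
        cap[i][j] += w          # association links work both ways
        cap[j][i] += w

    def reachable():
        """Breadth-first search over links with spare capacity."""
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > EPS and v not in parent:
                    parent[v] = u
                    if v == T:
                        return parent
                    queue.append(v)
        return parent

    # Edmonds-Karp: push flow along shortest paths until none is left.
    while True:
        parent = reachable()
        if T not in parent:
            break
        path, v = [], T
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # Nodes still reachable from S lie on the subjective side of the cut.
    return sorted(v for v in parent if v != S)

# Sentence 1 looks slightly objective on its own (0.45), but a strong
# link to the clearly subjective sentence 0 pulls it across the cut.
print(label_subjective([0.9, 0.45, 0.1], {(0, 1): 0.5}))  # [0, 1]
print(label_subjective([0.9, 0.45, 0.1], {}))             # [0]
```

The second call shows what happens without contextual links: each sentence is judged on its own evidence, and the borderline sentence falls on the objective side.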

And is this approach successful?

The method improved sentiment classification performance from 82.8 to 86.4 percent, which is statistically very significant, according to Lee. The method could eventually be used to maintain review-aggregator Web sites, to filter search results by viewpoint, and to track attitudes toward a given topic, she said.

When will we be able to use such software? And what will it be useful for?

It will take at least a decade before the system can readily handle unrestricted texts containing arbitrary rhetorical devices, she said.
The method could be used by search engines to sort or filter results by viewpoint to, for instance, help users distinguish between objective and biased Web sites, said Lee.
It could also be used to track changes in attitudes toward a given topic by, for instance, analyzing press articles, she said.
And companies could use the system to gather business intelligence such as finding out what people think of their products or the products of their competitors. "A computer company might crawl blogs to find out whether or not people like its latest laptop model," said Lee.

The research work has been published in the Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, held July 21 to 26, 2004 in Barcelona, Spain under the title "A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts."

The abstract and the full paper (PDF format, 8 pages, 264 KB) are both available online. The above diagram was extracted from this paper.

Sources: Kimberly Patch, Technology Research News, November 17/24, 2004; Cornell University website
