Measuring Praise and Criticism: Inference of Semantic Orientation from Association

PETER D. TURNEY, National Research Council Canada
MICHAEL L. LITTMAN, Rutgers University

The evaluative character of a word is called its semantic orientation. Positive semantic orientation indicates praise (e.g., honest, intrepid) and negative semantic orientation indicates criticism (e.g., disturbing, superfluous). Semantic orientation varies in both direction (positive or negative) and degree (mild to strong). An automated system for measuring semantic orientation would have application in text classification, text filtering, tracking opinions in online discussions, analysis of survey responses, and automated chat systems (chatbots). This paper introduces a method for inferring the semantic orientation of a word from its statistical association with a set of positive and negative paradigm words. Two instances of this approach are evaluated, based on two different statistical measures of word association: pointwise mutual information (PMI) and latent semantic analysis (LSA). The method is experimentally tested with 3,596 words (including adjectives, adverbs, nouns, and verbs) that have been manually labeled positive (1,614 words) and negative (1,982 words). The method attains an accuracy of 82.8% on the full test set, but the accuracy rises above 95% when the algorithm is allowed to abstain from classifying mild words.

Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - linguistic processing; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - information filtering, search process; I.2.7 [Artificial Intelligence]: Natural Language Processing - text analysis

General Terms: Algorithms, Experimentation

Additional Key Words and Phrases: semantic orientation, semantic association, web mining, text mining, text classification, unsupervised learning, mutual information, latent semantic analysis

Authors' addresses: P.D. Turney, Institute for Information Technology, National Research Council Canada, M-50 Montreal Road, Ottawa, Ontario, Canada, K1A 0R6; M.L. Littman, Department of Computer Science, Rutgers University, Piscataway, NJ, USA.

1. INTRODUCTION

In an early study of subjective meaning, Osgood et al. [1957] asked people to rate words on a wide variety of scales. Each scale was defined by a bipolar pair of adjectives, such as sweet/sour, rugged/delicate, and sacred/profane. The scales were divided into seven intervals. Osgood et al. gathered ratings of many words from a large number of subjects and then analyzed the data using factor analysis. They discovered that three main factors accounted for most of the variation in the data. The intuitive meaning of each factor can be understood by looking for the bipolar adjective pairs that are most highly correlated with it. The primary factor, which accounted for much of the variation in the data, was highly correlated with good/bad, beautiful/ugly, kind/cruel, and honest/dishonest. Osgood et al. called this the evaluative factor. The second factor, called the potency factor, was highly correlated with strong/weak, large/small, and heavy/light. The third factor, activity, was correlated with active/passive, fast/slow, and hot/cold. In this paper, we focus on the evaluative factor. Hatzivassiloglou and McKeown [1997] call this factor the semantic orientation of a word. It is also known as valence in the linguistics literature.
A positive semantic orientation denotes a positive evaluation (i.e., praise) and a negative semantic orientation denotes a negative evaluation (i.e., criticism). Semantic orientation has both direction (positive or negative) and intensity (mild or strong); contrast okay/fabulous (mild/strong positive) and irksome/horrid (mild/strong negative). We introduce a method for automatically inferring the direction and intensity of the semantic orientation of a word from its statistical association with a set of positive and negative paradigm words.

It is worth noting that there is a high level of agreement among human annotators on the assignment of semantic orientation to words. For their experiments, Hatzivassiloglou and McKeown [1997] created a testing set of 1,336 adjectives (657 positive and 679 negative terms). They labeled the terms themselves and then validated their labels by asking four people to independently label a random sample of 500 of the 1,336 adjectives. On average, the four people agreed that it was appropriate to assign a positive or negative label to 89% of the 500 adjectives. In the cases where they agreed that a label was appropriate, they assigned the same label as Hatzivassiloglou and McKeown to 97% of the terms. The average agreement among the four people was also 97%. In our own study, in Section 5.8, the average agreement among the subjects was 98% and the average agreement between the subjects and our benchmark labels was 94% (25 subjects, 28 words). This level of agreement compares favourably with validation studies in similar tasks, such as word sense disambiguation.

This paper presents a general strategy for inferring semantic orientation from semantic association. To motivate the work described here, Section 2 lists some potential applications of algorithms for determining semantic orientation, such as new kinds of search services [Hearst 1992], filtering flames (abusive messages) for newsgroups [Spertus 1997], and tracking opinions in online discussions [Tong 2001]. Section 3 gives two examples of our method for inferring semantic orientation from association, using two different measures of word association: Pointwise Mutual Information (PMI) [Church and Hanks 1989] and Latent Semantic Analysis (LSA) [Landauer and Dumais 1997]. PMI and LSA are based on co-occurrence, the idea that a word is characterized by the company it keeps [Firth 1957]. The hypothesis behind our approach is that the semantic orientation of a word tends to correspond to the semantic orientation of its neighbours.

Related work is examined in Section 4. Hatzivassiloglou and McKeown [1997] developed a supervised learning algorithm that infers semantic orientation from linguistic constraints on the use of adjectives in conjunctions. The performance of their algorithm was measured by the accuracy with which it classifies words. Another approach is to evaluate an algorithm for learning semantic orientation in the context of a specific application. Turney [2002] does this in the context of text classification, where the task is to classify a review as positive ("thumbs up") or negative ("thumbs down"). Pang et al. [2002] also addressed the task of review classification, but they used standard machine learning text classification techniques.

Experimental results are presented in Section 5. The algorithms are evaluated using 3,596 words (1,614 positive and 1,982 negative) taken from the General Inquirer lexicon [Stone et al. 1966].
These words include adjectives, adverbs, nouns, and verbs. An accuracy of 82.8% is attained on the full test set, but the accuracy can rise above 95% when the algorithm is allowed to abstain from classifying mild words. The interpretation of the experimental results is given in Section 6. We discuss limitations and future work in Section 7 and conclude in Section 8.

2. APPLICATIONS

The motivation of Hatzivassiloglou and McKeown [1997] was to use semantic orientation as a component in a larger system, to automatically identify antonyms and distinguish near synonyms. Both synonyms and antonyms typically have strong semantic associations, but synonyms generally have the same semantic orientation, whereas antonyms have opposite orientations.

Semantic orientation may also be used to classify reviews (e.g., movie reviews or automobile reviews) as positive or negative [Turney 2002]. It is possible to classify a review based on the average semantic orientation of phrases in the review that contain adjectives and adverbs. We expect that there will be value in combining semantic orientation [Turney 2002] with more traditional text classification methods for review classification [Pang et al. 2002].

To illustrate review classification, Table 1 shows the average semantic orientation of sentences selected from reviews of banks, from the Epinions site. In this table, we used SO-PMI (see Section 3.1) to calculate the semantic orientation of each individual word and then averaged the semantic orientation of the words in each sentence. Five of these six randomly selected sentences are classified correctly.

Table 1. The average semantic orientation of some sample sentences.

Positive reviews:
1. I love the local branch, however communication may break down if they have to go through head office.
2. Bank of America gets my business because of its extensive branch and ATM network.
3. This bank has exceeded my expectations for the last ten years.

Negative reviews:
1. Do not bank here, their website is even worse than their actual locations.
2. Use Bank of America only if you like the feeling of a stranger's warm, sweaty hands in your pockets.
3. If you want poor customer service and to lose money to ridiculous charges, Bank of America is for you.

In Table 1, for each sentence, the word with the strongest semantic orientation has been marked in bold. These bold words dominate the average and largely determine the orientation of the sentence as a whole. In the sentence that is misclassified as positive, the system is misled by the sarcastic tone. The negative orientations of "stranger's" and "sweaty" were not enough to counter the strong positive orientation of "warm".

One application of review classification is to provide summary statistics for search engines. Given the query "Paris travel review", a search engine could report, "There are 5,000 hits, of which 80% are positive and 20% are negative." The search results could also be sorted by average semantic orientation, so that the user could easily sample the most extreme reviews. Alternatively, the user could include the desired semantic orientation in the query, "Paris travel review orientation: positive" [Hearst 1992].

Preliminary experiments indicate that semantic orientation is also useful for summarization of reviews. A positive review could be summarized by picking out the sentence with the highest positive semantic orientation, and a negative review could be summarized by extracting the sentence with the lowest negative semantic orientation.
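To make the sentence-level averaging concrete, here is a minimal sketch in Python. The scores in the `so` table are invented for illustration (a real system would compute them with SO-PMI or SO-LSA, Section 3), and the whitespace tokenizer is a simplification; this is not the authors' implementation.

```python
# Minimal sketch of review-sentence classification by average semantic
# orientation. The scores in `so` are invented for illustration; a real
# system would compute them with SO-PMI or SO-LSA (Section 3).
so = {
    "love": 2.0, "exceeded": 1.5, "extensive": 1.0,    # assumed values
    "worse": -2.0, "sweaty": -1.5, "ridiculous": -1.8,
}

def average_so(sentence: str) -> float:
    """Average the semantic orientation of the words we have scores for."""
    words = sentence.lower().split()
    scores = [so[w] for w in words if w in so]
    return sum(scores) / len(scores) if scores else 0.0

def classify(sentence: str) -> str:
    """A sentence is positive if its average semantic orientation is positive."""
    return "positive" if average_so(sentence) > 0.0 else "negative"

print(classify("This bank has exceeded my expectations"))  # -> positive
```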
Another potential application is filtering flames for newsgroups [Spertus 1997]. There could be a threshold, such that a newsgroup message is held for verification by the human moderator when the semantic orientation of any word in the message drops below the threshold.

Tong [2001] presents a system for generating sentiment timelines. This system tracks online discussions about movies and displays a plot of the number of positive sentiment and negative sentiment messages over time. Messages are classified by looking for specific phrases that indicate the sentiment of the author towards the movie, using a hand-built lexicon of phrases with associated sentiment labels. There are many potential uses for sentiment timelines: advertisers could track advertising campaigns, politicians could track public opinion, reporters could track public response to current events, and stock traders could track financial opinions. However, with Tong's approach, it would be necessary to provide a new lexicon for each new domain. Tong's [2001] system could benefit from the use of an automated method for determining semantic orientation, instead of (or in addition to) a hand-built lexicon.

Semantic orientation could also be used in an automated chat system (a chatbot), to help decide whether a positive or negative response is most appropriate. Similarly, characters in software games would appear more realistic if they responded to the semantic orientation of words that are typed or spoken by the game player.

Another application is the analysis of survey responses to open-ended questions. Commercial tools for this task include TextSmart (by SPSS) and Verbatim Blaster (by StatPac). These tools can be used to plot word frequencies or cluster responses into categories, but they do not currently analyze semantic orientation.

3. SEMANTIC ORIENTATION FROM ASSOCIATION

The general strategy in this paper is to infer semantic orientation from semantic association. The semantic orientation of a given word is calculated from the strength of its association with a set of positive words, minus the strength of its association with a set of negative words:

Pwords = a set of words with positive semantic orientation    (1)
Nwords = a set of words with negative semantic orientation    (2)
A(word_1, word_2) = a measure of association between word_1 and word_2    (3)

SO-A(word) = \sum_{pword \in Pwords} A(word, pword) - \sum_{nword \in Nwords} A(word, nword)    (4)

We assume that A(word_1, word_2) maps to a real number. When A(word_1, word_2) is positive, the words tend to be associated with each other. Larger values correspond to stronger associations. When A(word_1, word_2) is negative, the presence of one word makes it likely that the other is absent. A word, word, is classified as having a positive semantic orientation when SO-A(word) is positive and a negative orientation when SO-A(word) is negative. The magnitude (absolute value) of SO-A(word) can be considered the strength of the semantic orientation.

In the following experiments, seven positive words and seven negative words are used as paradigms of positive and negative semantic orientation:

Pwords = {good, nice, excellent, positive, fortunate, correct, and superior}    (5)
Nwords = {bad, nasty, poor, negative, unfortunate, wrong, and inferior}    (6)

These fourteen words were chosen for their lack of sensitivity to context. For example, a word such as excellent is positive in almost all contexts. The sets also consist of opposing pairs (good/bad, nice/nasty, excellent/poor, etc.).
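Equation (4) translates directly into code. The sketch below is a minimal illustration in which `assoc` stands for any real-valued association measure A(word_1, word_2), such as the PMI or LSA measures introduced next; it is not tied to a particular corpus or search engine.

```python
from typing import Callable

# The fourteen paradigm words from equations (5) and (6).
PWORDS = ["good", "nice", "excellent", "positive", "fortunate", "correct", "superior"]
NWORDS = ["bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"]

def so_a(word: str, assoc: Callable[[str, str], float]) -> float:
    """Equation (4): association with the positive paradigm words minus
    association with the negative paradigm words."""
    return (sum(assoc(word, pword) for pword in PWORDS)
            - sum(assoc(word, nword) for nword in NWORDS))

# A word is classified positive when so_a(word, assoc) > 0 and negative
# when it is < 0; the absolute value is the strength of the orientation.
```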
We experiment with randomly selected paradigm words in Section 5.8. It could be argued that this is a supervised learning algorithm with fourteen labeled training examples and millions or billions of unlabeled training examples, but it seems more appropriate to say that the paradigm words are defining semantic orientation, rather than training the algorithm. Therefore we prefer to describe our approach as unsupervised learning. However, this point does not affect our conclusions.

This general strategy is called SO-A (Semantic Orientation from Association). Selecting particular measures of word association results in particular instances of the strategy. This paper examines SO-PMI (Semantic Orientation from Pointwise Mutual Information) and SO-LSA (Semantic Orientation from Latent Semantic Analysis).

3.1 Semantic Orientation from PMI

PMI-IR [Turney 2001] uses Pointwise Mutual Information (PMI) to calculate the strength of the semantic association between words [Church and Hanks 1989]. Word co-occurrence statistics are obtained using Information Retrieval (IR). PMI-IR has been empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL), obtaining a score of 74% [Turney 2001], comparable to that produced by direct thesaurus search [Littman 2001].

The Pointwise Mutual Information (PMI) between two words, word_1 and word_2, is defined as follows [Church and Hanks 1989]:

PMI(word_1, word_2) = \log_2 \frac{p(word_1 \,\&\, word_2)}{p(word_1) \, p(word_2)}    (7)

Here, p(word_1 & word_2) is the probability that word_1 and word_2 co-occur. If the words are statistically independent, the probability that they co-occur is given by the product p(word_1) p(word_2). The ratio between p(word_1 & word_2) and p(word_1) p(word_2) is a measure of the degree of statistical dependence between the words. The log of the ratio corresponds to a form of correlation, which is positive when the words tend to co-occur and negative when the presence of one word makes it likely that the other word is absent.

PMI-IR estimates PMI by issuing queries to a search engine (hence the IR in PMI-IR) and noting the number of hits (matching documents). The following experiments use the AltaVista Advanced Search engine, which indexes approximately 350 million web pages (counting only those pages that are in English). Given a (conservative) estimate of 300 words per web page, this represents a corpus of at least one hundred billion words. AltaVista was chosen over other search engines because it has a NEAR operator. The AltaVista NEAR operator constrains the search to documents that contain the words within ten words of one another, in either order. Previous work has shown that NEAR performs better than AND when measuring the strength of semantic association between words [Turney 2001]. We experimentally compare NEAR and AND in Section 5.4.

SO-PMI is an instance of SO-A. From equation (4), we have:

SO-PMI(word) = \sum_{pword \in Pwords} PMI(word, pword) - \sum_{nword \in Nwords} PMI(word, nword)    (8)

Let hits(query) be the number of hits returned by the search engine, given the query, query. We calculate PMI(word_1, word_2) from equation (7) as follows:

PMI(word_1, word_2) = \log_2 \frac{\frac{1}{N} \, hits(word_1 \text{ NEAR } word_2)}{\frac{1}{N} \, hits(word_1) \cdot \frac{1}{N} \, hits(word_2)}    (9)

Here, N is the total number of documents indexed by the search engine.
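As a minimal sketch of equations (8) and (9): `hits` below is a hypothetical function mapping a query string to a hit count (standing in for an AltaVista-style interface with a NEAR operator), and `n_docs` plays the role of N. The sketch assumes nonzero hit counts; smoothing is discussed after equation (10) below.

```python
import math

# Paradigm words from equations (5) and (6).
PWORDS = ["good", "nice", "excellent", "positive", "fortunate", "correct", "superior"]
NWORDS = ["bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"]

def pmi_from_hits(word1: str, word2: str, hits, n_docs: float) -> float:
    """Equation (9): estimate PMI from hit counts. `hits` is a hypothetical
    query-to-count function; it must return nonzero counts here."""
    p_joint = hits(f"{word1} NEAR {word2}") / n_docs
    p1 = hits(word1) / n_docs
    p2 = hits(word2) / n_docs
    return math.log2(p_joint / (p1 * p2))

def so_pmi(word: str, hits, n_docs: float) -> float:
    """Equation (8): SO-PMI as a difference of summed PMI values."""
    return (sum(pmi_from_hits(word, pword, hits, n_docs) for pword in PWORDS)
            - sum(pmi_from_hits(word, nword, hits, n_docs) for nword in NWORDS))
```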
Combining equations (8) and (9), we have:

SO-PMI(word) = \log_2 \frac{\prod_{pword \in Pwords} hits(word \text{ NEAR } pword) \cdot \prod_{nword \in Nwords} hits(nword)}{\prod_{pword \in Pwords} hits(pword) \cdot \prod_{nword \in Nwords} hits(word \text{ NEAR } nword)}    (10)

Note that N, the total number of documents, drops out of the final equation. Equation (10) is a log-odds ratio [Agresti 1996]. Calculating the semantic orientation of a word via equation (10) requires twenty-eight queries to AltaVista (assuming there are fourteen paradigm words). Since the two products in (10) that do not contain word are constant for all words, they only need to be calculated once. Ignoring these two constant products, the experiments required only fourteen queries per word. To avoid division by zero, 0.01 was added to the number of hits. This is a form of Laplace smoothing. We examine the effect of varying this parameter in Section 5.3.

Pointwise Mutual Information is only one of many possible measures of word association. Several others are surveyed in Manning and Schütze [1999]. Dunning [1993] suggests the use of likelihood ratios as an improvement over PMI. To calculate likelihood ratios for the association of two words, X and Y, we need to know four numbers:

k(X Y) = the frequency that X occurs within a given neighbourhood of Y    (11)
k(~X Y) = the frequency that Y occurs in a neighbourhood without X    (12)
k(X ~Y) = the frequency that X occurs in a neighbourhood without Y    (13)
k(~X ~Y) = the frequency that neither X nor Y occurs in a neighbourhood    (14)

If the neighbourhood size is ten words, then we can use hits(X NEAR Y) to estimate k(X Y) and hits(X) - hits(X NEAR Y) to estimate k(X ~Y), but note that these are only rough estimates, since hits(X NEAR Y) is the number of documents that contain X near Y, not the number of neighbourhoods that contain X and Y. Some preliminary experiments suggest that this distinction is important, since alternatives to PMI (such as likelihood ratios [Dunning 1993] and the Z-score [Smadja 1993]) appear to perform worse than PMI when used with search engine hit counts. However, if we do not restrict our attention to measures of word association that are compatible with search engine hit counts, there are many possibilities. In the next subsection, we look at one of them, Latent Semantic Analysis.

3.2 Semantic Orientation from LSA

SO-LSA applies Latent Semantic Analysis (LSA) to calculate the strength of the semantic association between words [Landauer and Dumais 1997]. LSA uses the Singular Value Decomposition (SVD) to analyze the statistical relationships among words in a corpus.
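A common way to turn LSA into an association measure A(word_1, word_2) is to take the cosine between word vectors after a truncated SVD of a word-by-document frequency matrix. The sketch below illustrates that recipe on an invented miniature matrix; the corpus, the weighting scheme, and the number of retained dimensions k are assumptions for illustration, not the configuration used by SO-LSA.

```python
import numpy as np

# Hypothetical miniature term-document matrix: rows are words, columns are
# documents, entries are term frequencies. A real application would use a
# large corpus and a weighting scheme such as TF-IDF.
vocab = ["good", "excellent", "bad", "poor", "movie"]
X = np.array([
    [2, 0, 1, 0],   # good
    [1, 1, 0, 0],   # excellent
    [0, 2, 0, 1],   # bad
    [0, 1, 0, 2],   # poor
    [1, 1, 1, 1],   # movie
], dtype=float)

# Truncated SVD: keep only the k largest singular values.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
word_vecs = U[:, :k] * s[:k]   # word coordinates in the latent space

def lsa_assoc(w1: str, w2: str) -> float:
    """Cosine similarity between two word vectors in the reduced LSA space."""
    a = word_vecs[vocab.index(w1)]
    b = word_vecs[vocab.index(w2)]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(lsa_assoc("good", "excellent"))
```

An association function of this form can be passed directly as the `assoc` argument of the `so_a` sketch above to obtain a semantic orientation score in the style of SO-LSA.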