At the end of July, Microsoft Research held its 2008 Faculty Summit to survey the state of computing R&D, which this year included a social media summit. A major topic of conversation was the transition of the internet from a network of documents to a network of people.
As participant (host) and Microsoft scientist Matthew Hurst explains on his blog, “The PageRank era is marked by a very simple link with no explicit meaning and a simple assumption (a positive endorsement).” But this assumption of positive endorsement is becoming unnecessary as more and more direct evidence of people’s opinions and categorizations of content becomes available online. Research repeatedly shows that others take notice of human-generated tags and reviews; to cite just one example, “consumers report being willing to pay from 20% to 99% more for a 5-star-rated item than a 4-star-rated item (with variance depending on type of item/service)”.
Many are excited by how much less processing-intensive content tagging becomes with this trend: clusters of pages and facts seem to grow organically as a result of human tagging. This helps overcome long-standing problems in content indexing within information retrieval, such as the gap between the language that businesses or organizations use to label their content and the terminology their customers and users actually prefer.
But this transition also brings challenges that are less discussed. As one scientist aptly describes the phenomenon, “fragmenting media and changing consumer behavior have crippled traditional [media] monitoring methods. Technorati estimates that 75,000 new blogs are created daily, along with 1.2 million new posts each day, many discussing consumer opinions on products and services. Tactics [of the traditional sort] such as clipping services, field agents, and ad hoc research simply can’t keep pace.” Call it what you will: Brand Monitoring, Online Image Tracking, Buzz Monitoring, Online Anthropology, Conversation Mining, Online Consumer Intelligence, Market Influence Analytics … the challenges remain the same. As an example, I think of a project I did here at Pure Visibility last year, which involved analyzing online review content related to a client’s company. After gathering the reviews (hundreds of them), I was faced with the daunting task of mining them for basic information, such as the overall majority sentiment expressed and how it correlated with the source. My ultimate method was mostly manual and more than a little tedious.
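A first pass over a pile of reviews like that can be automated with even a crude lexicon-based tally. The sketch below is purely illustrative, with made-up word lists and sample data (not the method used in that project): it counts positive, negative, and neutral reviews per source site.

```python
from collections import Counter, defaultdict

# Tiny illustrative word lists; a real analysis would need a far larger
# lexicon (and much more care with negation, sarcasm, etc.).
POSITIVE = {"great", "excellent", "love", "helpful", "fast"}
NEGATIVE = {"terrible", "slow", "broken", "rude", "disappointing"}

def score_review(text):
    """Naive lexicon score: +1 per positive word, -1 per negative word."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def sentiment_by_source(reviews):
    """Tally positive/negative/neutral review counts per source site."""
    tally = defaultdict(Counter)
    for source, text in reviews:
        score = score_review(text)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        tally[source][label] += 1
    return tally

# Made-up sample reviews, tagged with the site they came from.
reviews = [
    ("yelp", "Great service and fast helpful staff"),
    ("yelp", "Slow checkout and a rude manager"),
    ("google", "Excellent selection, love this place"),
]
print({src: dict(counts) for src, counts in sentiment_by_source(reviews).items()})
```

Even a toy tally like this answers the “majority sentiment per source” question that took me hours by hand, though its accuracy on real review prose would be far from trustworthy.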
Hurst’s blog references a new book by Pang and Lee that surveys the state of opinion mining and sentiment analysis (essentially, data mining and classification using human-generated content). In addition to interesting facts about the power of opinions, like the one cited above, the book clearly outlines the process such analysis requires and the associated challenges. For example, incorporating user opinions into a search engine typically requires the following steps:
- determining whether the user is looking for subjective information
- accurately classifying documents into opinionated and non-opinionated bins
- identifying overall sentiments expressed and/or specific opinions regarding particular aspects
- summarizing the information, including aggregating votes across different rating scales, highlighting representative opinions, representing points of disagreement and consensus, identifying opinion holders, etc.
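The four steps above can be sketched as a toy pipeline. Every function here is a hypothetical keyword-based stub standing in for what would, in practice, be a trained model, and all word lists are invented for illustration:

```python
# Invented word lists; real systems would use learned models instead.
SUBJECTIVE_CUES = {"review", "reviews", "opinion", "best", "worst"}
POSITIVE = {"love", "amazing", "great"}
NEGATIVE = {"hate", "awful", "disappointing"}

def query_is_subjective(query):
    """Step 1: is the user looking for subjective information?"""
    return any(w in SUBJECTIVE_CUES for w in query.lower().split())

def doc_is_opinionated(doc):
    """Step 2: route documents into opinionated vs. non-opinionated bins."""
    words = set(doc.lower().split())
    return bool(words & (POSITIVE | NEGATIVE))

def overall_sentiment(doc):
    """Step 3: overall polarity (a real system would also extract
    opinions about specific aspects of the product)."""
    words = doc.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def summarize(docs):
    """Step 4: aggregate per-document 'votes' into a simple tally."""
    labels = [overall_sentiment(d) for d in docs if doc_is_opinionated(d)]
    return {label: labels.count(label) for label in set(labels)}

docs = [
    "I love this camera, amazing lens",
    "awful battery life, very disappointing",
    "the camera ships with a 50mm lens",  # not opinionated; filtered out
]
if query_is_subjective("best camera reviews"):
    print(summarize(docs))
```

Each stub here is a few lines; the book’s point is precisely that making each stage accurate on real text is where the difficulty lies.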
The challenges are numerous. To summarize some of the excellent points made by Pang and Lee, I sketched out the following table, which compares opinion mining to traditional text mining:
| Opinion Mining | Fact-based Text Analysis |
|---|---|
| relatively few classes generalizing over many domains/users | often numerous classes (e.g., topic classification) |
| classes represent opposing (binary) or ordinal/numerical categories | classes can be unrelated |
| word order can overcome frequency (in importance) | frequency typically correlates with classification |
| sentiment typically expressed subtly, not isolated to a single sentence | though dependent on document length, single-sentence extractive summarization is often reasonable |
| defining human-preferred keywords is a non-trivial task | accurate classification possible via purely data-driven methods |
To clarify this last point: the authors note that this fact alone does not make the task harder than traditional topic classification, since data-driven approaches can also be applied to the latter to improve accuracy over classification using a human-picked keyword list. The problem is that the accuracy of data-driven methods for opinion analysis is only around 80%, which is still not comparable to the performance expected in traditional topic-based classification.
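The contrast can be illustrated with a toy data-driven classifier that learns word-polarity associations from labeled examples rather than relying on a hand-picked keyword list. This is a minimal sketch with invented training data, not the authors’ experiment:

```python
from collections import Counter

def train_word_polarity(labeled_docs):
    """Data-driven step: count how often each word appears in
    positive vs. negative training documents."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in labeled_docs:
        counts[label].update(text.lower().split())
    return counts

def classify(text, counts):
    """Score a document by summing learned word-polarity counts;
    ties default to positive."""
    score = sum(counts["pos"][w] - counts["neg"][w]
                for w in text.lower().split())
    return "pos" if score >= 0 else "neg"

# Invented training data for illustration.
train = [
    ("the plot was predictable and dull", "neg"),
    ("a predictable premise but a wonderful cast", "pos"),
    ("wonderful direction and a wonderful score", "pos"),
]
counts = train_word_polarity(train)
print(classify("dull and predictable", counts))  # prints "neg"
```

Note that no human picked “dull” or “wonderful” as keywords; the training data did. The 80%-accuracy caveat above is about how far even much more sophisticated versions of this idea get on real sentiment data.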
While these challenges may seem intimidating enough to remain on the horizon for years to come, the fact that this book was co-written by a Yahoo! research scientist and a professor at one of the country’s top CS schools suggests that the right people are thinking about these trends. Significant changes in how we use the web may not be far off.
The post The Subjective Web: Online Opinion Mining appeared first on Pure Visibility.