Monday, November 10, 2014

Sentiment Analysis on Weibo Data

Sentiment analysis determines whether the sentiment of a sentence is positive or negative. Part of our team's project is to analyze the sentiment of comments on four domestic mobile phones: Xiaomi M4, Smartisan T1, Huawei Honor6, and Meizu MX4.

Here are the key steps of the algorithm:
1.    Read the text and tokenize it.
2.    In every sentence, find each sentiment word and record its polarity (positive or negative) according to the sentiment dictionary, together with its position.
3.    Look for an adverb of degree before the sentiment word, and stop searching as soon as one is found. Adverbs of different degrees have different weights, and the weight is multiplied by the sentiment value (the base sentiment value of every sentiment word is assumed to be 1).
4.    Find all negation words before the sentiment word. If their number is odd, multiply the sentiment value by -1; if it is even, multiply it by 1.
5.    If the sentence contains ‘!’, each ‘!’ adds 2 to the sentiment value of the corresponding polarity.
6.    Print the positive and negative values, and their percentages, for every sentence.
7.    Sum all the sentiment values and print the positive and negative values, and their percentages, for the whole text.
8.    Compute the mean and variance of the positive and negative sentiment values for the text.
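
The steps above can be sketched in Python roughly as follows. This is only a minimal illustration, not our actual program. The tiny word lists and weights are made up, and the input is assumed to be already tokenized (step 1). Several details that the steps leave open are filled in by assumption: the backward search stops at the previous sentiment word, an odd number of negations flips the word to the opposite polarity, each '!' counts toward the polarity of the nearest sentiment word before it, and the variance is the population variance.

```python
from statistics import mean, pvariance

# Hypothetical toy lexicons; a real run would load full dictionaries.
POSITIVE = {"好", "喜欢"}
NEGATIVE = {"差", "卡"}
DEGREE = {"非常": 2.0, "很": 1.5, "有点": 0.5}  # assumed weights
NEGATION = {"不", "没"}
EXCLAMATIONS = {"!", "！"}


def score_sentence(tokens):
    """Return (positive, negative) scores for one tokenized sentence."""
    pos = neg = 0.0
    start = 0             # left edge of the backward search window
    last_positive = None  # polarity of the most recent sentiment word
    for i, tok in enumerate(tokens):
        if tok in POSITIVE or tok in NEGATIVE:
            weight = 1.0  # base value of every sentiment word
            window = tokens[start:i]
            # Step 3: use the nearest degree adverb, then stop searching.
            for prev in reversed(window):
                if prev in DEGREE:
                    weight *= DEGREE[prev]
                    break
            # Step 4: an odd number of negations flips the polarity.
            negated = sum(t in NEGATION for t in window) % 2 == 1
            last_positive = (tok in POSITIVE) != negated
            if last_positive:
                pos += weight
            else:
                neg += weight
            start = i + 1
        elif tok in EXCLAMATIONS and last_positive is not None:
            # Step 5: each '!' adds 2 to the preceding word's polarity.
            if last_positive:
                pos += 2
            else:
                neg += 2
    return pos, neg


def percentages(pos, neg):
    """Step 6: share of each polarity in percent (0, 0 if no sentiment)."""
    total = pos + neg
    if total == 0:
        return 0.0, 0.0
    return 100 * pos / total, 100 * neg / total


def summarize(sentences):
    """Steps 6 to 8 for a non-empty list of tokenized sentences."""
    scores = [score_sentence(s) for s in sentences]
    pos_vals = [p for p, _ in scores]
    neg_vals = [n for _, n in scores]
    return {
        "per_sentence": scores,
        "total": (sum(pos_vals), sum(neg_vals)),
        "mean": (mean(pos_vals), mean(neg_vals)),
        "variance": (pvariance(pos_vals), pvariance(neg_vals)),
    }
```

For example, `summarize([["手机", "非常", "好", "！"], ["系统", "不", "卡"]])` scores each sentence separately and then aggregates. In practice the token lists would come from a Chinese word segmenter rather than being typed by hand.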
 
While programming, I ran into some problems:
1.       Processing Chinese characters in Python is a little troublesome. The text should be handled as Unicode.
2.       Python sometimes fails to read all of the data from a txt file.
3.       Internet slang differs considerably from the standard sentiment dictionary, so we should add and edit entries in the dictionary after studying the language habits of netizens. That should improve the accuracy of the analysis.
4.       The sentiment value obtained by analyzing the whole text differs somewhat from the sum of the values obtained by analyzing each sentence separately. I think spurious values may arise at the boundary between two sentences. For example, a sentiment word at the beginning of a sentence will look for an adverb of degree at the end of the previous sentence.
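
For problems 1, 2 and 4, one possible approach is sketched below in Python 3. It is an illustration under assumptions, not our actual code: the helper names `read_text` and `split_sentences` are made up, the file is assumed to be UTF-8, and the set of sentence terminators is a guess. The cause of the incomplete reads is not known; if it is bytes that fail to decode, `errors="replace"` keeps reading and marks them with U+FFFD so they can be spotted. Splitting into sentences first means a backward search for adverbs or negations cannot reach into the previous sentence.

```python
import re

# A run of non-terminator characters followed by its terminators.
# Both full-width and ASCII marks are included (an assumed set).
_SENTENCE = re.compile(r"[^。！？!?\n]+[。！？!?]*")


def read_text(path, encoding="utf-8"):
    """Read a whole file as Unicode text.

    Undecodable bytes become U+FFFD instead of raising an error.
    """
    with open(path, encoding=encoding, errors="replace") as f:
        return f.read()


def split_sentences(text):
    """Split text into sentences, keeping '!' so it can still be counted."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]
```

Each string returned by `split_sentences` can then be tokenized and scored on its own, which should make the per-sentence sum and the whole-text result agree.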
 
We are still working on improving the results, and we hope to achieve our goals.