Sentiment Analysis on Weibo Data
Sentiment analysis means analyzing a sentence to determine whether its sentiment is positive or negative. Part of our team's project is analyzing the sentiment of comments on four domestic mobile phones: Xiaomi M4, Smartisan T1, Huawei Honor 6, and Meizu MX4.
Here are the key steps of the algorithm:
1. Read the text and tokenize it.
2. In every sentence, find each sentiment word and record its position and its polarity (positive or negative) according to the sentiment dictionary.
3. Look for an adverb of degree before the sentiment word and stop searching as soon as one is found. Adverbs of different degrees are given different weights, and the weight multiplies the sentiment value (the initial sentiment value of every sentiment word is assumed to be 1).
4. Find all the negation words before the sentiment word. If the number of negation words is odd, multiply the sentiment value by -1; if it is even, multiply it by 1 (leave it unchanged).
5. Every '!' in the sentence adds 2 to the sentiment value of the corresponding polarity.
6. Print the positive and negative values and the corresponding percentages for every sentence.
7. Sum all the sentiment values and print the positive and negative values and the corresponding percentages for the whole text.
8. Calculate the mean and variance of the positive and negative sentiment for the text.
During programming, I ran into some problems:
1. Python is a little troublesome for processing Chinese characters. The text should be handled as Unicode.
2. Python sometimes can't read all of the data in a txt file.
3. Internet slang differs greatly from the words in a standard sentiment dictionary. After studying netizens' linguistic habits, we should add and edit some words in the sentiment dictionary, which will improve the accuracy of the analysis.
4. The sentiment value differs somewhat between analyzing the whole text at once and analyzing every sentence separately and adding the results together. I think spurious values may arise at the boundary between two sentences. For example, a sentiment word at the beginning of a sentence may pick up an adverb of degree from the end of the previous sentence.
We are still working on optimizing the results. We hope we can achieve our goals.
Hello Jiang Yue! You shared the sentiment analysis part of your group project. I find that your group has a very structured and well-organized algorithm. I hope you achieve your goals!
I am interested in the fourth step of your algorithm. It has such a large impact that it can determine whether a post is positive or not.
Is there any case where there is an even number of negation words, yet they do not cancel each other out?
For example: 我真的不不不不高興! (literally "I'm really not not not not happy!", meaning "I'm really not happy!")
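Applying the step 4 rule literally to this example shows the issue: the four 不 make an even count, so the sentiment of 高興 (happy) is not reversed, even though the sentence means the writer is not happy. A minimal sketch, assuming the hand tokenization and the one-word negation dictionary below (both hypothetical):

```python
# Hand tokenization of the commenter's example (an assumption, not jieba output).
tokens = ["我", "真的", "不", "不", "不", "不", "高興", "!"]
NEGATION = {"不"}  # hypothetical one-word negation dictionary


def negation_sign(words, index):
    """Step 4 rule: -1 if an odd number of negation words precede words[index], else 1."""
    count = sum(1 for w in words[:index] if w in NEGATION)
    return -1 if count % 2 else 1


print(negation_sign(tokens, tokens.index("高興")))  # 1: four negations cancel out
```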
Thanks for your comment! That example is indeed a problem. Different people have different habits when expressing their feelings. We don't have accurately labeled data to train classifiers, so I chose to use a sentiment dictionary to extract a general trend from the data. If you have a better method, I'd be glad to discuss it with you.
Hi Yue.
Given that this rule has such a great impact on the result, I think it is worthwhile to manually examine some of the samples to which it was applied.
It may not be a problem if the case rarely occurs.
After trying to analyze Chinese Weibo posts, I find that you have to be familiar with the language, or you cannot get accurate answers. The dictionary is a key issue, but we still have to build it by hand. There is no end to NLP. This project is a great challenge. Thank you for sharing.
Yes, you are right. Maybe machine learning would lead to more accurate answers, but it would require more data pre-processing steps. In our situation, that might be harder than building rules.
Interesting project topic. But handling Chinese words may become a big problem for you: there are no natural separators between Chinese words. Besides, Chinese corpora are also very rare. I hope you can solve those problems and get a good grade on your project.
I use 'jieba', a third-party Python library, to tokenize Chinese words. I tried it on some sentences and it performed well. We also summarized jieba's segmentation rules and adapted our dictionary to fit them.
Hi Yue,
This blog describes the steps of sentiment analysis in great detail, with a specific application to mobile phone comments. It helped me realize some intractable problems that may occur when conducting sentiment analysis. I hope you can solve these problems and get the desired results.
Hi, I am also trying to analyze Weibo data. But I want to know how to get an access token and the authorization needed to access the Weibo API. Could you share more details? Thank you very much.
Hi chenxi, um, I really appreciate that you list your opinions and thoughts as 1, 2, 3..., which helps us make up our minds. Sentiment analysis is important; we can see that every project group uses it to organize its data and draw conclusions.
Sorry... Jiang Yue, not chenxi... I just typed too quickly...
Haha! orz I just can't stop myself from leaving a comment... Good job!
You wrote about our project in your last blog post, so clever, honey~ ^-^ Today's presentation was perfect! I can't imagine how I could do the project without you guys. You are all so brilliant~!