Thoughts about Social Media Analysis Class: Sentiment Analysis on Weibo Data

Monday, November 10, 2014

Sentiment Analysis on Weibo Data

Sentiment analysis means examining a sentence to decide whether its sentiment is positive or negative. Part of our team's project task is to analyze the sentiment of comments about four domestic mobile phones: Xiaomi M4, Smartisan T1, Huawei Honor6, and Meizu MX4.

Here are the key steps of the algorithm:
1.    Read the text and tokenize it.
2.    In every sentence, find each sentiment word and record its feature (positive or negative), according to the sentiment dictionary, together with its position.
3.    Search for an adverb of degree before the sentiment word, and stop searching as soon as one is found. Adverbs of different degrees have different weights, and the weight is multiplied by the sentiment value (assume the base sentiment value of every sentiment word is 1).
4.    Find all the negation words before the sentiment word. If the number of negation words is odd, multiply the sentiment value by -1. If the number is even, multiply it by 1.
5.    If the sentence contains '!', every '!' adds 2 to the sentiment value of the corresponding feature.
6.    Print the positive and negative values of every sentence and the corresponding percentages.
7.    Add up all the sentiment values and print the positive and negative values of the whole text and the corresponding percentages.
8.    Calculate the average and variance of the positive and negative sentiment for the text.
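
The steps above can be sketched roughly as follows. This is only a minimal illustration and not our actual project code: the tiny word lists, the adverb weights, and the function names (score_sentence, percentages, summarize) are made up for the example. It also assumes that the text is already tokenized, that modifiers are searched only between the previous sentiment word and the current one, that an odd number of negations moves the value to the opposite feature, and that each '!' counts toward the feature of the sentiment word before it.

```python
from statistics import mean, pvariance

# Toy lexicons for illustration only; the real dictionaries and weights
# used in the project are not given in this post.
POSITIVE = {"好", "喜欢", "高兴"}
NEGATIVE = {"差", "讨厌", "失望"}
DEGREE = {"非常": 2.0, "很": 1.5, "有点": 0.5}
NEGATION = {"不", "没", "没有"}
EXCLAMATION = {"!", "！"}


def score_sentence(tokens):
    """Return (positive, negative) scores for one tokenized sentence."""
    scores = {"pos": 0.0, "neg": 0.0}
    last = None   # feature of the most recent sentiment word
    start = 0     # modifiers are searched only after the previous sentiment word
    for i, tok in enumerate(tokens):
        if tok in POSITIVE or tok in NEGATIVE:
            value = 1.0  # base value of every sentiment word
            window = tokens[start:i]
            # Step 3: nearest degree adverb before the word; stop at the first hit.
            for w in reversed(window):
                if w in DEGREE:
                    value *= DEGREE[w]
                    break
            # Step 4: an odd number of negation words flips the feature (assumption).
            flipped = sum(w in NEGATION for w in window) % 2 == 1
            last = "pos" if (tok in POSITIVE) != flipped else "neg"
            scores[last] += value
            start = i + 1
        elif tok in EXCLAMATION and last is not None:
            # Step 5: each '!' adds 2 to the feature of the preceding sentiment word.
            scores[last] += 2.0
    return scores["pos"], scores["neg"]


def percentages(pos, neg):
    """Share of positive and negative value, in percent."""
    total = pos + neg
    return (0.0, 0.0) if total == 0 else (100 * pos / total, 100 * neg / total)


def summarize(sentences):
    """Steps 6 to 8 for a non-empty list of tokenized sentences."""
    per_sentence = [score_sentence(s) for s in sentences]
    pos_values = [p for p, _ in per_sentence]
    neg_values = [n for _, n in per_sentence]
    totals = (sum(pos_values), sum(neg_values))
    return {
        "per_sentence": per_sentence,
        "total": totals,
        "total_percent": percentages(*totals),
        "mean": (mean(pos_values), mean(neg_values)),
        "variance": (pvariance(pos_values), pvariance(neg_values)),
    }


sentences = [
    ["我", "非常", "喜欢", "小米", "！"],
    ["电池", "不", "好"],
]
result = summarize(sentences)
print(result["total"])          # (4.0, 1.0)
print(result["total_percent"])  # (80.0, 20.0)
```

In the first sentence the adverb weight 2.0 is multiplied by the base value 1, and the '！' adds 2, which gives a positive value of 4.0. In the second sentence a single negation moves the value 1.0 to the negative side. The variance here is the population variance; using the sample variance instead would be an equally reasonable choice.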

While programming, I ran into some problems:
1.       Python is a little troublesome for processing Chinese characters. The text has to be handled as Unicode.
2.       Python sometimes fails to read all the data from a txt file.
3.       The words used on the Internet are quite different from those in the standard sentiment dictionary. We should add and edit words in the sentiment dictionary after studying the language habits of netizens. That will improve the accuracy of the analysis.
4.       The sentiment value from analyzing the whole text differs somewhat from the sum of the values from analyzing every sentence separately. I think there may be some spurious values at the boundary between two sentences. For example, a sentiment word at the beginning of a sentence will look for an adverb of degree at the end of the previous sentence.
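
For problems 1, 2, and 4, one possible approach is sketched below. It is an assumption-laden example rather than a confirmed fix: I have not yet identified why some files are read incompletely, so decoding the file explicitly as UTF-8 (with an optional byte order mark) is only a guess at the cause, and the helper names read_text and split_sentences are invented for this sketch. Splitting the text into sentences before scoring keeps a sentiment word from picking up an adverb or negation word from the previous sentence.

```python
import io
import re

# A sentence is a run of characters that are not sentence-ending punctuation,
# followed by any trailing end punctuation (ASCII or full-width).
_SENTENCE = re.compile(r"[^。！？!?\n]+[。！？!?]*")


def read_text(path):
    """Read a whole text file as Unicode, dropping a UTF-8 byte order mark if present."""
    with io.open(path, encoding="utf-8-sig") as f:
        return f.read()


def split_sentences(text):
    """Split text into sentences, keeping each sentence's end punctuation."""
    parts = (match.strip() for match in _SENTENCE.findall(text))
    return [part for part in parts if part]


print(split_sentences("手机很好！但是电池不行。\n还可以"))
# ['手机很好！', '但是电池不行。', '还可以']
```

Each sentence can then be tokenized and scored on its own, so the search for modifiers always stops at the sentence start.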

We are still working on improving the results. I hope we can achieve our goals.

14 comments:

Unknown, November 12, 2014, 22:00
Hello Jiang Yue! You shared the sentiment analysis part of your group project. I find that your group has a very structured and well-organized algorithm. I hope you can achieve your goals!
Unknown, November 17, 2014, 23:47
I am interested in the fourth step of your algorithm. It has such a large impact that it can determine whether a post is positive or not.
Is there any case where there is an even number of negation words but the sentiment is still negated?
For example: 我真的不不不不高興！ (roughly, "I am really not not not not happy!")
RuoLi, November 18, 2014, 19:47
After trying to analyze Chinese Weibo posts, I find that you have to be familiar with the language or you cannot get accurate answers. The dictionary is a key issue, but we still have to build it by hand. There is no end in NLP. This project is a great challenge. Thank you for sharing.
Unknown, November 19, 2014, 01:41
An interesting project topic. But how to deal with Chinese words may become a big problem for you. There is no natural separator between Chinese words. Besides, Chinese corpora are also very rare. I hope you can solve those problems and get a good mark on your project.
Unknown, November 20, 2014, 19:49
Hi Yue,
The steps of sentiment analysis are described in great detail in this blog post, with a specific application to mobile phone comments. It helped me recognize some difficult problems that may occur when conducting sentiment analysis. I hope you can solve these problems and get the results you want.
Unknown, November 20, 2014, 22:40
Hi, I am also trying to analyze data from Weibo. But I want to know how to get the access token and how to get authorization to access the Weibo API. Could you share more details? Thank you very much.
Unknown, November 21, 2014, 06:33
hi chenxi, um, I really appreciate that you list your opinions and thoughts as 1, 2, 3 ..., which helps us make up our minds. Sentiment analysis is important; we can see that every project group uses it to organize its data and reach its conclusions.
Unknown, November 21, 2014, 07:37
You wrote about our project in your last blog post, so clever, honey~ ^-^ Today's presentation was perfect! I cannot imagine how I could have done the project without you guys. You are all so brilliant~!