Is this Tweet Bad or Good News?
Finished master thesis
- Piush Aggarwal
- Targeted audience
- AI Master
- Ability to read and understand papers written in English.
- Ability to perform academic writing.
- Strong programming skills (e.g. Java; essential)
- Lectures Information Retrieval or Information Mining, and experience with tools such as RapidMiner (essential)
In the last decade, social media have become the platforms par excellence for all kinds of online information exchange: creating, consuming and sharing content; commenting on and engaging with content posted by others; organising events; reporting and tracking real-world events; rating and reviewing products; catching up with the latest developments in the news; etc. Among the best-known platforms today are Facebook, Twitter, Sina Weibo, Reddit and Instagram. Besides individuals, the presence of companies, agencies, institutions and politicians in social media has also increased. One of their objectives is to engage with a broader audience while also learning from it. For instance, companies are interested in finding out what customers think about their products in order to improve their services and run targeted advertising. Given the scale of social media use, social media data are also being leveraged to make predictions on a variety of issues such as political elections, referenda and stock markets.
While social media mining and analysis has become the ultimate approach to dig for knowledge and understand the opinions of others, it is only available to those who are technically equipped. For instance, researchers from the social sciences or other less technical fields have difficulty performing social media mining and analysis, and are therefore excluded from the knowledge available in social media.
We are currently developing a web application that addresses this gap. The application allows users to log in and create their own analysis pipelines. Each analysis process consists of building blocks that can easily be plugged into a pipeline. A user can create several pipelines. The heart of each pipeline consists of analysis solutions.
The aim of this master thesis is to develop an analysis solution that classifies posts as news or non-news and, for posts that are news, as good or bad news.
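The two-step classification described above can be sketched as a simple cascade: the second classifier only runs on posts that the first one accepts as news. The sketch below is illustrative only. The keyword lists and the function names `is_news`, `news_polarity` and `classify_post` are hypothetical placeholders, not part of the web application, and a real solution would replace the keyword rules with trained classifiers.

```python
from typing import Callable, Set

# Hypothetical cue words, for illustration only. A real solution would
# learn both decisions from labeled data instead of using fixed lists.
NEWS_CUES = {"breaking", "reports", "announced", "officials", "according"}
GOOD_CUES = {"rescued", "wins", "recovery", "cure", "growth"}
BAD_CUES = {"killed", "crash", "dead", "outbreak", "losses"}


def tokens(post: str) -> Set[str]:
    """Lowercase the post and strip common punctuation from each word."""
    return {word.strip(".,!?:;#@\"'").lower() for word in post.split()}


def is_news(post: str) -> bool:
    """Stage 1 (placeholder rule): news if any news cue word occurs."""
    return bool(tokens(post) & NEWS_CUES)


def news_polarity(post: str) -> str:
    """Stage 2 (placeholder rule): compare counts of good and bad cues.

    Ties are resolved as "bad"; this is an arbitrary choice for the sketch.
    """
    words = tokens(post)
    good = len(words & GOOD_CUES)
    bad = len(words & BAD_CUES)
    return "good" if good > bad else "bad"


def classify_post(
    post: str,
    stage1: Callable[[str], bool] = is_news,
    stage2: Callable[[str], str] = news_polarity,
) -> str:
    """Cascade: only posts accepted by stage 1 are passed on to stage 2."""
    if not stage1(post):
        return "non-news"
    return stage2(post) + " news"


if __name__ == "__main__":
    for post in [
        "Breaking: officials say all miners rescued",
        "Reports: 12 killed in highway crash",
        "I love my new shoes!",
    ]:
        print(post, "->", classify_post(post))
```

Passing the two stages as arguments keeps the cascade independent of how each decision is made, so either placeholder can be swapped for a trained model without changing `classify_post`.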
This type of classification is important for social science research and journalism, which spend most of their resources on finding informative news in social media. For them, access to highly newsworthy content is essential. Gathering such data manually does not scale, so automatic solutions are the better fit.
The application can also assist other social media users, such as politicians, economists, students and researchers, who want access to real news about the events they follow.
Furthermore, according to a study by Angela Legg and Kate Sweeny published in Personality and Social Psychology Bulletin in March 2014, the vast majority of people (78%) wanted to hear the bad news first, followed by the good news, because they believed they would feel better if they got the bad news out of the way and ended on a good note. Automatic classification of good and bad news can help news agencies and analysts from sociology to meet this preference.
The analysis solution developed in this master thesis should decide for each user post whether it is news or not and, if it is news, further categorize it as good or bad news. The candidate should first collect a sufficiently large set of labeled data. Once the data is available, different machine learning approaches should be investigated for the two tasks. Each approach should be evaluated, and metrics such as accuracy, precision and recall should be reported. Finally, the best performing solutions should be integrated into the web application mentioned above.
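As a minimal sketch of the evaluation step, accuracy, precision and recall can be computed directly from gold labels and predicted labels. The function name `evaluate`, the label strings and the example data are assumptions made for illustration. Returning 0.0 when a denominator is zero is one common convention, not the only one.

```python
from typing import Dict, Sequence


def evaluate(
    gold: Sequence[str], predicted: Sequence[str], positive: str
) -> Dict[str, float]:
    """Compute accuracy, precision and recall for one positive class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # True positives: the positive class was predicted and is correct.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: the positive class was predicted but is wrong.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: the positive class was missed.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)

    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    gold = ["news", "news", "non-news", "non-news", "news"]
    predicted = ["news", "non-news", "non-news", "news", "news"]
    print(evaluate(gold, predicted, positive="news"))
```

The same function can be applied to the second task by passing the good/bad labels of the news posts and choosing, for example, `positive="bad"`.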