Clustering of Tweets according to Topic
- Farid Muradov
- AI Master
- Ability to read and understand papers written in English.
- Ability to perform academic writing.
- Strong programming skills, e.g. in Java (essential)
- The lectures Information Retrieval or Information Mining, and the use of tools such as RapidMiner (essential)
In the last decade, social media have become the platforms par excellence for all kinds of online information exchange, such as content creation, consumption and sharing; commenting on and engaging with content posted by others; organisation of events; reporting and tracking of real-world events; rating and reviewing products; and catching up with the latest developments in the news. Among the best-known platforms today are Facebook, Twitter, Sina Weibo, Reddit and Instagram. Besides individuals, the presence of companies, agencies, institutions and politicians in social media has also increased. One of their objectives is to engage with a broader audience while also learning from it. For instance, companies are interested in finding out what customers think about their products in order to improve their services and run targeted advertising. Given the scale of social media use, social media data are also being leveraged to make predictions on a variety of issues such as political elections, referenda and stock markets.
While social media mining and analysis have become a key approach for extracting knowledge and understanding the opinions of others, they are accessible only to those with the necessary technical skills. For instance, researchers from the social sciences or other fields where technical aspects are not the main focus have difficulty performing social media mining and analysis, and are thus hindered from using the massive amount of knowledge available in social media.
We are currently developing a web application that addresses this gap. The application allows users to log in and create their own analysis pipelines. Each analysis process consists of building blocks that can easily be plugged into a pipeline. A user can create several pipelines. The heart of each pipeline consists of analysis solutions.
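The idea of building blocks plugged into a pipeline can be illustrated with a minimal Python sketch. It is not the application's actual design: the `Pipeline` class, its `add` and `run` methods, and the two example blocks are hypothetical names introduced here, under the assumption that a block is simply a function from a list of posts to a list of posts.

```python
from typing import Callable, List

# Assumption: a building block maps a list of posts to a (possibly changed) list.
Block = Callable[[List[str]], List[str]]


class Pipeline:
    """Hypothetical pipeline: an ordered chain of pluggable building blocks."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: List[Block] = []

    def add(self, block: Block) -> "Pipeline":
        """Plug a block in at the end; returns self so calls can be chained."""
        self.blocks.append(block)
        return self

    def run(self, posts: List[str]) -> List[str]:
        """Pass the posts through every block in the order the blocks were added."""
        for block in self.blocks:
            posts = block(posts)
        return posts


def lowercase(posts: List[str]) -> List[str]:
    """Example block: normalise the case of every post."""
    return [post.lower() for post in posts]


def drop_retweets(posts: List[str]) -> List[str]:
    """Example block: discard posts that start with the retweet marker 'rt '."""
    return [post for post in posts if not post.startswith("rt ")]


# One user could own several such pipelines; this one chains two blocks.
pipeline = Pipeline("election-analysis").add(lowercase).add(drop_retweets)
result = pipeline.run(["RT Vote today!", "Polls close at eight"])
print(result)
```

In this sketch an analysis solution such as clustering would be one more block at the end of the chain, which is why the order of the blocks matters: `drop_retweets` only works here because `lowercase` runs first.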
Among these solutions is cluster analysis. Performing cluster analysis on user posts is potentially useful, for example for finding trends in social media. In this thesis we consider cluster analysis as a means of investigating social media. This approach groups related posts together and gives the user a broad view of the topics and subtopics discussed within a theme the user is interested in. For instance, the user might be interested in understanding the opinions of social media users during a political election. Here, cluster analysis helps the user to group different viewpoints, provides a faster overview of what users think and discuss, and gives a sense of the hot or major topics. With this understanding, actions can be planned and performed accordingly.
To achieve this goal, the candidate should investigate several solutions for clustering social media posts, evaluate these solutions using intrinsic evaluation metrics, and prepare the best-performing solution for use within the web application. Furthermore, the candidate should investigate various ways of presenting clusters to users.
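As an illustration of what an intrinsic evaluation could look like, the following Python sketch computes the mean silhouette coefficient for a given clustering of posts. The topic description does not name a specific metric, so the silhouette coefficient is an assumption, as are the bag-of-words representation, the cosine distance and all function names; a real evaluation would more likely rely on an established library.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one post (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity of two term-count vectors; 1.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def silhouette_score(vectors, labels):
    """Mean silhouette coefficient over all posts; needs at least two clusters."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_distance(i, members):
        return sum(cosine_distance(vectors[i], vectors[j]) for j in members) / len(members)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean_distance(i, own)  # cohesion: distance to the post's own cluster
        b = min(mean_distance(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (invented for illustration): two posts per topic.
posts = [
    "election debate tonight candidates",
    "candidates clash in election debate",
    "new phone camera review",
    "phone camera review battery",
]
vectors = [vectorize(post) for post in posts]
good_score = silhouette_score(vectors, [0, 0, 1, 1])  # grouped by topic
bad_score = silhouette_score(vectors, [0, 1, 0, 1])   # topics mixed
print(good_score, bad_score)
```

The score lies between -1 and 1, and the topic-consistent grouping scores higher than the mixed one, so candidate clustering solutions can be compared without any labelled ground truth.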