Towards Fake News Detection based on Information Nutrition Labels.
Finished master thesis
- Alfred Sliwa
- Targeted audience
- AI Master
- Ability to read and understand papers written in English.
- Ability to perform academic writing.
- Strong programming skills (e.g. Java; app programming essential)
- Lectures Information Retrieval or Information Mining (essential)
In recent years, the availability of freely accessible online documents has increased rapidly. However, the large number of information sources makes it more difficult to distinguish between real content and fake news articles. Previous research on fake news detection exists [MH14, HKN16, JCZL16, Goo17, CRC15, SSW+17], but these systems perform only binary classification. Whether a news article is fake or not depends on the reader and on her interests. Fuhr et al. [FGG+18] proposed providing information nutrition labels for online documents in order to describe their content. The idea can be illustrated with the example of food packaging: a food product is labeled with nutrition facts so that a consumer can associate the product with a healthy or unhealthy lifestyle. Similarly, an online document can be associated with reliable or unreliable content based on its information nutrition labels.
In this thesis, the idea of information nutrition labels for online documents is adopted and new information nutrition labels are researched. The investigated nutrition labels will be integrated into a mobile application capable of visualizing the various information nutrition facets of a news article in advance. Such an application has the potential to increase the reader's awareness and to support her in decision making. A newly proposed nutrition label, Pollutant, which is related to the Virality label described in the original nutrition label set, will be investigated. Optionally, a further nutrition label, Contradicting, inspired by the Controversy label, will be examined.
- Literature scan. This should be done before the actual project starts. The student will be given some initial papers. Based on these, the student should collect more papers, review all of them, and prepare an oral presentation of 30 minutes that provides an introduction to the field. This should take 2-3 weeks.
Actual work:
- Preprocessing of data. This can be done automatically using Natural Language Processing techniques.
- Feature extraction and supervised learning. The student should perform automatic feature extraction and apply machine learning to extract the information nutrition labels. The performance of the components should be evaluated using standard evaluation metrics.
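
The work steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the method prescribed for the thesis. The function names (`preprocess`, `extract_features`, `precision_recall_f1`), the tiny stopword list, and the bag-of-words representation are all assumptions made for the example. The supervised learner itself is left out, so the predictions passed to the metric function are placeholders.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def preprocess(text):
    """Lowercase the text, tokenize on runs of ASCII letters, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def extract_features(tokens):
    """Return bag-of-words term counts as a simple feature representation."""
    return Counter(tokens)


def precision_recall_f1(gold, predicted, positive=1):
    """Compute precision, recall, and F1 for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical usage: one headline, then placeholder gold labels and predictions.
tokens = preprocess("The spread of FAKE news is rapid.")
features = extract_features(tokens)
scores = precision_recall_f1([1, 0, 1, 1], [1, 1, 0, 1])
```

In practice the feature counts would be fed to a supervised classifier or regressor for each nutrition label, and the metric function would be applied to its held-out predictions.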