Towards Fake News Detection based on Information Nutrition Labels.

Status


Completed master's thesis

Student


  • Alfred Sliwa

Formalities


Target group
  • AI Master
Prerequisites
  • Ability to read and understand papers written in English.
  • Ability to perform academic writing.
  • Strong programming skills (e.g. Java; app programming is essential)
  • Lectures Information Retrieval or Information Mining (essential)

Task description


In recent years, the availability of freely accessible online documents has increased rapidly. However, the large number of information sources makes it more difficult to distinguish between real content and fake news articles. Previous research has addressed fake news detection [MH14, HKN16, JCZL16, Goo17, CRC15, SSW+17], but these systems perform only binary classification. Whether a news article is fake or not depends on the reader and on her interests. Fuhr et al. [FGG+18] proposed providing information nutrition labels for online documents in order to describe their content. The idea can be illustrated with food packaging: a food product is labeled with nutrition facts so that a consumer can associate the product with a healthy or unhealthy lifestyle. Similarly, an online document can be associated with reliable or unreliable content based on its information nutrition labels.

In this thesis, the idea of information nutrition labels for online documents is adopted and new information nutrition labels are researched. The investigated nutrition labels will be integrated into a mobile application capable of visualizing the various information nutrition facets of a news article in advance. Such an application has the potential to increase the reader's awareness and to support her decision making. A newly proposed nutrition label, Pollutant, will be investigated; it is related to Virality, which is described in the original nutrition label set. Optionally, a further nutrition label, Contradicting, which is inspired by the Controversy label, will be examined.

  • [CRC15] Niall J Conroy, Victoria L Rubin, and Yimin Chen. Automatic deception detection: Methods for finding fake news. Proceedings of the Association for Information Science and Technology, 52(1):1-4, 2015.
  • [FGG+18] Norbert Fuhr, Anastasia Giachanou, Gregory Grefenstette, Iryna Gurevych, Andreas Hanselowski, Kalervo Järvelin, Rosie Jones, Yiqun Liu, Josiane Mothe, Wolfgang Nejdl, et al. An information nutritional label for online documents. In ACM SIGIR Forum, volume 51, pages 46-66. ACM, 2018.
  • [Goo17] Emma Goodman. How has media policy responded to fake news? Media Policy Blog, 2017.
  • [HKN16] Momchil Hardalov, Ivan Koychev, and Preslav Nakov. In search of credible news. In International Conference on Artificial Intelligence: Methodology, Systems, and Applications, pages 172-180. Springer, 2016.
  • [JCZL16] Zhiwei Jin, Juan Cao, Yongdong Zhang, and Jiebo Luo. News verification by exploiting conflicting social viewpoints in microblogs. In AAAI, pages 2972-2978, 2016.
  • [MH14] David M Markowitz and Jeffrey T Hancock. Linguistic traces of a scientific fraud: The case of Diederik Stapel. PLoS ONE, 9(8):e105937, 2014.
  • [SSW+17] Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter, 19(1):22-36, 2017.
Tasks:
  • Literature scan. This should be done before the actual project starts. The student will be given some initial papers. Based on these, the student should collect further papers, review all of them, and prepare a 30-minute oral presentation that introduces the field. This should take 2-3 weeks.
  • Actual work:
    • Preprocessing of data. This can be done automatically using Natural Language Processing techniques.
    • Feature extraction and supervised learning. The student should perform automatic feature extraction and apply machine learning to extract the information nutrition labels. The performance of the components should be evaluated using standard evaluation metrics.
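
The work steps listed above (preprocessing, feature extraction, supervised learning, evaluation) can be sketched roughly as follows. This is only a minimal illustration, not the method used in the thesis. It makes several assumptions: a nutrition label is treated as a two-valued class ("high" or "low"), features are plain bag-of-words counts, the classifier is a small hand-written multinomial Naive Bayes, and the stop-word list and training sentences are invented toy data. All function and variable names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list (assumption for illustration).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "this"}

# Invented toy data: "high" / "low" stand for the value of one nutrition label.
TRAIN_DOCS = [
    "shocking secret cure doctors hate",
    "you will not believe this shocking trick",
    "council approves budget for road repairs",
    "university publishes annual research report",
]
TRAIN_LABELS = ["high", "high", "low", "low"]


def preprocess(text):
    """Lowercase, tokenize on runs of letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def extract_features(text):
    """Bag-of-words term counts, a simple stand-in for richer features."""
    return Counter(preprocess(text))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(extract_features(doc))
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, doc):
        feats = extract_features(doc)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_terms = sum(self.term_counts[label].values())
            # Log prior plus smoothed log likelihood of each observed term.
            score = math.log(n_docs / total_docs)
            for term, count in feats.items():
                p = (self.term_counts[label][term] + 1) / (total_terms + len(self.vocab))
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall_f1(gold, predicted, positive):
    """Standard precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


model = NaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS)
```

In practice, predictions from `model.predict` on a held-out test set would be passed to `precision_recall_f1` together with the gold labels. A real system would likely replace the toy components with an NLP toolkit and a machine learning library, and might predict graded scores rather than two classes.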