Fact Verification using Large Textual Sources.

Status


Finished master's thesis

Student


  • Jan Heinrich Kowollik

Formalia


Targeted audience
  • AI Master
Preconditions
  • Ability to read and understand papers written in English.
  • Ability to perform academic writing.
  • Strong programming skills (e.g. Java; app programming essential)
  • Lectures: Information Retrieval or Information Mining, Deep Learning (essential)

Task description


Claim verification is a natural language processing (NLP) task in which a source is used to determine whether a claim is correct, i.e. pieces of evidence that support or reject the claim are extracted from the source text. Claim verification is a common task in everyday life, e.g. when reading a text or when you are not sure whether you remember something correctly. As such, the idea of having a computer perform the verification task has attracted interest in the research community. In previous research, the claim verification task has been solved with high accuracy using very small fact sources containing only a single sentence or a few sentences [1]. However, when large sources are used, the results drop drastically [2]. We therefore want to explore the possibility of creating a new NLP architecture for claim verification that also performs well when a claim is tested against a large textual source.

More precisely, the aim of this master's thesis is to develop and evaluate natural language processing architectures that can verify a claim and provide the corresponding pieces of evidence. The verification has to be made against a large textual source of facts. The system has to decide whether the source contains enough facts to verify the claim. If it does, the system has to provide the individual facts as evidence and decide whether the claim is supported or refuted by the pieces of evidence found in the text. The large textual fact source will be the same one as in [2], originally derived from Wikipedia. The FEVER dataset [2] will be used as training and evaluation data for the NLP models.
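The following is a rough sketch of such a two-stage pipeline, assuming a toy in-memory fact source: candidate evidence sentences are retrieved by TF-IDF similarity (scikit-learn), and the verdict step is a hypothetical classify_pair() stub standing in for a trained NLI model such as [1]. It is an illustration of the task setup, not the intended architecture.

    # Minimal sketch of a FEVER-style claim verification pipeline.
    # The source, the retriever and classify_pair() are illustrative assumptions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    SOURCE_SENTENCES = [
        "Berlin is the capital of Germany.",
        "The Rhine flows through several European countries.",
        "Germany borders nine countries.",
    ]

    def retrieve_evidence(claim, sentences, k=2):
        """Return the k source sentences most similar to the claim (TF-IDF cosine)."""
        vectorizer = TfidfVectorizer().fit(sentences + [claim])
        sent_vecs = vectorizer.transform(sentences)
        claim_vec = vectorizer.transform([claim])
        scores = cosine_similarity(claim_vec, sent_vecs)[0]
        ranked = sorted(zip(scores, sentences), reverse=True)
        return [sentence for _, sentence in ranked[:k]]

    def classify_pair(claim, evidence):
        """Placeholder for a trained model that labels the claim given the evidence."""
        return "NOT ENOUGH INFO"  # a real model would be trained on FEVER [2]

    def verify(claim):
        evidence = retrieve_evidence(claim, SOURCE_SENTENCES)
        label = classify_pair(claim, evidence)  # SUPPORTS / REFUTES / NOT ENOUGH INFO
        return label, evidence

    if __name__ == "__main__":
        print(verify("Berlin is the capital city of Germany."))

The thesis is concerned with how such a pipeline can keep both retrieval and verification accurate when the source grows to Wikipedia scale.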

  • [1] Parikh, Ankur P.; Täckström, Oscar; Das, Dipanjan; Uszkoreit, Jakob: A Decomposable Attention Model for Natural Language Inference. In: arXiv preprint arXiv:1606.01933 (2016)
  • [2] Thorne, James; Vlachos, Andreas; Christodoulopoulos, Christos; Mittal, Arpit: FEVER: a Large-scale Dataset for Fact Extraction and VERification. In: NAACL-HLT, 2018
  • Tasks:

    • Literature scan. This should be done before the actual project starts. The student will be given some initial papers; based on these, the student should collect further papers, review all of them, and prepare a 30-minute oral presentation providing an introduction to the field. This should take 2-3 weeks.
    • Actual work:
      • Integrate existing work from natural language processing research (baseline methods) and create the required preprocessing and post-processing tools for inclusion in the architecture.
      • Create own solutions, including new neural network architectures, and create the required preprocessing and post-processing tools for inclusion in the architecture.
      • Evaluate the models against the given test data (see the evaluation sketch after this list).
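As a rough illustration of the evaluation step, the sketch below computes two FEVER-style metrics over hypothetical prediction and gold records; the field names ("label", "evidence", "evidence_sets") are illustrative assumptions, not the official scorer's format. Label accuracy ignores evidence, while the FEVER score from [2] only credits a claim if the label is correct and, unless the label is NOT ENOUGH INFO, the predicted evidence covers at least one complete gold evidence set.

    # Sketch of label accuracy and FEVER score; data layout is assumed, not official.
    def label_accuracy(predictions, gold):
        correct = sum(p["label"] == g["label"] for p, g in zip(predictions, gold))
        return correct / len(gold)

    def fever_score(predictions, gold):
        hits = 0
        for p, g in zip(predictions, gold):
            if p["label"] != g["label"]:
                continue  # wrong label never scores
            if g["label"] == "NOT ENOUGH INFO":
                hits += 1  # no evidence required for NEI claims
                continue
            predicted = set(p["evidence"])
            # credit the claim if any gold evidence set is fully covered
            if any(set(ev_set) <= predicted for ev_set in g["evidence_sets"]):
                hits += 1
        return hits / len(gold)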