Generating Search Term Variants for Text Collections with Historic Spellings

  • Citation-Key:
    Ernst/Fuhr:06
  • Title:
    Generating Search Term Variants for Text Collections with Historic Spellings
  • Author(s):
    Andrea Ernst-Gerlach
    Norbert Fuhr
  • In:
    • Citation-Key:
      ECIR:06
    • Title:
      28th European Conference on Information Retrieval Research (ECIR 2006)
    • Editor(s):
      Mounia Lalmas
      Andy MacFarlane
      Stefan M. Rüger
      Anastasios Tombros
      Theodora Tsikrika
      Alexei Yavlinsky
    • Publisher:
      Springer
    • In:
      ECIR
    • Year:
      2006
  • Year:
    2006

Abstract:


In this paper, we describe a new approach for retrieval in texts with non-standard spelling, which is important for historic texts in English or German. For this purpose, we present a new algorithm for generating search term variants in ancient orthography. By applying a spell checker to a corpus of historic texts, we generate a list of candidate terms, to which the contemporary spellings have to be assigned manually. From these assignments, our algorithm produces a set of probabilistic rules. The rule probabilities can then be used for ranking in the retrieval stage. An experimental comparison shows that our approach outperforms competing methods.
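
The pipeline outlined in the abstract (word pairs in, probabilistic rules out, scored variants for retrieval) can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes that rules are plain substring replacements found with Python's difflib, and that a rule's probability is the fraction of occurrences of its left-hand side that were rewritten in the training pairs. The function names learn_rules and generate_variants and the word pairs are hypothetical toy examples.

```python
from collections import Counter
from difflib import SequenceMatcher


def learn_rules(pairs):
    """Estimate replacement rules from (modern, historic) word pairs.

    Assumption: only 'replace' differences are turned into rules;
    insertions and deletions are ignored to keep the sketch short.
    """
    rule_counts = Counter()
    for modern, historic in pairs:
        matcher = SequenceMatcher(None, modern, historic, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "replace":
                rule_counts[(modern[i1:i2], historic[j1:j2])] += 1

    rules = {}
    for (src, dst), applied in rule_counts.items():
        # Denominator: every occurrence of src in the modern spellings.
        seen = sum(modern.count(src) for modern, _ in pairs)
        rules[(src, dst)] = applied / seen
    return rules


def generate_variants(term, rules):
    """Apply each rule once at every matching position of a modern search term.

    Returns (variant, score) tuples, highest score first.
    """
    variants = {}
    for (src, dst), prob in rules.items():
        start = term.find(src)
        while start != -1:
            variant = term[:start] + dst + term[start + len(src):]
            variants[variant] = max(variants.get(variant, 0.0), prob)
            start = term.find(src, start + 1)
    return sorted(variants.items(), key=lambda item: (-item[1], item[0]))


# Toy training data (illustrative only): modern spelling, historic spelling.
training_pairs = [("bei", "bey"), ("sein", "seyn"), ("zwei", "zwey"), ("eis", "eis")]
rules = learn_rules(training_pairs)
print(rules)
print(generate_variants("frei", rules))
```

In a retrieval setting, the scores attached to the variants could serve as weights when the expanded query is matched against the historic collection, in the spirit of the ranking use mentioned in the abstract.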
