Language Models and Smoothing Methods for Collections with Large Variation in Document Length

  • Citation key:
    Abdulmutalib/Fuhr:08
  • Title:
    Language Models and Smoothing Methods for Collections with Large Variation in Document Length
  • Author(s):
    Najeeb Abdulmutalib
    Norbert Fuhr
  • In:
    • Citation key:
      Tjoa/Wagner:08
    • Title:
      19th International Workshop on Database and Expert Systems Applications (DEXA 2008), 1-5 September 2008, Turin, Italy
    • Editor(s):
      A M. Tjoa
      R. R. Wagner
    • Publisher:
      IEEE Computer Society
    • In:
      DEXA Workshops
    • Year:
      2008
  • Page(s):
    9-14

Abstract:


In this paper we present a new language model based on an odds formula, which explicitly incorporates document length as a parameter. Furthermore, a new smoothing method called exponential smoothing is introduced, which can be combined with most language models. We present experimental results for various language models and smoothing methods on a collection with large document length variation, and show that our new methods compare favorably with the best approaches known so far.
