Language Models and Smoothing Methods for Collections with Large Variation in Document Length
-
- Citation key:
- Abdulmutalib/Fuhr:08
-
- Title:
- Language Models and Smoothing Methods for Collections with Large Variation in Document Length
-
- Author(s):
- Najeeb Abdulmutalib
- Norbert Fuhr
-
- In:
-
- Citation key:
- Tjoa/Wagner:08
-
- Title:
- 19th International Workshop on Database and Expert Systems Applications (DEXA 2008), 1-5 September 2008, Turin, Italy
-
- Editor(s):
- A M. Tjoa
- R. R. Wagner
-
- Publisher:
- IEEE Computer Society
-
- In:
- DEXA Workshops
-
- Year:
- 2008
-
- Page(s):
- 9-14
Abstract:
In this paper, we present a new language model based on an odds formula that explicitly incorporates document length as a parameter. Furthermore, we introduce a new smoothing method called exponential smoothing, which can be combined with most language models. We present experimental results for various language models and smoothing methods on a collection with large variation in document length, and show that our new methods compare favorably with the best approaches known so far.