Models in Information Retrieval
Retrieval models form the theoretical basis for computing the answer to a query. They differ not only in the syntax and expressiveness of the query language, but also in the representation of the documents. Following Rijsbergen's approach of regarding IR as uncertain inference, we can distinguish models according to the expressiveness of the underlying logic and the way uncertainty is handled. Classical retrieval models are based on propositional logic. Boolean retrieval ignores uncertainty, whereas fuzzy retrieval uses fuzzy logic for this purpose, and probabilistic retrieval is based on probability theory. In the vector space model, documents and queries are represented as vectors in a vector space spanned by the index terms, and uncertainty is modelled by considering geometric similarity. Probabilistic models make assumptions about the distribution of terms in relevant and nonrelevant documents in order to estimate the probability of relevance of a document for a query. Language models compute the probability that the query is generated from a document. For IR applications dealing not only with texts, but also with multimedia or factual data, propositional logic is not sufficient. Therefore, advanced IR models use restricted forms of predicate logic as basis. Terminological/description logics are rooted in semantic networks and terminological languages like e.g. KL-ONE. Datalog uses function-free horn clauses. Probabilistic versions of both approaches are able to cope with the intrinsic uncertainty of IR.
Fulltext as PDF