History and research topics
The IR group was founded in 1991, when Norbert Fuhr was appointed Professor at the Computer Science Department of the University of Dortmund. In 2002, the group moved to the University of Duisburg-Essen.
The charter of the special interest group on IR of the German Informatics Society (GI) defines IR as a discipline that deals with uncertainty and vagueness in all kinds of information systems. Following this broad concept, our group is mainly interested in extending IR models and methods to problems beyond the classical text retrieval task. In particular, combining concepts from IR and database systems is an ongoing theme of our work, with applications such as relational databases, multimedia information systems, distributed digital libraries and XML documents.
As a theoretical foundation for these new types of applications, our group combined Norbert Fuhr's earlier work on probabilistic IR models with logic-based approaches. A major result of this work was the development of probabilistic Datalog during the ESPRIT project FERMI (1994-97), which focused on retrieval methods for multimedia documents. Based on this model, the retrieval engine HySpirit was implemented, which offers flexible and efficient retrieval mechanisms even for large data sets. (Thomas Rölleke, a former member of our group, used HySpirit to found a startup company of the same name in 1999.)
Besides multimedia systems, digital libraries (DLs) became an important new application area for IR methods (with new funding opportunities) in the mid-1990s. Our group has been active in this area since 1995, and today IR methods for DLs are the group's major focus. Past and present work in this area centers on four major themes:
- Networked IR
  - In the projects Medoc (1995-97), Interdoc (1998) and MIND (2001-02), the group developed new probabilistic models for resource selection and result fusion, addressed heterogeneity with respect to database schemas and retrieval methods, and extended these approaches to the retrieval of multimedia data. An alternative scheme was investigated in the CYCLADES project (2001-03), where metadata from Open Archives are gathered on a central server that offers searching and browsing of content-oriented subsets of the collected records. The current Pepper project (2003-06) investigates how these concepts can be applied in peer-to-peer networks.
- XML retrieval
  - Starting with the CARMEN project (1999-2001) and continuing in the CLASSIX project (2002-04), we have been developing IR methods for XML documents. A major result is the query language XIRQL and its implementation in the new retrieval engine HyREX. Current work in this area focuses on interactive retrieval and clustering of XML documents.
- User-oriented retrieval methods
  - Building on the ideas of Bates et al., the DAFFODIL project (2000-04) is developing a new front end for federated digital libraries that supports high-level search activities in an adaptive and proactive way.
- Evaluation of DLs
  - Within the DELOS Network of Excellence (2000-03 and 2004-07), Norbert Fuhr leads the evaluation work package, which aims to develop evaluation methods and testbeds for digital libraries. These activities include the INEX initiative for the evaluation of XML retrieval, which provides a testbed for full-text retrieval of XML documents.