
Resource Selection and Data Fusion for Multimedia International Digital Libraries
- Duration:
- From 01. 02. 2001 until 31. 12. 2003
- Contact Persons:
- Involved Persons:
- Sponsored by:
- EU FP5
- Reference number:
- IST-2000-26061, 0415053 (Dortmund), 15311571 (Duisburg)
- Participating Institutions:
This research addresses problems associated with the emergence of thousands of heterogeneous multimedia digital libraries distributed internationally on multiple platforms. Users have problems with resource selection because they are unaware of the contents of each individual library in terms of quantity, quality, information type, provenance and likely relevance. Once a set of relevant libraries has been selected, the user must organise and interpret the information in a common format and environment. Typically this is done through visual evaluation and ad hoc integration, which forces users to restrict their attention to a small subset of the information retrieved.
MIND will assist users to know where to search, how to query different media, and how to combine information from diverse sources.
The University of Dortmund is responsible for three subtasks:
Resource selection: The starting point was the decision-theoretic framework [Fuhr:99b], developed in Dortmund. Each database is assigned costs covering retrieval quality, communication time and monetary costs. Given a query that specifies the total number of documents to retrieve, the task is to compute, for every database, the number of documents to retrieve from that database; for efficiency, this number should be zero for most databases. The per-database numbers must sum to the user-specified total, and the overall costs should be minimised.
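The allocation problem described above can be illustrated with a minimal sketch. It assumes that each database comes with a table of expected costs for retrieving 0, 1, 2, ... documents; the function name `select_resources`, the cost tables and the database names are hypothetical and are not taken from the MIND implementation. The sketch uses plain dynamic programming, which is one possible way to find a cost-minimal allocation and is not necessarily the algorithm used in the project.

```python
from typing import Dict, List


def select_resources(costs: Dict[str, List[float]], n: int) -> Dict[str, int]:
    """Distribute n documents over databases so that the summed cost is minimal.

    costs[db][k] is the (assumed) expected cost of retrieving k documents
    from database db. This is an illustrative sketch, not the MIND code.
    """
    inf = float("inf")
    names = list(costs)
    # best[j]: minimal cost of retrieving j documents from the databases seen so far
    best = [0.0] + [inf] * n
    picks = []  # picks[i][j]: documents taken from database i in that optimum
    for name in names:
        table = costs[name]
        new_best = [inf] * (n + 1)
        pick = [0] * (n + 1)
        for j in range(n + 1):
            for k in range(min(j, len(table) - 1) + 1):
                total = best[j - k] + table[k]
                if total < new_best[j]:
                    new_best[j] = total
                    pick[j] = k
        best = new_best
        picks.append(pick)
    if best[n] == inf:
        raise ValueError("the databases cannot deliver the requested number of documents")
    # Walk backwards through the stored choices to recover the allocation.
    allocation: Dict[str, int] = {}
    remaining = n
    for name, pick in zip(reversed(names), reversed(picks)):
        allocation[name] = pick[remaining]
        remaining -= pick[remaining]
    return allocation


# Hypothetical cost tables for three databases and a request for 3 documents.
example_costs = {
    "A": [0.0, 1.0, 2.5, 6.0],
    "B": [0.0, 2.0, 4.0, 6.0],
    "C": [0.0, 5.0, 10.0, 15.0],
}
example_allocation = select_resources(example_costs, 3)
```

In this toy example the expensive database "C" receives no documents, which matches the goal that most databases should not be queried at all.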
This model was extended in MIND [Nottelmann/Fuhr:03a]. Major achievements are:
- 2 new methods for estimating retrieval quality (simulated retrieval on a sample; assuming a normal distribution for the indexing weights)
- modelling the relationship between probability of inference (RSV) and probability of relevance by a logistic (instead of a linear) function [Nottelmann/Fuhr:03e]
- a first evaluation, showing quality comparable to CORI, the state-of-the-art resource selection method
- extension towards different data and media types besides text [Nottelmann/Fuhr:03c]
- integration of CORI into the decision-theoretic framework
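The difference between a logistic and a linear mapping from retrieval status values to probabilities of relevance can be sketched as follows. The parameter names `b0`, `b1`, `c0`, `c1` and the sample values are assumptions chosen for illustration; the fitted parameters used in MIND are not given here.

```python
import math


def logistic_mapping(rsv: float, b0: float, b1: float) -> float:
    """Map an RSV to a probability of relevance with a logistic function."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * rsv)))


def linear_mapping(rsv: float, c0: float, c1: float) -> float:
    """Linear alternative; it has to be clipped to stay inside [0, 1]."""
    return min(1.0, max(0.0, c0 + c1 * rsv))


# Hypothetical parameters, for illustration only.
for rsv in (0.0, 0.5, 1.0, 2.0):
    print(rsv, logistic_mapping(rsv, -2.0, 3.0), linear_mapping(rsv, 0.1, 0.6))
```

A logistic function always yields values strictly between 0 and 1 and grows monotonically with the RSV (for positive `b1`), whereas a linear function needs explicit clipping.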
Heterogeneity: The existing databases differ in the content and structure (schema [Fuhr:99]) of their documents (e.g., some distinguish "editor" from "author"). Thus, the user query, which is specified against a global schema, must be translated for every database into a query that fits that database's schema.
This basic idea was extended and implemented within MIND [Nottelmann/Fuhr:03b]. Major achievements are:
- modelling MIND queries and documents in DAML+OIL
- defining uncertain schema mapping rules in Probabilistic Datalog
- transforming the rules into XSLT stylesheets
- implementation of this approach
- a first approach for learning the uncertain logical rules from examples [Nottelmann/Fuhr:01]
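The idea of translating a query with uncertain mapping rules can be sketched in a few lines. This is a simplified illustration only: it uses neither Probabilistic Datalog syntax nor XSLT, and the class `MappingRule`, the function `translate`, the attribute name "creator" and the probabilities are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A query condition: (attribute, value, weight)
Condition = Tuple[str, str, float]


@dataclass(frozen=True)
class MappingRule:
    source: str         # attribute in the global schema
    target: str         # attribute in the database schema
    probability: float  # assumed probability that the mapping holds


def translate(query: List[Condition], rules: List[MappingRule]) -> List[Condition]:
    """Rewrite each condition for one database, weighting it by the rule probability."""
    translated: List[Condition] = []
    for attribute, value, weight in query:
        for rule in rules:
            if rule.source == attribute:
                translated.append((rule.target, value, weight * rule.probability))
    return translated


# Hypothetical rules for a database that distinguishes "author" and "editor".
example_rules = [
    MappingRule("creator", "author", 0.8),
    MappingRule("creator", "editor", 0.2),
]
example_query = [("creator", "Fuhr", 1.0)]
translated_query = translate(example_query, example_rules)
```

Conditions on attributes without a matching rule are dropped in this sketch; a real system would have to decide how to handle them.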
Media type "facts": The MIND project covered four media types: text, images, facts (e.g. author names, numbers) and speech recognition transcripts. Dortmund was responsible for "facts".
In most areas, handling facts is the same as handling "ordinary" text. The significant differences lie in resource selection. Thus, we extended our decision-theoretic framework so that it can also estimate costs for several factual datatypes [Nottelmann/Fuhr:03c].
You can find the MIND publications of our group below. The official MIND web site also contains the publications of all project partners.
Publications
- J. Callan; F. Crestani; H. Nottelmann; P. Pala; X. M. Shou (2003).
- Resource Selection and Data Fusion in Multimedia Distributed Digital Libraries (poster). In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
- H. Nottelmann; N. Fuhr (2003).
- From uncertain inference to probability of relevance for advanced IR applications. In 25th European Conference on Information Retrieval Research (ECIR 2003)
- H. Nottelmann; N. Fuhr (2003).
- Evaluating different methods of estimating retrieval quality for resource selection. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
- H. Nottelmann; N. Fuhr (2003).
- Combining DAML+OIL, XSLT and probabilistic logics for uncertain schema mappings in MIND. In Research and Advanced Technology for Digital Libraries. Proc. European Conference on Digital Libraries (ECDL 2003)
- H. Nottelmann; N. Fuhr (2003).
- Decision-theoretic resource selection for different data types in MIND. In Recent research in multimedia distributed information retrieval. Proceedings of the ACM SIGIR 2003 Workshop on Distributed Information Retrieval, Toronto, Canada. (Lecture Notes in Computer Science, 2924).
- H. Nottelmann; N. Fuhr (2003).
- The MIND Architecture for Heterogeneous Multimedia Federated Digital Libraries. In Recent research in multimedia distributed information retrieval. Proceedings of the ACM SIGIR 2003 Workshop on Distributed Information Retrieval, Toronto, Canada. (Lecture Notes in Computer Science, 2924).
- H. Nottelmann; N. Fuhr (2003).
- From Retrieval Status Values to Probabilities of Relevance for Advanced IR Applications. Information Retrieval 6(4)
- H. Nottelmann; P. Pala (2003).
- MIND: A Graphical User Interface for Presenting Fused Results from Multi-Media Distributed Digital Libraries (poster). In Research and Advanced Technology for Digital Libraries. Proc. European Conference on Digital Libraries (ECDL 2003)
- N. Fuhr; C.-P. Klas (2001).
- Combining RDF and Agent-Based Architectures for Semantic Interoperability in Digital Libraries. In Proceedings of the DELOS-Workshop on Interoperability in Digital Libraries
- H. Nottelmann; N. Fuhr (2001).
- Learning probabilistic Datalog rules for information classification and transformation. In Proceedings of the 10th International Conference on Information and Knowledge Management
- H. Nottelmann; N. Fuhr (2001).
- MIND: An architecture for multimedia information retrieval in federated digital libraries. In Proceedings of the DELOS-Workshop on Interoperability in Digital Libraries
Talks
- Norbert Fuhr (2003).
- Multimedia Information Retrieval in Networked Digital Libraries. Talk at the Perspectives Seminar "Multimedia Retrieval", Dagstuhl
- Henrik Nottelmann (2003).
- Probabilistic logics for defining and using P2P service descriptions. QMIR Seminar, London
Diploma, Master and Bachelor theses
Available in German only.
- Semiautomatisches Pflegen von Wrappern (Semi-automatic maintenance of wrappers)
- Finished diploma thesis
- Lernen unsicherer Regeln in HySpirit (Learning uncertain rules in HySpirit)
- Finished diploma thesis
Related projects
- DAFFODIL
- Distributed Agents for User-Friendly Access of Digital Libraries
- Pepper
- Peer-to-Peer Architectures for Federated Search of Complex Digital Libraries


