Resource Selection and Data Fusion for Multimedia International Digital Libraries


Duration:
From 01. 02. 2001 until 31. 12. 2003
Contact Persons:
Involved Persons:
Sponsored by:
  • EU FP5
Reference number:
IST-2000-26061, 0415053 (Dortmund), 15311571 (Duisburg)
Participating Institutions:

This research addresses problems associated with the emergence of thousands of heterogeneous multimedia digital libraries distributed internationally on multiple platforms. Users have problems with resource selection, as they are unaware of the contents of each individual library in terms of quantity, quality, information type, provenance and likely relevance. Once a set of relevant libraries has been selected, the user must organise and interpret the information in a common format and environment. Typically this is done through visual evaluation and ad hoc integration, which forces users to restrict their attention to a small subset of the information retrieved.

MIND will help users decide where to search, how to query different media, and how to combine information from diverse sources.

The University of Dortmund is responsible for three subtasks:

  1. Resource selection: The basis was the decision-theoretic framework [Fuhr:99b], developed in Dortmund. Each database is assigned costs covering retrieval quality, communication time and monetary costs. Given a query, which includes the total number of documents to retrieve, the task is to compute for every database the number of documents to retrieve from it. For efficiency, this number should be zero for most databases. The per-database numbers must sum to the user-specified total, and the overall costs should be minimised.

    This model was extended in MIND [Nottelmann/Fuhr:03a]. Major achievements are:

    • two new methods for estimating retrieval quality (simulated retrieval on a sample; assuming a normal distribution of the indexing weights)
    • modelling the relationship between the probability of inference (RSV) and the probability of relevance by a logistic (instead of a linear) function [Nottelmann/Fuhr:03e]
    • a first evaluation, with quality comparable to CORI, the state-of-the-art resource selection method
    • extension towards different data and media types beside text [Nottelmann/Fuhr:03c]
    • integration of CORI into the decision-theoretic framework
  2. Heterogeneity: The existing databases differ in the content and structure (schema [Fuhr:99]) of their documents (e.g., some distinguish between "editor" and "author"). Thus, the user query (specified against a global schema) must be translated for every database into a query that fits that database's schema.

    This basic idea was extended and implemented within MIND [Nottelmann/Fuhr:03b]. Major achievements are:

    • modelling MIND queries and documents in DAML+OIL
    • defining uncertain schema mapping rules in Probabilistic Datalog
    • transforming the rules into XSLT stylesheets
    • an implementation of this approach
    • a first approach for learning the uncertain logical rules from examples [Nottelmann/Fuhr:01]
  3. Media type "facts": The MIND project covered four media types: text, images, facts (e.g. author names, numbers) and speech recognition transcripts. Dortmund was responsible for "facts".

    In most areas, handling facts is the same as handling "ordinary" text. The significant differences are in the resource selection part. Thus, we extended our decision-theoretic framework so that it can also estimate costs for several factual datatypes [Nottelmann/Fuhr:03c].
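
The resource selection task from subtask 1 (distribute a fixed number of documents over the databases so that the overall costs are minimal) can be illustrated with a small Python sketch. This is not the MIND implementation. It assumes that each database comes with a list of estimated relevance probabilities for its top-ranked documents in descending order, that the cost of one retrieved document is a simple weighted sum, and that a greedy choice is therefore sufficient. The names `expected_cost` and `select_resources`, the cost parameters and the example probabilities are all hypothetical.

```python
import heapq
from typing import Dict, List


def expected_cost(p_relevant: float,
                  cost_relevant: float = 0.0,
                  cost_nonrelevant: float = 1.0) -> float:
    """Expected cost of retrieving one document (hypothetical cost model)."""
    return p_relevant * cost_relevant + (1.0 - p_relevant) * cost_nonrelevant


def select_resources(relevance_estimates: Dict[str, List[float]],
                     n: int) -> Dict[str, int]:
    """Decide how many documents to retrieve from each database.

    relevance_estimates maps a database name to estimated probabilities of
    relevance for its top-ranked documents, in descending order (assumption).
    With descending probabilities the cost of each further document from the
    same database never decreases, so always taking the cheapest next
    document minimises the total cost.
    """
    marginal = {name: [expected_cost(p) for p in probs]
                for name, probs in relevance_estimates.items()}
    allocation = {name: 0 for name in marginal}

    # Heap of (cost of the next document, database name).
    heap = [(costs[0], name) for name, costs in marginal.items() if costs]
    heapq.heapify(heap)

    remaining = n
    while remaining > 0 and heap:
        _, name = heapq.heappop(heap)
        allocation[name] += 1
        remaining -= 1
        taken = allocation[name]
        if taken < len(marginal[name]):
            heapq.heappush(heap, (marginal[name][taken], name))

    if remaining > 0:
        raise ValueError("not enough documents available in the databases")
    return allocation


# Made-up example: three databases, four documents requested in total.
estimates = {
    "A": [0.9, 0.8, 0.3],
    "B": [0.6, 0.5, 0.4],
    "C": [0.2, 0.1],
}
result = select_resources(estimates, 4)
print(result)
```

In this made-up example, database C receives no documents at all, which matches the remark that the number should be zero for most databases. Communication time and monetary costs could be added as further terms in `expected_cost`.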

You can find the MIND publications of our group below. The official MIND web site also contains the publications of all project partners.


Publications



Talks


Diploma, Master and Bachelor theses

Available in German only.



Related projects


DAFFODIL
Distributed Agents for User-Friendly Access of Digital Libraries
Pepper
Peer-to-Peer Architectures for Federated Search of Complex Digital Libraries

Notes


Our deliverables