POLAR
Probabilistic Object-oriented Logics for Annotation-based Retrieval
Description
POLAR is a framework for annotation-based document and annotation retrieval, and discussion search.
Annotations are becoming increasingly important in today's information systems as a means of establishing communicative and collaborative functions. Annotation-based discussion can be part of a larger work process in digital libraries as well as in Web-based systems like Wikipedia and newswire sites like ZDNet News, where people discuss the published articles and their content. Furthermore, semantic annotation and tagging (as found, e.g., in last.fm) are becoming increasingly popular.
Annotations can broadly be categorised into meta-level annotations, which contain assertions about the annotated document, and content-level annotations, which extend the content of the document. It is a natural step to use annotations and annotation-based discussions as an information source for document, annotation and discussion search. While classical retrieval tools treat documents as atomic units without any context, frameworks like POOL can model and exploit document structure and nested documents (similar to current XML IR methods). Annotation hypertexts, however, are not necessarily trees (unlike structured documents in POOL), so the POLAR framework takes the special nature of annotations into account for different retrieval strategies. POLAR can thus not only cope with structured documents like POOL, but also with several kinds of annotations.
POLAR thus offers the following features for annotation-based retrieval:
- Indexing and modeling of annotation hypertexts, comprising
- Object-modeling based on probabilistic propositions (e.g. index terms, attributes and categorisations)
- Structured documents and annotations
- Content and meta annotations
- References
- Merged annotation targets
- Annotated passages (fragments)
- Annotation types
- Polarity of annotations
- Annotation-based document and discussion search
- Database queries
- Content-oriented queries using knowledge and relevance augmentation and probabilistic retrieval
- Later versions of POLAR are planned to support semantic annotation and retrieval
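To make the modelling features listed above more concrete, here is a minimal Python sketch of an annotation hypertext. It is not POLAR syntax and not the POLAR implementation: the class names (`Node`, `Annotation`, `Kind`), the function `augmented_terms`, the `access_prob` parameter and the noisy-OR combination are all assumptions chosen for illustration. The sketch shows probabilistic index terms, the content/meta distinction, polarity, and an annotation with more than one target (which is why the hypertext need not be a tree). The augmentation step is a deliberately simplified stand-in for the idea of using annotation content as additional evidence about the annotated document.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Kind(Enum):
    CONTENT = "content"  # extends the content of the annotated object
    META = "meta"        # makes an assertion about the annotated object


@dataclass
class Node:
    """A document or annotation with probabilistic index terms (hypothetical model)."""
    name: str
    terms: Dict[str, float] = field(default_factory=dict)  # term -> probability


@dataclass
class Annotation(Node):
    kind: Kind = Kind.CONTENT
    # Several targets are allowed, so the hypertext is a graph, not a tree.
    targets: List[Node] = field(default_factory=list)
    polarity: int = 0  # +1 positive, -1 negative, 0 neutral or unknown


def augmented_terms(doc: Node, annotations: List[Annotation],
                    access_prob: float = 0.5) -> Dict[str, float]:
    """Fold term weights of direct content annotations into doc.

    Assumption: annotation evidence is down-weighted by access_prob and
    combined with the document's own weight by noisy-OR.
    """
    result = dict(doc.terms)
    for ann in annotations:
        is_target = any(t is doc for t in ann.targets)
        if ann.kind is not Kind.CONTENT or not is_target:
            continue  # meta annotations and unrelated annotations are skipped
        for term, p in ann.terms.items():
            prior = result.get(term, 0.0)
            result[term] = 1.0 - (1.0 - prior) * (1.0 - access_prob * p)
    return result


d1 = Node("d1", {"retrieval": 0.8})
d2 = Node("d2", {"logic": 0.6})
# a1 annotates both d1 and d2 (a merged annotation target).
a1 = Annotation("a1", {"retrieval": 0.5, "annotation": 0.4},
                Kind.CONTENT, [d1, d2], 0)
# a2 is a negative meta-level annotation; it does not extend d1's content.
a2 = Annotation("a2", {"spam": 0.9}, Kind.META, [d1], -1)

augmented = augmented_terms(d1, [a1, a2])
print({term: round(p, 2) for term, p in augmented.items()})
```

In this sketch the meta annotation `a2` leaves the term weights of `d1` untouched; a real system would more plausibly use meta annotations and polarity as separate evidence rather than as index terms.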
Some experiments on annotation-based document search have been performed recently. The basic collection is a snapshot of ZDNet News containing roughly 4,700 articles and more than 91,000 user comments. We created a test set containing 20 topics and relevance judgements for 17 of them.
- Topics; the topics were created for the INEX Heterogeneous Track, where ZDNet News was a subcollection. See the report on the INEX 2006 Het Track for further details on topic creation.
- Relevance judgements; note that only 17 topics were judged (see statistics file in the archive)
- To obtain the ZDNet News collection, please contact Ingo Frommholz
Publications
- Ingo Frommholz (2008).
- A Probabilistic Framework for Information Modelling and Retrieval Based on User Annotations on Digital Objects. PhD thesis
- Ingo Frommholz; Marc Lechtenfeld (2008).
- Determining the Polarity of Postings for Discussion Search. In: Proceedings of the "Information Retrieval 2008" Workshop at LWA 2008, Würzburg, Germany
- Ingo Frommholz (2007).
- Annotation-based Document Retrieval with Probabilistic Logics. In Research and Advanced Technology for Digital Libraries. Proc. of the 11th European Conference on Digital Libraries (ECDL 2007)
- Ingo Frommholz; Norbert Fuhr (2006).
- Probabilistic, Object-oriented Logics for Annotation-based Retrieval in Digital Libraries. In Opening Information Horizons -- Proc. of the 6th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2006)
- Ingo Frommholz; Norbert Fuhr (2006).
- Evaluation of Relevance and Knowledge Augmentation in Discussion Search. In Research and Advanced Technology for Digital Libraries. Proc. of the 10th European Conference on Digital Libraries (ECDL 2006)
- Ingo Frommholz (2005).
- Applying the Annotation View on Messages for Discussion Search. In The Fourteenth Text REtrieval Conference (TREC 2005)
- Ingo Frommholz (2005).
- What did the Others Say? Probabilistic Indexing and Retrieval Models in Annotation-based Discussions. Bulletin of the IEEE Technical Committee on Digital Libraries
- Maristella Agosti; Nicola Ferro; Ingo Frommholz; Ulrich Thiel (2004).
- Annotations in Digital Libraries and Collaboratories -- Facets, Models and Usage. In Research and Advanced Technology for Digital Libraries. Proc. European Conference on Digital Libraries (ECDL 2004)
- Ingo Frommholz; Ulrich Thiel; Thomas Kamps (2004).
- Annotation-based Document Retrieval with Four-Valued Probabilistic Datalog. In: Thomas Rölleke; Arjen P. de Vries (eds.): Proceedings of the first SIGIR Workshop on the Integration of Information Retrieval and Databases (WIRD'04), Sheffield, UK.
Talks
- Ingo Frommholz; Marc Lechtenfeld (2008).
- Determining the Polarity of Postings for Discussion Search. Talk at the FGIR track at LWA 2008, Würzburg, Germany
Software
- A first POLAR prototype, supporting a large part of the above functionality, has been developed and used for experiments on annotation-based document retrieval and discussion search. The POLAR implementation is based on the JaySpirit Java API for HySpirit. Details and software packages follow.