Dagstuhl seminar (2011)
Challenges in Document Mining
Organizers: Hamish Cunningham, Oren Etzioni, Norbert Fuhr, Benno Stein
Semantic Cluster Analysis in Information Retrieval
- Duration:
- From 01.07.2009 until 31.03.2012
- Contact Persons:
- Involved Persons:
- Sponsored by:
- DFG
- Reference number:
- DFG: FU 205/22-1
- UDE: ka00043i
- Participating Institutions:
Clustering methods combine an object model, a similarity metric and a fusion principle; current research focuses on the last of these.
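To make the three components concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the project's method: the object model is assumed to be a bag of words, the similarity metric is assumed to be cosine similarity, and the fusion principle is assumed to be single-link agglomerative merging. All function names and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from math import sqrt


def object_model(text):
    """Object model (assumed): a document as a bag of lowercase terms."""
    return Counter(text.lower().split())


def similarity(a, b):
    """Similarity metric (assumed): cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fuse(vectors, threshold):
    """Fusion principle (assumed): single-link agglomerative merging.

    Repeatedly merges the two most similar clusters until no pair of
    clusters reaches a similarity of at least `threshold`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def link(c1, c2):
        # Single link: similarity of the closest pair across both clusters.
        return max(similarity(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best, x, y = max(
            (link(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def cluster(docs, threshold=0.3):
    """Combine the three components into one clustering run."""
    return fuse([object_model(d) for d in docs], threshold)
```

Because each component is a separate function, one of them can be swapped (for example, a different similarity metric) without touching the other two.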
For more advanced problems, clustering can only succeed when the three elements are combined in a meaningful way and knowledge about both the analysis task and the user is taken into account. This principle of 'semantic clustering' will make it possible to solve clustering problems in IR more efficiently and effectively than current methods do.
This project aims to investigate the theoretical, methodological and experimental aspects of this problem. Here, 'semantics' will play multiple roles:
- in the form of specialized retrieval models that consider knowledge about the IR task at hand,
- by integrating domain knowledge,
- as ensemble clustering, i.e. combining fusion methods,
- from the user, when performing interactive or multi-clustering.
Finally, semantics will form the basis for cluster labeling, which is currently the biggest challenge in document clustering.
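The ensemble clustering role listed above can be illustrated with a short Python sketch. It assumes one common ensemble technique, a co-association matrix over several base clusterings; the project description does not say which combination scheme is used, so the function names, the `threshold` parameter and the approach itself are assumptions made for illustration.

```python
def co_association(partitions, n):
    """Fraction of base clusterings in which objects i and j share a cluster."""
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus(partitions, n, threshold=0.5):
    """Group objects that co-occur in more than `threshold` of the clusterings.

    Objects are linked when their co-association exceeds the threshold;
    each connected group of linked objects receives one consensus label.
    """
    agree = co_association(partitions, n)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and agree[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels
```

Each entry of `partitions` is a list of cluster labels for the same `n` objects, for example the output of different fusion methods run on the same document set. Label values need not match across clusterings, since only co-membership is counted.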
Additional information:
- Project website of the working group MediaSystems of the Bauhaus-Universität Weimar
Events
Publications
- Norbert Fuhr; Marc Lechtenfeld; Benno Stein; Tim Gollub (2012).
- The Optimum Clustering Framework: Implementing the Cluster Hypothesis. Information Retrieval 15
- Marc Lechtenfeld; Norbert Fuhr (2012).
- Result Clustering Supports Users with Vague Information Needs. In: Proceedings of the 12th Dutch-Belgian Information Retrieval Workshop 2012, Ghent, Belgium
- Odysseas Papapetrou; Wolf Siberski; Norbert Fuhr (2012).
- Decentralized Probabilistic Text Clustering. IEEE Transactions on Knowledge and Data Engineering 24(10)
- Hamish Cunningham; Norbert Fuhr; Benno Stein (2011).
- Challenges in Document Mining (Dagstuhl Seminar 11171). Dagstuhl Reports 1(4)
- Marco Janc (2011).
- Effizienteres Dokumenten-Clustering durch Grafikprozessoren. Bachelorarbeit
- Marc Lechtenfeld (2010).
- Benutzerorientiertes Dokumenten-Clustering durch die Verwendung einer Anfragemenge. In: Proceedings of the "Information Retrieval 2010" Workshop at LWA 2010, Kassel, Germany
- Odysseas Papapetrou; Wolf Siberski; Norbert Fuhr (2010).
- Text Clustering for Peer-to-Peer Networks with Probabilistic Guarantees. In: 32nd European Conference on Information Retrieval Research (ECIR 2010)
Talks
- Marc Lechtenfeld; Norbert Fuhr (2012).
- Result Clustering Supports Users with Vague Information Needs. Talk at the 12th Dutch-Belgian Information Retrieval Workshop 2012, Ghent, Belgium
- Norbert Fuhr (2011).
- A Framework for Optimum Document Clustering: Implementing the Cluster Hypothesis. Invited talk at Yandex, Moscow, Russia
- Norbert Fuhr (2011).
- A Framework for Optimum Document Clustering: Implementing the Cluster Hypothesis. Talk at Dagstuhl seminar 'Challenges in Document Mining'
- Marc Lechtenfeld (2010).
- Benutzerorientiertes Dokumenten-Clustering durch die Verwendung einer Anfragemenge. Poster at the "Information Retrieval 2010" Workshop at LWA 2010, Kassel, Germany
Diploma, Master and Bachelor theses
Available in German only.
- Effizienteres Dokumenten-Clustering durch Grafikprozessoren
- Finished bachelor thesis
- Effizienteres Dokumenten-Clustering durch Cloud Computing
- Finished diploma thesis
- Extraktion aspektbezogener Information aus Buchrezensionen
- Finished diploma thesis
Related projects
- ezDL
- ezDL is a framework for interactive search systems
