Citation key:
Doucet/Ahonen-Myka:03
Title:
Naive clustering of a large XML document collection
Author(s):
A. Doucet
H. Ahonen-Myka
In:
Citation key:
INEX:03
Title:
INitiative for the Evaluation of XML Retrieval (INEX). Proceedings of the First INEX Workshop. Dagstuhl, Germany, December 8–11, 2002
Editor(s):
Norbert Fuhr
Norbert Gövert
Gabriella Kazai
Mounia Lalmas
Publisher:
ERCIM
Year:
2003

Page(s):
81–87
Year:
2002

Abstract:
In this paper, we address the problem of clustering a homogeneous collection of text-centric XML documents. We present experiments we conducted on clustering the INEX structured document collection. Our claim is that element tags provide additional information that should help improve the quality of clustering. We implemented and experimented with various ways to account for document structure, and used the well-known k-means algorithm to validate these principles.
