Citation-Key:
Doucet/Ahonen-Myka:03
Title:
Naive clustering of a large XML document collection
Author(s):
A. Doucet
H. Ahonen-Myka
In:
Citation-Key:
INEX:03
Title:
INitiative for the Evaluation of XML Retrieval (INEX). Proceedings of the First INEX Workshop. Dagstuhl, Germany, December 8--11, 2002
Editor(s):
Norbert Fuhr
Norbert Gövert
Gabriella Kazai
Mounia Lalmas
Publisher:
ERCIM
Year:
2003

Page(s):
81--87
Year:
2002

Abstract:
In this paper, we address the problem of clustering a homogeneous collection of text-centric XML documents. We present experiments we have conducted on clustering the INEX structured document collection. Our claim is that element tags provide additional information that should help improve the quality of clustering. We have implemented and experimented with various ways to account for document structure, and used the well-known k-means algorithm to validate these principles.
