Index Compression vs. Retrieval Time of Inverted Files for XML Documents

  • Citation-Key:
    Fuhr/Goevert:02
  • Title:
    Index Compression vs. Retrieval Time of Inverted Files for XML Documents
  • Author(s):
    Norbert Fuhr
    Norbert Gövert
  • In:
    • Citation-Key:
      CIKM:02
    • Title:
      Proceedings of the 11th International Conference on Information and Knowledge Management
    • Editor(s):
      Charles Nicholas
      David Grossman
      Konstantinos Kalpakis
      Sajda Qureshi
      Han van Dissel
      Len Seligman
    • Publisher:
      ACM
    • Year:
      2002
  • Year:
    2002
  • Note:
    Poster

Abstract:


Query languages for retrieval of XML documents allow for conditions referring both to the content and the structure of documents. In this paper, we investigate two different approaches for reducing index space of inverted files for XML documents. First, we consider methods for compressing index entries. Second, we develop the new XS tree data structure which contains the structural description of a document in a rather compact form, such that these descriptions can be kept in main memory. Experimental results on two large XML document collections show that very high compression rates for indexes can be achieved, but any compression increases retrieval time. On the other hand, highly compressed indexes may be feasible for applications where storage is limited, such as in PDAs or E-book devices.
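
Illustration (not taken from the paper): the abstract does not say which methods are used for compressing index entries. As a rough sketch of the general idea, the Python code below stores a sorted postings list as gaps between successive document ids and encodes each gap with a variable-byte code. The function names and the choice of variable-byte coding are assumptions made here for illustration only; the XS tree is not covered.

```python
from typing import Iterable, List


def encode_varbyte(number: int) -> bytes:
    """Encode one non-negative integer, 7 bits per byte, low-order group first.

    The high bit is set on every byte except the last byte of the number.
    """
    if number < 0:
        raise ValueError("variable-byte coding needs non-negative integers")
    out = bytearray()
    while number >= 0x80:
        out.append((number & 0x7F) | 0x80)  # continuation byte
        number >>= 7
    out.append(number)  # final byte, high bit clear
    return bytes(out)


def compress_postings(doc_ids: Iterable[int]) -> bytes:
    """Store a strictly increasing list of document ids as encoded gaps."""
    out = bytearray()
    previous = None
    for doc_id in doc_ids:
        # The first entry is stored as-is, later entries as differences.
        gap = doc_id if previous is None else doc_id - previous
        if previous is not None and gap <= 0:
            raise ValueError("document ids must be strictly increasing")
        out += encode_varbyte(gap)
        previous = doc_id
    return bytes(out)


def decompress_postings(data: bytes) -> List[int]:
    """Invert compress_postings: decode the gaps and sum them up again."""
    doc_ids: List[int] = []
    current = 0  # running sum of gaps decoded so far
    number = 0   # gap currently being assembled
    shift = 0
    for byte in data:
        number |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes of this gap follow
        else:
            current += number
            doc_ids.append(current)
            number = 0
            shift = 0
    return doc_ids


postings = [3, 7, 150, 100000]
packed = compress_postings(postings)
restored = decompress_postings(packed)
```

In this sketch the four ids occupy 7 bytes instead of, say, 16 bytes as fixed 32-bit integers, but every lookup has to decode the bytes and re-sum the gaps. That extra work at query time is the kind of cost behind the trade-off between index size and retrieval time described in the abstract.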
