Classification and Intelligent Search on Information in XML

    N. Fuhr
    G. Weikum
    IEEE Data Engineering Bulletin
CLASSIX, a joint project of the Universities of Dortmund and the Saarland in Germany, develops information retrieval methods for XML documents. To make a document's semantics explicit and more amenable to effective information searching, the project addresses the following issues: 1. providing an easy-to-use yet powerful and efficient search language that combines concepts from current XML pattern-matching languages (e.g., XPath and XQuery) with ontology-backed, information-retrieval-style ranking of search results; 2. extracting more semantics from existing document collections by constructing structural and ontological skeletons (e.g., in the form of DTDs or XML Schemas) that describe the data at a higher semantic level and can also facilitate new forms of indexing for efficiency; and 3. classifying existing documents according to a given thematic or personalized hierarchical ontology, to make searching more effective (e.g., by exploiting relevance feedback) and more efficient (e.g., by limiting the search focus).
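To illustrate the idea behind issue 1, combining structural pattern matching with ranked, ontology-aware results, here is a minimal Python sketch. It is not the CLASSIX search language. It assumes a toy ontology given as a hand-written dictionary of related terms with made-up similarity weights, and it uses the limited XPath subset supported by Python's `xml.etree.ElementTree` for the structural part. The names `ONTOLOGY`, `expand`, `ranked_search`, and `SAMPLE` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy ontology: each term maps to related terms with a
# similarity weight in [0, 1]. The weights are invented for illustration;
# a real system would derive them from a thesaurus or ontology.
ONTOLOGY = {
    "car": {"car": 1.0, "automobile": 0.9, "vehicle": 0.6},
}

# Hypothetical sample document used in the usage example below.
SAMPLE = """<catalog>
  <item><title>Used automobile for sale</title></item>
  <item><title>Red car with sunroof</title></item>
  <item><title>Bicycle repair kit</title></item>
  <item><title>Vehicle insurance guide</title></item>
</catalog>"""


def expand(term):
    """Return {related_term: weight}; unknown terms match only themselves."""
    return ONTOLOGY.get(term, {term: 1.0})


def ranked_search(xml_text, path, term):
    """Select elements via an ElementTree path expression (an XPath subset),
    then rank them by the best ontology similarity of any word in their text."""
    root = ET.fromstring(xml_text)
    related = expand(term)
    results = []
    for elem in root.findall(path):
        text = (elem.text or "").strip()
        words = text.lower().split()
        # Score = strongest ontological match among the element's words.
        score = max((related.get(w, 0.0) for w in words), default=0.0)
        if score > 0.0:
            results.append((score, text))
    # IR-style output: best matches first instead of an unordered set.
    results.sort(key=lambda r: r[0], reverse=True)
    return results


if __name__ == "__main__":
    for score, title in ranked_search(SAMPLE, ".//item/title", "car"):
        print(f"{score:.1f}  {title}")
```

Run against `SAMPLE`, the query for `car` returns the exact match first, then the synonym "automobile", then the broader term "vehicle", and drops the unrelated bicycle entry. A pure Boolean path match would return only one of these elements, and none of them in a meaningful order.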

