Hyper-media Retrieval Engine for XML

Description


XML is emerging as a standard for representing knowledge in a wide range of applications, and almost any kind of knowledge can be represented in it. Exploring such knowledge requires a search engine that lets users benefit from all of the concepts XML provides.

HyREX is the Hyper-media Retrieval Engine for XML [Abolhassani/etal:02]. The HyREX project is an ongoing effort (funded as part of other projects, e.g. CARMEN, CYCLADES, and CLASSIX) to develop an information retrieval engine for XML documents. HyREX's main characteristics can be derived from the constituents of its name:

hyper
HyREX offers explicit and implicit links to the user. Explicit links are specified within the documents, usually by means of XML linking standards such as XLink and XPointer. Implicit links are intrinsic to the information structures which HyREX derives from XML document collections.
media
HyREX offers search facilities for text and, at least conceptually, for media other than text.
retrieval engine
HyREX allows users to explore all kinds of information structures available through XML. Besides retrieval in XML documents, it supports browsing and searching the domains of attributes of XML documents as well as schema information, given for example by the DTD of a document collection.
XML
HyREX supports retrieval that takes into account both the content and the structure inherent in XML documents.
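
To make the notion of explicit links more concrete, the following minimal sketch collects XLink targets from a small XML document using Python's standard library. It is an illustration only and not part of HyREX: the sample document, its element names, and the helper explicit_links are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the W3C XLink recommendation.
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical sample document; element names are invented for illustration.
SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <title>XML retrieval</title>
  <para>See <ref xlink:type="simple" xlink:href="intro.xml#sec1">the introduction</ref>.</para>
  <para>No links here.</para>
  <cite xlink:type="simple" xlink:href="refs.xml">References</cite>
</article>"""


def explicit_links(xml_text):
    """Return (element tag, link target) pairs for every element with xlink:href."""
    root = ET.fromstring(xml_text)
    href = f"{{{XLINK_NS}}}href"  # ElementTree spells qualified names as {uri}local
    return [
        (element.tag, element.get(href))
        for element in root.iter()  # document order
        if element.get(href) is not None
    ]


links = explicit_links(SAMPLE)
for tag, target in links:
    print(f"{tag} -> {target}")
```

Implicit links, by contrast, are not written down in the documents and would have to be derived from the collection, which this sketch does not attempt.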

Architecture

HyREX's architecture is similar to that of database management systems. Thus, there is a clear separation between the logical and the physical level. The physical layer HyPath deals with efficient access paths for retrieval, while the logical layer deals with the XIRQL query language. On top of these layers is HyGate, the user interface to HyREX applications.

In the following, we give a brief outline of the characteristics of these layers.

HyGate
  • User interface for searching and browsing
  • Query formulation assistant
  • Presentation of retrieval results
XIRQL
  • XML Information Retrieval Query Language
  • Extends XPath with IR capabilities
  • Weighted document content and query conditions
  • Ranking for search results
  • Powerful searching for any type of information
  • Relevance-oriented search
HyPath
  • Efficient access paths for content and structure
  • Application-specific selection of access paths
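
The combination of structural conditions, weighted content conditions, and ranking listed for XIRQL can be sketched roughly as follows. This is a toy illustration in Python, not XIRQL syntax and not HyREX's implementation: the sample document, the functions term_weight and ranked_search, the relative-frequency weighting, and the weighted-sum scoring are all assumptions chosen for brevity. A real engine would also use indexed access paths rather than scanning every element.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample collection; element names are invented for illustration.
SAMPLE = """<book>
  <chapter><heading>XML syntax</heading><section>Elements and attributes in XML</section></chapter>
  <chapter><heading>Retrieval</heading><section>Ranking XML documents by relevance</section></chapter>
  <chapter><heading>History</heading><section>Markup before SGML</section></chapter>
</book>"""


def term_weight(element, term):
    """Toy indexing weight: relative frequency of `term` in the element's text."""
    words = "".join(element.itertext()).lower().split()
    if not words:
        return 0.0
    return Counter(words)[term.lower()] / len(words)


def ranked_search(root, path, conditions):
    """Rank elements selected by `path` against weighted content conditions.

    `conditions` is a list of (sub_path, term, query_weight) triples.
    The score is the weighted sum of the best term weight found under
    each sub_path (an assumed scoring rule for this sketch).
    """
    results = []
    for element in root.findall(path):
        score = 0.0
        for sub_path, term, query_weight in conditions:
            best = max(
                (term_weight(sub, term) for sub in element.findall(sub_path)),
                default=0.0,
            )
            score += query_weight * best
        if score > 0:
            results.append((score, element))
    # Sort on the score only, highest first.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


root = ET.fromstring(SAMPLE)
# A match in the heading counts more than a match in the section body.
hits = ranked_search(
    root,
    ".//chapter",
    [("heading", "xml", 0.6), ("section", "xml", 0.4)],
)
for score, chapter in hits:
    print(f"{score:.2f}  {chapter.findtext('heading')}")
```

In this sketch the chapter whose heading mentions the term ranks above the chapter that mentions it only in a section, and chapters with no match are dropped from the result.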

Publications


Talks


  • Mohammad Abolhassani; Norbert Fuhr; Saadia Malik (2003).
    HyREX at INEX 2003. Talk at the INEX Workshop, Dagstuhl

Related Projects


  • CARMEN AP 7
    Content Analysis, Retrieval and Metadata: Effective Networking
    Work package 7: A Document Referencing and Linking System
  • CLASSIX
    Classification and Intelligent Search on Information in XML
  • FOCUS
    Focussed retrieval of structured documents
  • INEX
    Initiative for the Evaluation of XML retrieval