Hyper-media Retrieval Engine for XML



XML is an emerging standard for representing knowledge in a wide range of applications; almost any kind of knowledge can be represented in it. Exploring such knowledge requires a search engine that lets users benefit from all the concepts XML provides.

HyREX is the Hyper-media Retrieval Engine for XML [Abolhassani/etal:02]. The HyREX project is an ongoing effort (funded as part of other projects, e.g. CARMEN, CYCLADES, and CLASSIX) to develop an information retrieval engine for XML documents. HyREX's main characteristics can be derived from the constituents of its name:

hyper
HyREX offers explicit and implicit links to the user. Explicit links are specified within the documents, usually by means of XML linking standards such as XLink and XPointer. Implicit links are intrinsic to the information structures that HyREX derives from XML document collections.
media
HyREX offers search facilities for text and, at least conceptually, for media other than text.
retrieval engine
HyREX allows users to explore all kinds of information structures available through XML. Besides retrieval in XML documents, it supports browsing and searching the domains of attributes of XML documents as well as schema information given, for example, by the DTD of a document collection.
XML
HyREX supports retrieval that takes into account both the content and the structure inherent in XML documents.
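
To make the notion of explicit links concrete, the following minimal sketch collects XLink references from an XML document using Python's standard library. It is an illustration only and does not reflect how HyREX itself processes links; the sample document, the element names, and the function `explicit_links` are assumptions invented for this example. Only the XLink namespace URI comes from the XLink standard.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the W3C XLink recommendation.
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <title>XML retrieval</title>
  <cite xlink:type="simple" xlink:href="paper2.xml#xpointer(//section[1])">related work</cite>
  <cite xlink:type="simple" xlink:href="paper3.xml">background</cite>
  <para>No link here.</para>
</article>"""


def explicit_links(xml_text):
    """Return (element tag, link target) pairs for all elements with xlink:href."""
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced attributes as "{namespace-uri}localname".
    href_attr = f"{{{XLINK_NS}}}href"
    return [
        (element.tag, element.get(href_attr))
        for element in root.iter()
        if element.get(href_attr) is not None
    ]


links = explicit_links(SAMPLE)
for tag, target in links:
    print(tag, "->", target)
```

The part of a target after `#` is an XPointer-style fragment; the sketch returns it unresolved, since following such pointers is outside its scope.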

Architecture

HyREX's architecture is similar to that of database management systems. Thus, there is a clear separation between the logical and the physical level. The physical layer HyPath deals with efficient access paths for retrieval, while the logical layer deals with the XIRQL query language. On top of these layers is HyGate, the user interface to HyREX applications.

In the following we give a brief outline of the characteristics of the levels.

HyGate
  • User interface for searching and browsing
  • Query formulation assistant
  • Presentation of retrieval results
XIRQL
  • XML Information Retrieval Query Language
  • Extends XPath with IR capabilities
  • Weighted document content and query conditions
  • Ranking for search results
  • Powerful searching for any type of information
  • Relevance-oriented search
HyPath
  • Efficient access paths for content and structure
  • Application-specific selection of access paths
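
The combination of a structural condition with weighted query terms and ranked output can be sketched in a few lines of Python. This is not XIRQL syntax and not HyREX code: the function `ranked_search`, the sample document, and the scoring formula (query-term weight times relative term frequency) are assumptions chosen to keep the example small. The structural part uses the limited XPath subset supported by `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample collection, invented for this illustration.
SAMPLE = """<book>
  <chapter><heading>XML retrieval</heading><para>Ranking XML elements by relevance.</para></chapter>
  <chapter><heading>Databases</heading><para>Access paths and query plans for XML storage.</para></chapter>
  <chapter><heading>Cooking</heading><para>Recipes without markup.</para></chapter>
</book>"""


def tokens(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ranked_search(xml_text, path, weighted_terms):
    """Rank the elements selected by `path` against weighted query terms.

    Score = sum over query terms of (term weight * relative term frequency).
    This is a simplification for illustration, not the XIRQL weighting model.
    """
    root = ET.fromstring(xml_text)
    results = []
    for element in root.findall(path):            # structural condition
        toks = tokens(" ".join(element.itertext()))
        if not toks:
            continue
        counts = Counter(toks)
        score = sum(w * counts[t] / len(toks) for t, w in weighted_terms.items())
        if score > 0:                             # drop non-matching elements
            label = element.findtext("heading", default=element.tag)
            results.append((score, label))
    # Highest score first: a ranked list rather than a Boolean result set.
    return sorted(results, key=lambda r: r[0], reverse=True)


hits = ranked_search(SAMPLE, ".//chapter", {"xml": 0.6, "retrieval": 0.4})
for score, label in hits:
    print(f"{score:.3f}  {label}")
```

The sketch scans every selected element at query time. An access-path layer in the sense of HyPath would instead answer the structural and content conditions from precomputed index structures, which is what the separation between logical and physical level is meant to allow.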

Publications


Mohammad Abolhassani; Norbert Fuhr; Saadia Malik (2004).
HyREX at INEX 2003. In INitiative for the Evaluation of XML Retrieval (INEX). Proceedings of the Second INEX Workshop. Dagstuhl, Germany, December 15-17, 2003.
Norbert Gövert; Norbert Fuhr; Mohammad Abolhassani; Kai Großjohann (2003).
Content-oriented XML retrieval with HyREX. In INitiative for the Evaluation of XML Retrieval (INEX). Proceedings of the First INEX Workshop. Dagstuhl, Germany, December 8-11, 2002.
Mohammad Abolhassani; Norbert Fuhr; Norbert Gövert; Kai Großjohann (2002).
HyREX: Hypermedia Retrieval Engine for XML. Research Report, University of Dortmund, Department of Computer Science, Dortmund, Germany.
K. Großjohann (2001).
Physical Algebra. Research Report, University of Dortmund.
Norbert Fuhr (2002).
HyREX: A Hyper-media Retrieval Engine for XML. Talk at the workshop "Carmen--Next Steps" in Osnabrück, January 16-18 (in German).
Norbert Gövert; Kai Großjohann (2002).
HyREX Manual
Norbert Fuhr (2002).
XIRQL: A Query Language for Information Retrieval in XML Documents. Talk at the University of Freiburg, July 2002 (partly in German).
Norbert Fuhr (2002).
XIRQL: Eine Anfragesprache für Information Retrieval in XML-Dokumenten. Talk at Humboldt University Berlin, 2002 (in German).

Talks


Mohammad Abolhassani; Norbert Fuhr; Saadia Malik (2003).
HyREX at INEX 2003. Talk at the INEX Workshop, Dagstuhl

Diploma, Master and Bachelor theses

Available only in German.



Related projects


CARMEN WP 7
Content Analysis, Retrieval and Metadata: Effective Networking
Work Package 7: A Document Referencing and Linking System
CLASSIX
Classification and Intelligent Search on Information in XML
FOCUS
Focussed retrieval of structured documents
INEX
Initiative for the Evaluation of XML Retrieval

Software



Testbeds


Test collections and prototype applications.