Categorizing Web Documents in Hierarchical Catalogues
- Ingo Frommholz
- 23rd European Conference on Information Retrieval Research (ECIR 2001)
Automatic categorization of web documents (e.g. HTML documents) is the task of automatically finding relevant categories for a new document that is to be inserted into a web catalogue such as Yahoo!. Many approaches exist for this difficult task. This paper considers a special kind of web catalogue: those whose category scheme is hierarchically ordered. It discusses a method that uses knowledge about the hierarchy to obtain better categorization results. The method can be applied in a post-processing step and can therefore be combined with other known (non-hierarchical) categorization approaches.
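The abstract does not spell out how the hierarchy is used, so the following is only an illustrative sketch of what a hierarchy-aware post-processing step could look like, not the method of the paper. It assumes that some flat (non-hierarchical) classifier has already produced one score per category, and that the catalogue is a tree given as a child-to-parent map. The function name `rescore_with_hierarchy`, the `weight` parameter, the blending rule, and the example categories and scores are all hypothetical.

```python
from typing import Dict, Optional


def rescore_with_hierarchy(
    flat_scores: Dict[str, float],
    parent: Dict[str, Optional[str]],
    weight: float = 0.5,
) -> Dict[str, float]:
    """Blend each category's flat score with the mean score of its ancestors.

    Assumption (not from the paper): a category is more plausible when the
    categories above it in the catalogue also score well.
    The parent map is assumed to describe a tree (no cycles).
    """
    rescored: Dict[str, float] = {}
    for category, score in flat_scores.items():
        # Collect the flat scores of all ancestors up to the root.
        ancestor_scores = []
        node = parent.get(category)
        while node is not None:
            ancestor_scores.append(flat_scores.get(node, 0.0))
            node = parent.get(node)
        if ancestor_scores:
            context = sum(ancestor_scores) / len(ancestor_scores)
            rescored[category] = (1 - weight) * score + weight * context
        else:
            # Top-level categories have no ancestors; keep the flat score.
            rescored[category] = score
    return rescored


# Hypothetical catalogue: two top-level categories with one child each.
parent = {"Science": None, "Physics": "Science", "Sports": None, "Soccer": "Sports"}
# Hypothetical output of a flat classifier for one new document.
flat_scores = {"Science": 0.8, "Physics": 0.4, "Sports": 0.1, "Soccer": 0.5}

rescored = rescore_with_hierarchy(flat_scores, parent)
best_leaf = max(["Physics", "Soccer"], key=lambda c: rescored[c])
```

In this made-up example the flat classifier prefers "Soccer" over "Physics", but after rescoring "Physics" ranks higher because its parent "Science" scores well while "Sports" does not. Because the step only consumes scores, it could in principle sit behind any flat categorizer.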