Topic Modelling of Product Reviews with respect to Vagueness


Reserved bachelor thesis


Target audience
  • ISE Bachelor
  • Good programming skills (e.g. Python)
  • Understanding of unsupervised ML concepts (desired)
  • Lecture Internet-Suchmaschinen (Internet Search Engines) and/or Information Retrieval (desired)

Task description

Online shops such as Amazon provide many reviews from users who have used a given product. These reviews address different issues of the product. For example, in the laptop domain, users may praise a product's performance but complain about its price. Reviews discussing the same issue can be grouped together in one bin; each such issue can be regarded as one topic, or aspect, of the product.
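To make the idea of binning concrete, the sketch below assumes that a topic model has already assigned each review a vector of topic proportions (the values here are hypothetical toy numbers, not results) and puts every review into the bin of its most probable topic. The function name `bin_reviews_by_topic` is illustrative only.

```python
from collections import defaultdict


def bin_reviews_by_topic(reviews, doc_topic):
    """Hard-assign each review to the bin of its most probable topic.

    doc_topic[d][k] is the (assumed) proportion of topic k in review d,
    e.g. as estimated by a soft clustering / topic model.
    """
    bins = defaultdict(list)
    for review, proportions in zip(reviews, doc_topic):
        # Index of the largest proportion = dominant topic of this review
        dominant = max(range(len(proportions)), key=lambda k: proportions[k])
        bins[dominant].append(review)
    return dict(bins)


reviews = [
    "Fast processor and smooth performance",
    "Way too expensive for what you get",
    "Great performance, but the price is high",
]
# Hypothetical proportions: topic 0 ~ performance, topic 1 ~ price
doc_topic = [[0.9, 0.1], [0.2, 0.8], [0.55, 0.45]]

bins = bin_reviews_by_topic(reviews, doc_topic)
for topic, members in sorted(bins.items()):
    print(topic, members)
```

The third review mentions both performance and price; its near-even proportions are lost once it is forced into a single bin, which is exactly the information a soft clustering keeps.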

Moreover, a topic may be expressed by different terms, i.e. single words or phrases. Some terms are more vague than others: they are imprecise and therefore allow multiple interpretations. For instance, when users write about brand reputation or product quality, they are more likely to use vague phrases.
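One simple way to operationalise vagueness at the review level, assuming a predefined lexicon of vague words and phrases, is to count which tokens of a review are covered by lexicon matches. The lexicon `VAGUE_TERMS` and the score definition below are hypothetical placeholders for illustration, not the pre-defined expressions used in the thesis.

```python
import re

# Hypothetical lexicon of vague expressions (single words and phrases)
VAGUE_TERMS = ["good quality", "well known", "decent", "pretty good", "kind of", "reliable"]


def tokenize(text):
    """Lowercase and split into alphabetic tokens ("well-known" -> "well", "known")."""
    return re.findall(r"[a-z]+", text.lower())


def vague_matches(tokens, lexicon=VAGUE_TERMS):
    """Return matched lexicon entries and the set of token positions they cover."""
    hits, covered = [], set()
    for term in lexicon:
        term_tokens = tokenize(term)
        n = len(term_tokens)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == term_tokens:
                hits.append(term)
                covered.update(range(i, i + n))  # a set avoids double counting overlaps
    return hits, covered


def vagueness_score(text, lexicon=VAGUE_TERMS):
    """Share of a review's tokens that belong to a vague expression (assumed measure)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    _, covered = vague_matches(tokens, lexicon)
    return len(covered) / len(tokens)


print(vagueness_score("The brand is well-known and the build is decent."))
print(vagueness_score("The battery lasts 10 hours."))
```

Multi-word entries are matched as whole token sequences, so a phrase such as "kind of" only counts when both words appear consecutively.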

The goal of this thesis is to apply topical clustering methods to an existing review dataset in order to identify the common issues discussed in our use-case domain, laptops. Furthermore, we want to investigate the relationship between predefined vague expressions and the identified topics. This helps us determine which topics are described more vaguely and which more crisply.
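As a minimal sketch of how vague expressions could be related to topics, assume each review d has topic proportions θ_dk and a vagueness score v_d. A topic's vagueness can then be taken as the proportion-weighted mean of the review scores. This weighting is one simple choice made here for illustration (the names `topic_vagueness` and the toy values are hypothetical); other measures, such as a correlation between θ_dk and v_d, are equally possible.

```python
def topic_vagueness(doc_topic, review_vagueness):
    """Weighted mean vagueness per topic:
    V_k = sum_d theta_dk * v_d / sum_d theta_dk
    """
    n_topics = len(doc_topic[0])
    scores = []
    for k in range(n_topics):
        weight = sum(row[k] for row in doc_topic)
        weighted = sum(row[k] * v for row, v in zip(doc_topic, review_vagueness))
        scores.append(weighted / weight if weight > 0 else 0.0)
    return scores


# Hypothetical toy input: 3 reviews, 2 topics
doc_topic = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
review_vagueness = [0.0, 0.3, 0.1]

scores = topic_vagueness(doc_topic, review_vagueness)
# Rank topics from most vague to most crisp
ranking = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
print(scores, ranking)
```

Weighting by θ_dk lets a review that mixes several topics contribute to each of them in proportion, instead of only to its dominant topic.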

This thesis includes the following phases:

  • Application of soft and topical clustering methods to a product review dataset.
  • Evaluation of the determined topics.
  • Association between vagueness and topics.
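The first two phases can be illustrated with Latent Dirichlet Allocation (LDA), a standard soft clustering method in which every review is a mixture of topics, and with UMass topic coherence, one common intrinsic evaluation measure. The sketch below is a small dependency-free collapsed Gibbs sampler on hypothetical, already tokenized toy reviews; a real study would more likely rely on an established library, and the hyperparameters (`alpha`, `beta`, `iterations`) are arbitrary assumptions.

```python
import math
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # topic assignment of each token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1; nkw[k][word_id[w]] += 1; nk[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                ndk[d][k] -= 1; nkw[k][v] -= 1; nk[k] -= 1  # remove current token
                # p(z = t | rest) ~ (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][v] += 1; nk[k] += 1
    # Soft clustering result: topic proportions per review, word distribution per topic
    doc_topic = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
                 for d, doc in enumerate(docs)]
    topic_word = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
                  for k in range(n_topics)]
    return vocab, doc_topic, topic_word


def top_words(topic_word, vocab, k, n=3):
    """The n most probable words of topic k."""
    order = sorted(range(len(vocab)), key=lambda v: topic_word[k][v], reverse=True)
    return [vocab[v] for v in order[:n]]


def umass_coherence(words, docs):
    """UMass coherence: sum over m > l of log((D(w_m, w_l) + 1) / D(w_l))."""
    doc_sets = [set(doc) for doc in docs]

    def df(*ws):  # number of documents containing all given words
        return sum(1 for s in doc_sets if all(w in s for w in ws))

    return sum(math.log((df(words[m], words[l]) + 1) / df(words[l]))
               for m in range(1, len(words)) for l in range(m))


# Hypothetical, pre-tokenized toy reviews (stop words already removed)
docs = [
    "fast processor smooth performance".split(),
    "performance fast processor gaming".split(),
    "price expensive money value".split(),
    "expensive price value cheap".split(),
    "performance fast price expensive".split(),
]
vocab, doc_topic, topic_word = lda_gibbs(docs, n_topics=2, seed=42)
for k in range(2):
    words = top_words(topic_word, vocab, k)
    print(k, words, round(umass_coherence(words, docs), 3))
```

The rows of `doc_topic` are the per-review topic proportions, i.e. the soft assignment that the binning and vagueness sketches above take as input.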