Two ways machine learning helps organize the online shopping experience!
All shoppers want an overview of the products available for purchase in order to make an informed decision. Unfortunately, the ever-increasing flood of offers available online makes this difficult. Comparison shopping platforms leverage powerful analytical tools to help solve this problem automatically. Read about two approaches we are developing at JSI within the scope of the EW-Shopp project to help organize huge sets of product information!
Categorization and duplicate detection – the main problems in structuring product data
When offers are collected from different online stores, their format and level of detail are virtually never the same. To provide an effective comparison platform, two main problems must be solved:
- The products must be categorized into the same category system and
- Any duplicate offers of the same product from different stores must be collected together.
There is little information to work with when solving these problems: typically, only the product name and a short description are provided. Traditionally, both were slow, labour-intensive tasks that people had to perform manually. Modern text mining approaches can help us solve them better!
95% of products can be correctly categorized automatically
By observing the words and short phrases that occur in product descriptions, we can determine which are common in some categories and which are not. For example, the terms “RAM memory” and “keyboard” are much more likely to appear in the Computers category than in Beauty & Health. By taking into account over 300,000 such typical words and phrases, we built a model that automatically classifies products into categories. The model correctly classifies 95% of products into top-level categories and 90% into their sub-categories.
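The post does not say which type of model was used, so the following is only a minimal sketch of the general idea under stated assumptions: it swaps in a multinomial Naive Bayes classifier (one common choice for word-count features) over lowercase word unigrams and bigrams, the bigrams standing in for "short phrases". The class name `NaiveBayesCategorizer`, the training offers and the category labels are all hypothetical toy data; the real model relies on over 300,000 features and far more examples.

```python
import math
import re
from collections import Counter, defaultdict


def features(text: str) -> list[str]:
    """Lowercase word unigrams plus adjacent-word bigrams ("short phrases")."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


class NaiveBayesCategorizer:
    """Illustrative multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.term_counts: dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def fit(self, offers: list[tuple[str, str]]) -> "NaiveBayesCategorizer":
        # Count how often each word/phrase occurs in each category.
        for text, category in offers:
            feats = features(text)
            self.term_counts[category].update(feats)
            self.doc_counts[category] += 1
            self.vocab.update(feats)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        # Features never seen during training carry no evidence, so skip them.
        feats = [f for f in features(text) if f in self.vocab]
        best_category, best_score = "", -math.inf
        for category, counts in self.term_counts.items():
            total_terms = sum(counts.values())
            # Log prior plus the smoothed log likelihood of each known feature.
            score = math.log(self.doc_counts[category] / total_docs)
            for f in feats:
                score += math.log((counts[f] + 1) / (total_terms + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical toy training data: (product description, category).
training = [
    ("Laptop with 16 GB RAM memory and backlit keyboard", "Computers"),
    ("Wireless keyboard and mouse set for desktop PC", "Computers"),
    ("DDR4 RAM memory module 8 GB", "Computers"),
    ("Moisturizing face cream with vitamin E", "Beauty & Health"),
    ("Electric toothbrush with two brush heads", "Beauty & Health"),
    ("Hair dryer with ionic conditioning", "Beauty & Health"),
]

model = NaiveBayesCategorizer().fit(training)
print(model.predict("8 GB RAM memory for laptop"))  # Computers
print(model.predict("Face cream with vitamin C"))  # Beauty & Health
```

Phrases such as "ram memory" count as features in their own right, which is how short multi-word expressions can add evidence beyond the individual words.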
We can automatically detect 45% of all product duplicates
Using the same approach of breaking product descriptions down into words and phrases, we can also compare products with one another directly. Descriptions of the same product from different stores are bound to use similar language, as they describe the very same characteristics. We can use similarity metrics, which measure how closely two texts match in content, to identify which offers are likely to describe the same product. Current results show that this approach can capture 45% of all duplicates in the data.
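Again as an illustration only, and assuming a metric the post does not name: the sketch below represents each offer as TF-IDF-weighted words and bigrams and flags pairs whose cosine similarity reaches a threshold. The function names, the `threshold` value of 0.3 and the example offers are hypothetical; a real system would tune the threshold on labelled duplicate pairs, and this toy data cannot reproduce the 45% figure reported above.

```python
import math
import re
from collections import Counter
from itertools import combinations


def features(text: str) -> list[str]:
    """Lowercase word unigrams plus adjacent-word bigrams."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


def tfidf_vectors(texts: list[str]) -> list[dict[str, float]]:
    """Weight each term by its count times a smoothed inverse document frequency."""
    docs = [Counter(features(t)) for t in texts]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # documents containing term
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    return [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_duplicates(offers: list[str], threshold: float = 0.3):
    """Return (i, j, score) for every offer pair at or above the threshold."""
    vectors = tfidf_vectors(offers)
    pairs = []
    for i, j in combinations(range(len(offers)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Hypothetical offers collected from different stores.
offers = [
    "Samsung Galaxy S10 smartphone 128 GB black",
    "Smartphone Samsung Galaxy S10, 128GB, black",
    "Philips Sonicare electric toothbrush with 2 brush heads",
]

for i, j, score in candidate_duplicates(offers):
    print(f"offers {i} and {j} look like duplicates (similarity {score:.2f})")
```

Comparing every pair is quadratic in the number of offers, so at catalogue scale the candidate pairs would typically be narrowed down first, for example to offers already placed in the same category.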
Analytics help both shoppers and retailers
The promising results show how analytics can help organize products automatically. This helps shoppers gain a clear overview of their choices and select the best available offer. Retailers such as Big Bang benefit by having their products shown in the proper section of the market rather than lost in the clutter. Finally, comparison shopping platforms such as Ceneje.si can reduce their operational costs by automating these tasks. Leveraging such analytic methods is key to remaining competitive in e-commerce. At JSI and EW-Shopp, we are developing innovative approaches that improve the online shopping experience even further.