A price anomaly detection model for a marketplace e-commerce company
- Mispriced items on the marketplace platform
- Loss of customer trust (PR issue)
- Excessive prices spreading on social media
- Products overpriced by sellers
- Extra zeros mistakenly added to selling prices
- Products without historical data (new) -> hierarchy data
- Products with historical data -> time series data
- Create a data pipeline that extracts the following data from the database and inserts it into HDFS (partitioned Parquet):
  - Product info: product_id, variant_id, product_title, brand, category_tree
  - Price: Selling_price, RRP (actual price without discount)
- Select data from HDFS through a ClickHouse view, filtering out prices older than 30 days and products that have a printed price
- Apply a regex to the product title to extract the number of items in each variant_id
- Calculate the logarithm of the RRP to normalize the price distribution and make the model less sensitive to high prices
- Cluster all products within each leaf category to differentiate between normal and luxury products
- Calculate 4 statistical features
- Calculate maximum approved price per leaf category (based on pricing agents’ feedback)
- Calculate an outlier possibility feature
- Share products with a high outlier possibility in a Power BI dashboard
- Rule-based anomaly detection for historical products
- Number of items in each DKPC
- Luxury products exclusion problem
- Memory error while running DBSCAN for finding anomalies
- Storage and data processing problems
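
The item-count, log-price, and per-leaf-category statistics steps for new products can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the production pipeline: the regex pattern, the `leaf_category` and `rrp` field names, the sample records, the median/MAD score, and the 3.5 threshold are all hypothetical, and the outline does not say which four statistical features or which scoring formula were actually used. It also assumes every RRP is positive.

```python
import math
import re
import statistics
from collections import defaultdict

# Hypothetical pattern for pack sizes such as "pack of 6" or "12 pcs".
PACK_RE = re.compile(r"pack of\s*(\d+)|(\d+)\s*(?:pcs|pieces)\b", re.IGNORECASE)


def item_count(title):
    """Return the number of items named in the title, defaulting to 1."""
    match = PACK_RE.search(title)
    if match is None:
        return 1
    count = int(match.group(1) or match.group(2))
    return max(count, 1)  # guard against a literal "0 pcs"


def log_unit_rrp(product):
    """Log10 of the RRP per single item (assumes rrp > 0)."""
    return math.log10(product["rrp"] / item_count(product["product_title"]))


def outlier_scores(products):
    """Score each variant by its distance from the leaf-category median,
    measured in units of the median absolute deviation (MAD)."""
    by_category = defaultdict(list)
    for p in products:
        by_category[p["leaf_category"]].append(log_unit_rrp(p))

    scores = {}
    for p in products:
        values = by_category[p["leaf_category"]]
        median = statistics.median(values)
        mad = statistics.median(abs(v - median) for v in values)
        deviation = abs(log_unit_rrp(p) - median)
        # A zero MAD means the category has no spread to compare against.
        scores[p["variant_id"]] = deviation / mad if mad > 0 else 0.0
    return scores


# Made-up sample records; v5 imitates a price with extra zeros.
products = [
    {"variant_id": "v1", "product_title": "Ceramic mug", "leaf_category": "mugs", "rrp": 10},
    {"variant_id": "v2", "product_title": "Glass mug", "leaf_category": "mugs", "rrp": 12},
    {"variant_id": "v3", "product_title": "Travel mug", "leaf_category": "mugs", "rrp": 11},
    {"variant_id": "v4", "product_title": "Plain mug", "leaf_category": "mugs", "rrp": 9},
    {"variant_id": "v5", "product_title": "Steel mug", "leaf_category": "mugs", "rrp": 1000},
    {"variant_id": "v6", "product_title": "Ceramic mug pack of 6", "leaf_category": "mugs", "rrp": 60},
]

THRESHOLD = 3.5  # hypothetical cut-off for sharing a product with pricing agents
scores = outlier_scores(products)
flagged = sorted(v for v, s in scores.items() if s > THRESHOLD)
print(flagged)
```

Dividing by the extracted item count before taking the logarithm keeps multi-packs such as `v6` comparable with single items, and computing the statistics separately for each leaf category keeps every group small, which is one possible way to limit memory use compared with processing all products at once.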