
Step 2: Preparing data for model training

Posted: Wed Jan 22, 2025 4:45 am
by Maksudasm
Our recommendation system will be based on a classification task: determining whether a particular product belongs in a given basket or not. Training the model requires two types of entries derived from the basket history. A positive entry is an arbitrary product removed from a basket; a negative entry is a product that was never included in it. This approach makes it possible to distinguish products that should be in the basket from those that should not. Next, each entry is enriched with context: a description of the outlet, the time of purchase, and external properties. The code for these steps in forming the training dataset is contained in the above-mentioned notebook.
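To make the construction concrete, here is a minimal sketch in Python. The column names and the sampling of one positive and one negative entry per basket are illustrative assumptions, not taken from the GlowByte notebook:

```python
import pandas as pd
import numpy as np

# Hypothetical input: one row per (basket, product) pair from purchase history.
baskets = pd.DataFrame({
    "basket_id": [1, 1, 2, 2, 3],
    "product_id": ["milk", "bread", "milk", "eggs", "bread"],
})

rng = np.random.default_rng(42)
all_products = baskets["product_id"].unique()

rows = []
for basket_id, group in baskets.groupby("basket_id"):
    in_basket = set(group["product_id"])
    # Positive entry: an arbitrary product removed from the basket.
    positive = rng.choice(sorted(in_basket))
    rows.append({"basket_id": basket_id, "product_id": positive, "target": 1})
    # Negative entry: a product that never appeared in this basket.
    negative = rng.choice([p for p in all_products if p not in in_basket])
    rows.append({"basket_id": basket_id, "product_id": negative, "target": 0})

train = pd.DataFrame(rows)
print(train)
```

The contextual columns (outlet, time of purchase, external properties) would then be joined onto this frame before training.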

DataSphere uses the standard Jupyter Notebook interface. Typically, separate notebooks are used for data preparation and for training; in the GlowByte demo, these steps are performed in a single notebook. When building a recommender system, you can work with either of these options. This instruction uses ready-made notebook files.

Next, we need to form vector representations for each product: a set of characteristics reflecting how the product appears in baskets. To do this, we use SVD decomposition, a method widely used across machine learning. We build a matrix in which each row is a basket and each column is a product included in it. SVD decomposes this matrix into three factors, from which we obtain a vector for each product. Then the cosine measure is calculated between the vector representation of a given product and the average of the vectors of the products in the basket.

[Screenshot: preparing data for model training]

This will allow you to determine the degree of similarity between a given product and products from a specific basket.
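A minimal sketch of this step, using scipy's truncated SVD on a small binary basket-product matrix; the matrix contents and the embedding size are illustrative assumptions:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Hypothetical basket-product matrix: rows are baskets, columns are products,
# a cell is 1 if the product appeared in the basket.
matrix = csr_matrix(np.array([
    [1, 1, 0],   # basket 1: milk, bread
    [1, 0, 1],   # basket 2: milk, eggs
    [0, 1, 0],   # basket 3: bread
], dtype=float))

# Truncated SVD: matrix is approximated as U @ diag(s) @ Vt;
# the columns of Vt are the product vectors.
k = 2  # embedding size, must be smaller than min(matrix.shape)
U, s, Vt = svds(matrix, k=k)
product_vectors = Vt.T  # shape: (n_products, k)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similarity of product 2 (eggs) to the average vector of basket 1's products.
basket_products = [0, 1]  # column indices of milk and bread
basket_vector = product_vectors[basket_products].mean(axis=0)
print(cosine(product_vectors[2], basket_vector))
```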

To generate personalized recommendations, we compute RFM aggregates for each user-product pair: recency (how long ago the product was last purchased), frequency (how often it is purchased), and monetary (how much the user has spent on it). These aggregates serve as predictors for the forecast model.
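A sketch of computing the RFM aggregates with pandas; the purchase log and its user_id, product_id, date, and amount columns are hypothetical:

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase of a product by a user.
purchases = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2],
    "product_id": ["milk", "milk", "bread", "milk", "eggs"],
    "date":       pd.to_datetime(["2025-01-02", "2025-01-10", "2025-01-05",
                                  "2025-01-08", "2025-01-09"]),
    "amount":     [2.0, 2.0, 1.5, 2.0, 3.0],
})

now = purchases["date"].max()
rfm = (purchases
       .groupby(["user_id", "product_id"])
       .agg(recency=("date", lambda d: (now - d.max()).days),  # days since last purchase
            frequency=("date", "count"),                       # number of purchases
            monetary=("amount", "sum"))                        # total spent on the product
       .reset_index())
print(rfm)
```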

At the output we get a dataset that will be used to solve the problem.

Step 3: Training the model
Once the data has been prepared, a training method needs to be selected. After training, the model will be able to rank products by their likely relevance to the basket, and recommendations are made based on the resulting ranks and business metrics. GlowByte experts have found that gradient boosting, widely used for classification and regression, is the most suitable choice in this situation. Neural networks of various kinds are also suitable if the data contains dependencies that deep learning can exploit.

Next, we launch training. Once it is complete, we can evaluate the significance of the selected predictors.
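A minimal sketch of this step with scikit-learn's GradientBoostingClassifier; the synthetic predictors mirror the features described above and are assumptions for illustration, not the notebook's actual dataset:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hypothetical final dataset: predictors plus the binary target from step 2.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "cosine_to_basket": rng.random(500),
    "recency": rng.integers(0, 60, 500),
    "frequency": rng.integers(1, 20, 500),
    "monetary": rng.random(500) * 100,
})
y = (X["cosine_to_basket"] + rng.normal(0, 0.2, 500) > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

# Significance of the selected predictors after training.
for name, importance in zip(X.columns, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```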

The next step is to evaluate the quality of the trained models. GlowByte experts find it more convenient to do this in two steps. The first is to evaluate the accuracy of predicting a product's relevance to the basket; the second is to test the result of using the algorithm in practice. In the situation under consideration, the first step is performed on historical data using standard classification quality metrics.
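Continuing the training sketch above, the first evaluation step might look as follows; the specific metrics (ROC AUC, precision, recall) are an assumption, since the post does not name them:

```python
from sklearn.metrics import roc_auc_score, precision_score, recall_score

# Step one of the evaluation: accuracy of predicting product relevance,
# measured on the historical hold-out set with standard classification metrics.
proba = model.predict_proba(X_test)[:, 1]
pred = model.predict(X_test)

print("ROC AUC:  ", roc_auc_score(y_test, proba))
print("Precision:", precision_score(y_test, pred))
print("Recall:   ", recall_score(y_test, pred))
```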