Imagine you own an online retail website. When customers shop on your site, you want to present them with personalized product recommendations that expose them to other products your business offers. Over the lifetime of your website, you have continuously gathered customer data, and you now want to use that data to generate personalized product recommendations.
Adobe Experience Platform Data Science Workspace provides the means to achieve your goal using the prebuilt Product Recommendations Recipe. Follow this tutorial to see how you can access and understand your retail data, create and optimize a machine learning Model, and generate insights in Data Science Workspace.
This tutorial reflects the workflow of Data Science Workspace and covers the following steps for creating a machine learning Model: preparing and understanding your data, creating a Model from the Product Recommendations Recipe, training and evaluating the Model, and operationalizing the Model to generate scored insights.
Before starting this tutorial, you must have the following prerequisites:
Access to Adobe Experience Platform. If you do not have access to an IMS Organization in Experience Platform, please speak to your system administrator before proceeding.
Enablement assets. Please reach out to your account representative to have the following items provisioned for you.
Download the three required Jupyter Notebook files from the Adobe public Git repository. These are used to demonstrate the JupyterLab workflow in Data Science Workspace.
A working understanding of the following key concepts used in this tutorial:
To create a machine learning Model that makes personalized product recommendations to your customers, previous customer purchases on your website must be analyzed. This section explores how this data is ingested into Platform through Adobe Analytics, and how that data is transformed into a Feature dataset to be used by your machine learning Model.
The other datasets have been pre-populated with batches for previewing purposes. You can view these datasets by repeating the above steps.
Dataset name | Schema | Description |
---|---|---|
Golden Data Set postValues | Golden Data Set schema | Analytics source data from your website |
Recommendations Input Dataset | Recommendations Input Schema | Analytics data transformed into a training dataset by a feature pipeline. This data is used to train the Product Recommendations machine learning Model. Each itemid and userid pair corresponds to a product purchased by that customer. |
Recommendations Output Dataset | Recommendations Output Schema | The dataset in which scoring results are stored; it contains the list of recommended products for each customer. |
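To make the table above concrete, the following minimal pandas sketch illustrates the kind of transformation the feature pipeline performs. The column names `endUserId`, `productSKU`, and `eventType` are illustrative assumptions, not the actual Golden Data Set schema; only the `userid` and `itemid` output columns come from the Recommendations Input Schema described above.

```python
import pandas as pd

# Hypothetical sample of Analytics source rows. The real Golden Data Set
# schema differs, but the core signal is the same: which user bought which product.
events = pd.DataFrame({
    "endUserId": ["u1001", "u1001", "u1002", "u1003"],
    "productSKU": ["sku-204", "sku-077", "sku-204", "sku-311"],
    "eventType": ["purchase", "purchase", "purchase", "view"],
})

# Keep purchase events only, then project to the two columns the
# Recommendations Input Schema calls for: one (userid, itemid) pair per purchase.
training_pairs = (
    events[events["eventType"] == "purchase"]
    .rename(columns={"endUserId": "userid", "productSKU": "itemid"})
    [["userid", "itemid"]]
    .drop_duplicates()
)

print(training_pairs)
```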
The second component of the Data Science Workspace lifecycle involves authoring Recipes and Models. The Product Recommendations Recipe is designed to generate product recommendations at scale by utilizing past purchase data and machine learning.
Recipes are the basis for a Model as they contain machine learning algorithms and logic designed to solve specific problems. More importantly, Recipes empower you to democratize machine learning across your organization, enabling other users to access a Model for disparate use cases without writing any code.
You have now reviewed the input and output schemas required by the Product Recommendations Recipe. Continue to the next section to learn how to create, train, and evaluate a Product Recommendations Model.
Now that your data is prepared and the Recipe is ready to be used, you can create, train, and evaluate your machine learning Model.
A Model is an instance of a Recipe, enabling you to train and score with data at scale.
You can choose to wait for the training run to finish, or continue to create a new training run in the following section.
On the Model Overview page, click Train near the top right to create a new training run. Select the same input dataset you used when creating the Model and click Next.
The Configuration page appears. Here you can configure the training run's num_recommendations value, a Hyperparameter. A trained and optimized Model utilizes the best-performing Hyperparameter values based on the results of its training runs.

Hyperparameters cannot be learned; therefore, they must be assigned before training runs occur. Adjusting Hyperparameters may change the accuracy of the Trained Model. Since optimizing a Model is an iterative process, multiple training runs may be required before a satisfactory evaluation is achieved.
Set num_recommendations to 10.
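If it helps to see this configuration expressed in code, the sketch below shows how a hyperparameter such as num_recommendations might be supplied to a training run. The configuration shape and the `train_model` call are hypothetical, not a real Platform API; in the UI you simply enter the value on the Configuration page.

```python
# Hypothetical configuration for a training run. Only the hyperparameter
# name (num_recommendations) comes from this tutorial; the surrounding
# structure and the train_model() call are illustrative.
config = {
    "num_recommendations": 10,  # maximum products recommended per customer
}

# A hyperparameter is fixed before training starts and is not learned from
# the data, so it is supplied alongside the input dataset.
# model = train_model(input_dataset="Recommendations Input Dataset", hyperparams=config)
```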
An additional data point will appear on the Model evaluation chart once the new training run completes; this may take several minutes.
Each time a training run completes, you can view the resulting evaluation metrics to determine how well the Model performed.
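The tutorial does not specify which evaluation metrics the Recipe reports, so as a generic illustration, the sketch below computes precision@k, a common metric for recommendation models: the fraction of the top-k recommended items that the customer actually purchased. Treat this as one representative example, not the Recipe's actual metric.

```python
def precision_at_k(recommended: list[str], purchased: set[str], k: int) -> float:
    """Fraction of the top-k recommended items the customer actually bought."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in purchased)
    return hits / len(top_k)

# One customer's recommendations versus their later purchases (made-up data).
recs = ["sku-204", "sku-077", "sku-311", "sku-050"]
bought = {"sku-204", "sku-050"}
print(precision_at_k(recs, bought, k=4))  # 0.5 — two of the four were purchased
```

A higher value across customers suggests a training run whose Hyperparameters are better tuned, which is why comparing metrics between runs guides the iterative optimization described above.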
The final step in the Data Science Workspace workflow is to operationalize your Model in order to score and consume insights from your data store.
Once the scoring run has successfully completed, you will be able to preview the results and view the insights generated.
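Per the Recommendations Output Schema description above, each scored row maps a customer to their list of recommended products. The sketch below shows roughly what a preview of that output might look like; the column names and values are made up for demonstration.

```python
import pandas as pd

# Illustrative shape of the scoring output: one row per customer, with the
# list of recommended products generated by the scoring run.
scored = pd.DataFrame({
    "userid": ["u1001", "u1002"],
    "recommendations": [
        ["sku-311", "sku-050", "sku-129"],
        ["sku-077", "sku-204", "sku-410"],
    ],
})

# Preview the first few rows, as you would when inspecting the dataset.
print(scored.head())
```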
Well done, you have successfully generated product recommendations!
This tutorial introduced you to the workflow of Data Science Workspace, demonstrating how raw, unprocessed data can be turned into useful information through machine learning. To learn more about Data Science Workspace, continue to the next guide on creating the retail sales schema and dataset.