Create and publish a machine learning model

Pretend you own an online retail website. When your customers shop at your retail website, you want to present them with personalized product recommendations that expose the variety of other products your business offers. Over the span of your website’s existence, you have continuously gathered customer data, and you now want to use this data to generate personalized product recommendations.

Adobe Experience Platform Data Science Workspace provides the means to achieve your goal using the prebuilt Product Recommendations Recipe. Follow this tutorial to see how you can access and understand your retail data, create and optimize a machine learning Model, and generate insights in Data Science Workspace.

This tutorial reflects the workflow of Data Science Workspace, and covers the following steps for creating a machine learning Model:

  1. Prepare your data
  2. Author your Model
  3. Train and evaluate your Model
  4. Operationalize your Model

Getting started

Before starting this tutorial, you must have the following prerequisites:

  • Access to Adobe Experience Platform. If you do not have access to an IMS Organization in Experience Platform, please speak to your system administrator before proceeding.

  • Enablement assets. Please reach out to your account representative to have the following items provisioned for you.

    • Recommendations Recipe
    • Recommendations Input Dataset
    • Recommendations Input Schema
    • Recommendations Output Dataset
    • Recommendations Output Schema
    • Golden Data Set postValues
    • Golden Data Set Schema
  • Download the three required Jupyter Notebook files from the Adobe public Git repository. These files are used to demonstrate the JupyterLab workflow in Data Science Workspace.

You should also have a working understanding of the following key concepts used in this tutorial:

  • Experience Data Model (XDM): The standardization effort led by Adobe to define standard schemas, such as Profile and ExperienceEvent, for Customer Experience Management.
  • Datasets: A storage and management construct for actual data. A physical instantiated instance of an XDM Schema.
  • Batches: Datasets are made up of batches. A batch is a set of data collected over a period of time and processed together as a single unit.
  • JupyterLab: JupyterLab is an open-source web-based interface for Project Jupyter and is tightly integrated into Experience Platform.

Prepare your data

To create a machine learning Model that makes personalized product recommendations to your customers, previous customer purchases on your website must be analyzed. This section explores how this data is ingested into Platform through Adobe Analytics, and how that data is transformed into a Feature dataset to be used by your machine learning Model.

Explore the data and understand the schemas

Log in to Adobe Experience Platform and select Datasets to list all existing datasets, then select the dataset you would like to explore: in this case, the Analytics dataset Golden Data Set postValues.

The dataset activity page opens, listing information relating to your dataset. You can select Preview Dataset near the top-right to examine sample records. You can also view the schema for the selected dataset: select the schema link in the right rail. A popover appears; selecting the link under the schema name opens the schema in a new tab.

The other datasets have been pre-populated with batches for previewing purposes. You can view these datasets by repeating the above steps.

| Dataset name | Schema | Description |
| --- | --- | --- |
| Golden Data Set postValues | Golden Data Set schema | Analytics source data from your website. |
| Recommendations Input Dataset | Recommendations Input Schema | The Analytics data transformed into a training dataset by a feature pipeline. This data is used to train the Product Recommendations machine learning Model. itemId and userId correspond to a product purchased by that customer. |
| Recommendations Output Dataset | Recommendations Output Schema | The dataset in which scoring results are stored; it contains the list of recommended products for each customer. |
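As a rough sketch of what such a feature pipeline does, the snippet below reduces raw Analytics hit records to the userId/itemId rows that the Recommendations Input Schema expects. All field names and values here are illustrative assumptions, not the actual Golden Data Set columns.

```python
# Minimal sketch of a feature-pipeline transformation: raw Analytics hits
# are filtered to purchase events and projected onto the fields the Model
# needs. Field names (visitorId, eventType, productList) are hypothetical.

def to_training_rows(analytics_hits):
    """Keep only purchase events and emit one row per purchased product."""
    rows = []
    for hit in analytics_hits:
        if hit.get("eventType") != "purchase":
            continue  # only purchases matter for product recommendations
        for item in hit.get("productList", []):
            rows.append({
                "userId": hit["visitorId"],
                "itemId": item,
                "interactionType": "purchase",
                "timestamp": hit["timestamp"],
            })
    return rows

hits = [
    {"visitorId": "u1", "eventType": "purchase",
     "productList": ["sku-42", "sku-7"], "timestamp": "2023-05-01T10:00:00Z"},
    {"visitorId": "u2", "eventType": "pageView",
     "productList": [], "timestamp": "2023-05-01T10:05:00Z"},
]

training = to_training_rows(hits)
print(len(training))          # 2: both rows come from u1's purchase
print(training[0]["itemId"])  # sku-42
```

The page view is dropped, and the single purchase hit fans out into one training row per product, matching the one-interaction-per-row shape of the input schema.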

Author your Model

The second component of the Data Science Workspace lifecycle involves authoring Recipes and Models. The Product Recommendations Recipe is designed to generate product recommendations at scale by utilizing past purchase data and machine learning.

Recipes are the basis for a Model as they contain machine learning algorithms and logic designed to solve specific problems. More importantly, Recipes empower you to democratize machine learning across your organization, enabling other users to access a Model for disparate use cases without writing any code.

Explore the Product Recommendations Recipe

In Experience Platform, navigate to Models from the left navigation column, then select Recipes in the top navigation to view a list of available recipes for your organization.

Next, locate and open the provided Recommendations Recipe by selecting its name. The Recipe overview page appears.

Then, in the right-hand rail, select Recommendations Input Schema to view the schema powering the recipe. The schema fields “itemId” and “userId” correspond to a product purchased (interactionType) by that customer at a specific time (timestamp). Follow the same steps to review the fields for the Recommendations Output Schema.

You have now reviewed the input and output schemas required by the Product Recommendations Recipe. Continue to the next section to learn how to create, train, and evaluate a Product Recommendations Model.

Train and evaluate your Model

Now that your data is prepared and the Recipe is ready, you can create, train, and evaluate your machine learning model.

Create a Model

A Model is an instance of a Recipe, enabling you to train and score with data at scale.

In Experience Platform, navigate to Models from the left navigation column, then select Recipes in the top navigation. This displays a list of available recipes for your organization. Select the Product Recommendations Recipe.

From the recipe page, select Create Model.


The create model workflow begins by selecting a recipe. Select the Recommendations Recipe, then select Next in the top-right corner.

Next, provide a model name. Available configurations for the model are listed, containing settings for the model’s default training and scoring behaviors. Review the configurations and select Finish.

You are redirected to your Model’s overview page, which shows a newly generated training run. A training run is generated by default when a Model is created.

You can choose to wait for the training run to finish, or continue to create a new training run in the following section.

Train the Model using custom Hyperparameters

On the Model Overview page, select Train near the top right to create a new training run. Select the same input dataset you used when creating the model and select Next.

The Configuration page appears. Here you can configure the training run’s num_recommendations value, also known as a hyperparameter. A trained and optimized Model utilizes the best-performing hyperparameters based on the results of its training runs.

Hyperparameters cannot be learned, therefore they must be assigned before training runs occur. Adjusting hyperparameters may change the accuracy of the trained model. Since optimizing a model is an iterative process, multiple training runs may be required before a satisfactory evaluation is achieved.
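The iterative tuning process described above can be sketched as a simple loop: train once per candidate hyperparameter value and keep the best-scoring run. Here, evaluate() is a hypothetical stand-in for an actual training run, and its metric curve is invented purely for illustration.

```python
# Illustrative sketch of hyperparameter tuning across training runs.
# evaluate() pretends that recall peaks at num_recommendations = 10 and
# falls off on either side; a real run would train and score the Model.

def evaluate(num_recommendations):
    # Hypothetical metric: best at 10, degrading linearly away from it.
    return round(1 - abs(num_recommendations - 10) / 20, 3)

candidates = [5, 10, 15, 20]
runs = {n: evaluate(n) for n in candidates}  # one "training run" per value

best = max(runs, key=runs.get)
print(best)        # 10
print(runs[best])  # 1.0
```

Each dictionary entry plays the role of one training run; the Model page’s evaluation chart lets you make the same comparison visually across runs.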


Set num_recommendations to 10.

Additional data points appear on the model evaluation chart. It can take several minutes for new data points to appear once a run is complete.

Evaluate the Model

Each time a training run completes, you can view the resulting evaluation metrics to determine how well the Model performed.

To review the evaluation metrics (Precision and Recall) for each completed training run, select the training run.

You can explore the information provided for each evaluation metric. The higher these metrics, the better the model performed.

You can see the dataset, schema, and configuration parameters used for each training run in the right rail. Navigate back to the Model page and identify the top-performing training run by comparing evaluation metrics across runs.

Operationalize your Model

The final step in the Data Science Workspace workflow is to operationalize your Model in order to score your data and consume the resulting insights.

Score and generate insights

On the product recommendations model overview page, select the name of the best-performing training run, with the highest recall and precision values.


Then, on the top-right of the training run details page, select Score.


Next, select the Recommendations Input Dataset as the scoring input dataset, which is the same dataset you used when you created the Model and executed its training runs. Then, select Next.

Once you have your input dataset, select the Recommendations Output Dataset as the scoring output dataset. Scoring results are stored in this dataset as a batch.

Finally, review the scoring configurations. These parameters contain the input and output datasets you selected earlier along with the appropriate schemas. Select Finish to begin the scoring run. The run may take several minutes to complete.

View scored insights

Once the scoring run has successfully completed, you can preview the results and view the insights generated.

On the scoring runs page, select the completed scoring run, then select Preview Scoring Results Dataset on the right rail.

In the preview table, each row contains product recommendations for a particular customer, under the recommendations and userId columns respectively. Since the num_recommendations hyperparameter was set to 10 in the earlier training run, each row of recommendations can contain up to 10 product identifiers delimited by a number sign (#).
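Consuming a row of the scored output is then a matter of splitting the recommendations string on the # delimiter. The row below is a minimal sketch with invented field values, not actual scoring results.

```python
# Illustrative row from the scoring output dataset: one customer, with
# recommended product identifiers packed into a single '#'-delimited string.
row = {"userId": "u1", "recommendations": "sku-7#sku-42#sku-13"}

# Split the packed string into individual product identifiers.
items = row["recommendations"].split("#")
print(len(items))  # 3
print(items[0])    # sku-7
```

Downstream consumers (for example, the service rendering the recommendations widget on your site) would look up each identifier in the product catalog before display.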

Next steps

This tutorial introduced you to the workflow of Data Science Workspace, demonstrating how raw unprocessed data can be turned into useful information through machine learning. To learn more about using the Data Science Workspace, continue to the next guide on creating the retail sales schema and dataset.
