Optimize a model using the Model Insights framework
The Model Insights Framework gives data scientists tools in Data Science Workspace to make quick, informed choices about optimal machine learning models based on experiments. The framework improves the speed and effectiveness of the machine learning workflow and makes it easier to use. It does this by providing a default template for each machine learning algorithm type to assist with model tuning. As a result, data scientists and citizen data scientists can make better model optimization decisions for their end customers.
What are metrics?
After implementing and training a model, a data scientist's next step is to find out how well the model performs. Various metrics are used to measure how effective a model is compared with others. Some examples of metrics include:
- Classification accuracy
- Area under curve
- Confusion matrix
- Classification report
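To make the first and third metrics in the list concrete, here is a minimal sketch in plain Python. The helper names `accuracy` and `confusion_matrix` and the sample labels are hypothetical and are not part of the Model Insights SDK; they only illustrate what the metrics measure.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def confusion_matrix(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


# Hypothetical labels for a binary classifier (illustration only).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

acc = accuracy(y_true, y_pred)
matrix = confusion_matrix(y_true, y_pred)
print(f"accuracy={acc:.3f}")
print(f"true positives={matrix[(1, 1)]}, false negatives={matrix[(1, 0)]}")
```

Accuracy summarizes performance in one number, while the confusion matrix shows which kinds of errors the model makes, which is why the two are often read together.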
Configuring recipe code
Currently, the Model Insights Framework supports the following runtimes:
- Scala
- Python/TensorFlow
- R
Sample code for recipes can be found in the experience-platform-dsw-reference repository under recipes. Specific files from this repository will be referenced throughout this tutorial.
Scala
There are two ways to bring metrics into a recipe: use the default evaluation metrics provided by the SDK, or write custom evaluation metrics.
Default evaluation metrics for Scala
Default evaluations are calculated as part of the classification algorithms. Here are some default values of evaluation.class for evaluators that are currently implemented:
- com.adobe.platform.ml.impl.DefaultBinaryClassificationEvaluator
- com.adobe.platform.ml.impl.DefaultMultiClassificationEvaluator
- com.adobe.platform.ml.impl.RecommendationsEvaluator
The evaluator can be defined in the application.properties file in the recipe folder. Sample code enabling the DefaultBinaryClassificationEvaluator is shown below:
evaluation.class=com.adobe.platform.ml.impl.DefaultBinaryClassificationEvaluator
evaluation.labelColumn=label
evaluation.predictionColumn=prediction
training.evaluate=true
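As an illustration of how these settings fit together, the sketch below parses the four sample lines and reads back the evaluator name and the evaluation flag. The `parse_properties` helper is hypothetical and deliberately simplified: it handles plain `key=value` lines only, whereas the full Java properties format also allows escapes and other separators. It is not how the SDK itself loads the file.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        props[key.strip()] = value.strip()
    return props


# The sample configuration from this tutorial.
SAMPLE = """\
evaluation.class=com.adobe.platform.ml.impl.DefaultBinaryClassificationEvaluator
evaluation.labelColumn=label
evaluation.predictionColumn=prediction
training.evaluate=true
"""

props = parse_properties(SAMPLE)
# The short class name is the last dotted component of evaluation.class.
evaluator_name = props["evaluation.class"].rsplit(".", 1)[-1]
evaluate_enabled = props["training.evaluate"].lower() == "true"
print(evaluator_name, evaluate_enabled)
```

A check like this can catch a misspelled key or a missing column name before a training run is started.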
After an evaluator class is enabled, a number of metrics will be calculated during training by default. Default metrics can be declared explicitly by adding the following line to your application.properties file.
evaluation.metrics.com=com.adobe.platform.ml.impl.Constants.DEFAULT
A specific metric can be enabled by changing the value for evaluation.metrics.com. In the following example, the F-Score metric is enabled.
evaluation.metrics=com.adobe.platform.ml.impl.Constants.FSCORE
The following table states the default metrics for each class. You can also use the values in the evaluation.metric column to enable a specific metric.
| evaluator.class | Default metrics | evaluation.metric |
| --- | --- | --- |
| DefaultBinaryClassificationEvaluator | Precision, Recall, Confusion Matrix, F-Score, Accuracy, Receiver Operating Characteristics, Area Under the Receiver Operating Characteristics | PRECISION, RECALL, CONFUSION_MATRIX, FSCORE, ACCURACY, ROC, AUROC |
| DefaultMultiClassificationEvaluator | Precision, Recall, Confusion Matrix, F-Score, Accuracy, Receiver Operating Characteristics, Area Under the Receiver Operating Characteristics | PRECISION, RECALL, CONFUSION_MATRIX, FSCORE, ACCURACY, ROC, AUROC |
| RecommendationsEvaluator | Mean Average Precision, Normalized Discounted Cumulative Gain, Mean Reciprocal Rank, Metric K | MEAN_AVERAGE_PRECISION, NDCG, MRR, METRIC_K |
Custom evaluation metrics for Scala
A custom evaluator can be provided by extending the MLEvaluator.scala interface in your Evaluator.scala file. The example Evaluator.scala file defines custom split() and evaluate() functions. The split() function splits the data randomly with a ratio of 8:2, and the evaluate() function defines and returns three metrics: MAPE, MAE, and RMSE.
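For reference, the three metrics are usually defined as follows. These are the standard textbook forms, assumed here because the tutorial does not spell them out. In them, $y_i$ is the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of test rows. MAPE is written as a fraction; multiply by 100 to report it as a percentage.

```latex
\begin{aligned}
\mathrm{MAPE} &= \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \\
\mathrm{MAE}  &= \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
\end{aligned}
```

MAPE is undefined when any actual value $y_i$ is zero, so a custom evaluator typically needs to filter or otherwise handle such rows.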
In the MLMetric class, do not use "measures" for valueType when creating a new MLMetric; otherwise the metric will not populate in the custom evaluation metrics table.
Do this:
metrics.add(new MLMetric("MAPE", mape, "double"))
Not this:
metrics.add(new MLMetric("MAPE", mape, "measures"))
Once the evaluator is defined, the next step is to enable it in the recipe. This is done in the application.properties file in the project’s resources folder. Here, evaluation.class is set to the Evaluator class defined in Evaluator.scala:
evaluation.class=com.adobe.platform.ml.Evaluator
In Data Science Workspace, you can see the insights in the “Evaluation Metrics” tab on the experiment page.
Python/TensorFlow
As of now, there are no default evaluation metrics for Python or TensorFlow. To get evaluation metrics for Python or TensorFlow, you need to create a custom evaluation metric by implementing the Evaluator class.
Custom evaluation metrics for Python
For custom evaluation metrics, two main methods need to be implemented for the evaluator: split() and evaluate().
For Python, these methods are defined in evaluator.py for the Evaluator class. See evaluator.py for an example of the Evaluator.
The evaluate() method returns the metric object, which contains an array of metrics, each with the properties name, value, and valueType.
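The shape of that return value can be sketched as follows. This is a minimal illustration, not the reference evaluator.py: the standalone function `evaluate_metrics` is hypothetical, the choice of MAPE, MAE, and RMSE is borrowed from the Scala and R examples in this tutorial, and using "double" as the valueType in Python is an assumption carried over from the Scala MLMetric example.

```python
import math


def evaluate_metrics(actual, predicted):
    """Return a list of metric entries, each with name, value, and valueType."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE as a fraction; raises ZeroDivisionError if any actual value is 0.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return [
        {"name": "MAPE", "value": mape, "valueType": "double"},
        {"name": "MAE", "value": mae, "valueType": "double"},
        {"name": "RMSE", "value": rmse, "valueType": "double"},
    ]


# Hypothetical actual and predicted values (illustration only).
metrics = evaluate_metrics([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
for metric in metrics:
    print(metric["name"], round(metric["value"], 4), metric["valueType"])
```

In a real recipe, the same calculation would live inside the Evaluator class's evaluate() method and use the model's predictions on the test dataset.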
The purpose of the split() method is to take in data and output a training and a testing dataset. In our example, the split() method reads data using the DataSetReader SDK and then cleans the data by removing unrelated columns. From there, additional features are created from existing raw features in the data.
The split() method should return a training and a testing dataframe, which are then used by the pipeline() methods to train and test the ML model.
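The splitting step itself can be sketched without the SDK. In the example below, plain lists of dictionaries stand in for dataframes, the helper `split_rows` is hypothetical, and the 80/20 ratio is an assumption borrowed from the Scala example. The DataSetReader loading and feature engineering steps are omitted.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle rows reproducibly and split them into train and test lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical rows standing in for a loaded and cleaned dataset.
rows = [{"id": i, "sales": 10 * i} for i in range(10)]
train, test = split_rows(rows)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible across runs, which makes metrics from different experiments easier to compare.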
Custom evaluation metrics for TensorFlow
For TensorFlow, as with Python, the evaluate() and split() methods in the Evaluator class need to be implemented. evaluate() should return the metrics, while split() returns the train and test datasets.
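from ml.runtime.python.Interfaces.AbstractEvaluator import AbstractEvaluator

class Evaluator(AbstractEvaluator):
    def __init__(self):
        print("initiate")

    def evaluate(self, data=[], model={}, config={}):
        # Placeholder: compute metrics here, each with name, value, and valueType
        metrics = []
        return metrics

    def split(self, config={}):
        # Placeholder: return the training and testing datasets
        return 'train', 'test'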
R
As of now, there are no default evaluation metrics for R. To get evaluation metrics for R, you need to define the applicationEvaluator class as part of the recipe.
Custom evaluation metrics for R
The main purpose of the applicationEvaluator is to return a JSON object containing key-value pairs of metrics.
The applicationEvaluator.R file can be used as an example. In this example, the applicationEvaluator is split into three familiar sections:
- Load data
- Data preparation/feature engineering
- Retrieve saved model and evaluate
Data is first loaded into a dataset from a source as defined in retail.config.json. From there, the data is cleaned and engineered to fit the machine learning model. Lastly, the model is used to make predictions on the dataset, and metrics are calculated from the predicted and actual values. In this case, MAPE, MAE, and RMSE are defined and returned in the metrics object.
Using pre-built metrics and visualization charts
The Sensei Model Insights Framework will support one default template for each type of machine learning algorithm. The lists below group evaluation metrics by common high-level machine learning algorithm class.
Regression:
- MAPE
- MASE
- MAE
Binary classification:
- Precision-recall
- Accuracy
- F-score (specifically F1, F2)
- AUC
- ROC
Multi-class classification:
- For each class:
  - Precision-recall accuracy
  - F-score (specifically F1, F2)
Clustering (with ground truth):
- RI (Rand index), ARI (adjusted Rand index)
- Homogeneity score, completeness score, and V-measure
- FMI (Fowlkes-Mallows index)
- Purity
- Jaccard index
Clustering (without ground truth):
- Silhouette coefficient
- CHI (Calinski-Harabasz index)
- DBI (Davies–Bouldin index)
- Dunn index
Recommendation:
- Normalized Discounted Cumulative Gain
- Mean Reciprocal Rank
- Metric K
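The two named ranking metrics at the end of the list can be sketched in a few lines. The functions below are hypothetical helpers, not the RecommendationsEvaluator implementation. They use the common linear-gain form of discounted cumulative gain; other formulations exist (for example, an exponential gain), and the tutorial does not say which one the framework uses.

```python
import math


def mean_reciprocal_rank(ranked_flags):
    """Average of 1/rank of the first relevant item per query (0 if none)."""
    if not ranked_flags:
        raise ValueError("at least one query is required")
    total = 0.0
    for flags in ranked_flags:
        for rank, relevant in enumerate(flags, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_flags)


def ndcg_at_k(relevances, k):
    """DCG of the first k ranked items divided by the DCG of the ideal order."""
    def dcg(scores):
        return sum(s / math.log2(i + 1) for i, s in enumerate(scores, start=1))

    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0


# Hypothetical relevance judgments, listed in the order the model ranked them.
mrr = mean_reciprocal_rank([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
perfect = ndcg_at_k([3, 2, 1], k=3)
swapped = ndcg_at_k([0, 1], k=2)
print(mrr, perfect, swapped)
```

Both metrics reward placing relevant items near the top of the list; NDCG additionally accounts for graded relevance rather than a simple relevant or not-relevant flag.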