Data Science Workspace architecture overview

Last update: 2023-09-25
Created for: Beginner

This video walks through an overview diagram and explains the primary components of Experience Platform as they relate to Data Science Workspace. For more information, please visit the Data Science Workspace documentation.


In this video, we’re going to walk through an overview diagram that illustrates the primary components of Data Science Workspace in Adobe Experience Platform.

Like other Experience Platform architectures, we start off with data collection. For client-side collection from web and mobile sites, Experience Platform Launch can be used alongside other connectors, software development kits, cloud and third-party applications, enterprise sources, and data integration (ETL) ecosystems. You then have a choice: you can stream your data through one of Platform's streaming data collection endpoints, or you can batch this data in using Platform's batch API by landing batch files in a file storage location such as FTP or Azure Blob. Once this data is ingested into Platform, a few things happen.

Any data that is streaming is placed on the Experience Platform Pipeline and then sent to the data lake. If this data is set to be profile enabled, it is immediately available in Real-Time Customer Profile. Any data we onboard from other sources is made up of batches and stored in the data lake as datasets and files. To ensure that the data shares a common model, batch data is stored using the Experience Data Model (XDM) System. This data model can be extended to account for any customer attributes that are unique to your business. This helps provide a consistent definition of the data, making it easier to stitch the data into Real-Time Customer Profile, and enables data scientists and developers to quickly understand and leverage the data.
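
To make that flow concrete, here is a minimal Python sketch of the routing just described. It is a toy model only: the class names, field names, and in-memory lists are all hypothetical stand-ins and are not Experience Platform APIs. It models only what the walkthrough states, namely that streaming data goes onto the pipeline and then to the data lake, that profile-enabled streaming data is available in Profile right away, and that batch data lands in the data lake.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One ingested record. All field names are hypothetical."""
    payload: dict
    streaming: bool        # True if it arrived through a streaming endpoint
    profile_enabled: bool  # True if the data is set to be profile enabled


@dataclass
class PlatformSketch:
    """In-memory stand-ins for the stores named in the walkthrough."""
    pipeline: list = field(default_factory=list)
    data_lake: list = field(default_factory=list)
    profile: list = field(default_factory=list)

    def ingest(self, record: Record) -> None:
        if record.streaming:
            # Streaming data is placed on the pipeline first.
            self.pipeline.append(record)
            # Profile-enabled streaming data is available in Profile at once.
            if record.profile_enabled:
                self.profile.append(record)
        # Streaming and batch data both end up in the data lake.
        self.data_lake.append(record)


if __name__ == "__main__":
    platform = PlatformSketch()
    platform.ingest(Record({"id": 1}, streaming=True, profile_enabled=True))
    platform.ingest(Record({"id": 2}, streaming=False, profile_enabled=False))
    print(len(platform.pipeline), len(platform.data_lake), len(platform.profile))
```

The sketch says nothing about how batch data reaches Profile, because the walkthrough does not cover that path.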

Once your data is in the data lake, data scientists can start to work with it to develop and operationalize machine learning models. Data Science Workspace allows data scientists to easily bring existing models into activation workflows or build entirely new models on top of their data in Platform.

There are two main aspects to Data Science Workspace: exploratory data analysis and model development, supported by an integrated JupyterLab experience, and model operationalization, using a unified framework to manage the life cycle of your models in Platform.

The JupyterLab environment has been extended to support native access to your data in the data lake. This is supported through Query Service, which allows data scientists to use SQL to explore and shape existing datasets as well as create new derived datasets directly from within the JupyterLab notebook environment.

Data Science Workspace comes with pre-built notebook templates that you can use to get started easily. For instance, we have propensity, clustering, and decision tree templates, so that it takes you one day instead of several to get something working. Additionally, there are generic templates that can help you build custom models using your favorite language, like R, Python, Scala, or Spark, to help quickly operationalize models.

Data Science Workspace comes with the notion of recipes. Recipes are blueprints for machine learning pipelines that help you bring together machine learning code, any pre- or post-processing steps that you want to apply to the data, and the configuration to go along with it. Once a data scientist builds the recipe, it is sent to the Data Science Workspace machine learning compute service, powered by Adobe Sensei, to assist with training, scoring, deploying, learning, and monitoring. Data Science Workspace also provides data scientists with additional workflow and management services, such as the model training and evaluation framework.

Finally, once a model has been published, you can schedule additional scoring and training jobs to continuously learn based on new batch and streaming data over time. This ensures that your model stays up to date and your predictions remain accurate.
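
As a rough illustration of the recipe idea described above, the following Python sketch bundles a pre-processing step, model code, a post-processing step, and configuration into one object. Everything in it is hypothetical: the `Recipe` class, the helper functions, the `visits` field, and the `threshold` setting are invented for illustration and do not reflect the actual Data Science Workspace recipe format or SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]
Model = Callable[[Row], float]


@dataclass
class Recipe:
    """Hypothetical blueprint: pre-processing, model code, post-processing, config."""
    preprocess: Callable[[List[Row]], List[Row]]
    train: Callable[[List[Row], dict], Model]
    postprocess: Callable[[float, dict], str]
    config: dict

    def fit(self, rows: List[Row]) -> Model:
        # Training job: clean the rows, then run the model code on them.
        return self.train(self.preprocess(rows), self.config)

    def score(self, model: Model, rows: List[Row]) -> List[str]:
        # Scoring job: same pre-processing, then post-process each raw score.
        return [self.postprocess(model(r), self.config) for r in self.preprocess(rows)]


def drop_incomplete(rows: List[Row]) -> List[Row]:
    # Pre-processing: keep only rows that carry the (hypothetical) "visits" field.
    return [r for r in rows if "visits" in r]


def train_mean_model(rows: List[Row], config: dict) -> Model:
    # Toy "model": a row's visits relative to the mean visits in the training data.
    mean = sum(r["visits"] for r in rows) / len(rows)
    return lambda row: row["visits"] / mean


def label(score: float, config: dict) -> str:
    # Post-processing: turn the raw score into a label using the configuration.
    return "high" if score >= config["threshold"] else "low"


recipe = Recipe(drop_incomplete, train_mean_model, label, {"threshold": 1.0})
model = recipe.fit([{"visits": 2.0}, {"visits": 4.0}, {"purchases": 1.0}])
labels = recipe.score(model, [{"visits": 6.0}, {"visits": 1.0}])
```

In this toy setup, scheduled retraining would amount to calling `fit` again on newer rows and scoring with the refreshed model.
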

Following our model outputs, we can see that the insights and attributes are stored in the data lake and sent to Real-Time Customer Profile, allowing for immediate action. This includes building a stronger identity graph and segmentation based on machine-learned attributes. This data is then sent to the Real-Time Customer Data Platform, where you can enforce data usage policies and configure destinations to act on any of your data-driven insights. For example, audiences can be shared to email systems, social ad networks, or other destinations, extending the capability of Adobe Audience Manager by allowing direct access to and activation of not just anonymous data in destinations but also directly known customer data. Of course, you can activate your data to not just Adobe destinations but also non-Adobe destinations. And there you have it. You should now have a good understanding of the basic Data Science Workspace components and architecture. Thanks for watching.
