Load data in JupyterLab notebooks

This video shows how to create a JupyterLab notebook and load data from Adobe Experience Platform. It also shows how to increase the performance of your notebook when working with large amounts of data. For more information, please visit the Data access in JupyterLab notebooks documentation.

In this video, we will explore the JupyterLab Notebooks user interface in Adobe Experience Platform's Data Science Workspace. After watching this video, you should have a basic understanding of JupyterLab notebooks and how to read, write, and query data in notebooks.
First, from the Experience Platform homepage, let's navigate to the Data Science tab and select Notebooks. From here, we want to select JupyterLab and allow some time for the environment to initialize. The first thing you'll notice is that JupyterLab runs natively on Experience Platform.
Once the environment is loaded, we are able to create notebooks using Python, R, PySpark, and Scala. Additionally, we can upload files and edit them in JupyterLab by selecting the upload icon and then selecting the notebook file we wish to import.
Let's open a new blank Python notebook. Once the notebook has opened, we're able to run a number of commands related to Jupyter notebooks. For instance, with this command, we can see the pre-installed packages. We also have the ability to load external packages.

One of the benefits of running JupyterLab natively on Adobe Experience Platform is the ability to load data seamlessly from Experience Platform using the Data Access SDK. Let's take a look at some of our Luma profile data. Start by selecting the data set icon in the left menu and selecting the data set folder. All our available data sets load, and from here, right-click the data set we are interested in exploring. This brings up a few options. We can explore data in notebook, write data in notebook, and, because we're in a Python notebook, we also have the query data in notebook option available. Let's select Explore data in notebook. JupyterLab automatically generates the lines of code needed to load this data set. By default, this code cell limits the pull to one hundred rows. To execute the cell, select the Play button. An asterisk is displayed next to the cell and the kernel status circle turns dark, indicating that the process is running. Depending on the data set and code cell, this can take some time.

The bottom right corner contains performance metrics. If you notice that your notebook performance is not where you'd like it, or you wish to increase performance, you can select the gear icon in the top right of the JupyterLab UI. This opens the notebook server configuration dialog. Here, you can turn on GPU and set the memory for increased performance. Select Update configs to apply the changes, and you'll be asked to restart JupyterLab. Your organization may have more memory available depending on how many Adobe Experience Platform intelligence package add-ons were purchased.
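The cell that JupyterLab generates when you choose Explore data in notebook typically resembles the sketch below. The dataset ID is a placeholder (JupyterLab fills in the real one), and the exact generated code may vary by release; `get_platform_sdk_client_context()` is supplied by the notebook environment itself, so this only runs inside an Experience Platform notebook:

```python
# Sketch of the auto-generated "Explore data in notebook" cell.
# Assumes the Experience Platform notebook environment, which provides
# platform_sdk and get_platform_sdk_client_context().
from platform_sdk.dataset_reader import DatasetReader

# <dataset-id> is a placeholder; JupyterLab inserts the selected data set's ID.
dataset_reader = DatasetReader(get_platform_sdk_client_context(),
                               dataset_id="<dataset-id>")

# The generated cell caps the pull at 100 rows by default; raise or remove
# the limit with care, since large pulls slow the interactive kernel.
df = dataset_reader.limit(100).read()
df.head()
```

Removing the `limit(100)` call reads the full data set, which is where the performance settings discussed above start to matter.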
In some cases, you may want to work with very large amounts of data (above five million rows), or you might receive an error such as "remote RPC client disassociated". This occurs because JupyterLab is having trouble reading your data set or you've run out of memory.
When reading and writing large data sets, you will want to use a PySpark or Scala notebook in batch mode. For Scala, we just need to change the interactive value under mode to batch. If we want to use a PySpark notebook, we can add `--mode batch` to our cells. Without the `--mode` option, the default is interactive.
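In a PySpark notebook, reads are issued through the `%dataset` notebook magic, so switching modes is a matter of changing that option. A minimal sketch, assuming the Experience Platform PySpark notebook environment (the dataset ID and dataframe name are placeholders):

```python
# Sketch of the %dataset magic in an Experience Platform PySpark notebook.
# <dataset-id> is a placeholder for the real data set ID.

# Interactive mode is the default when --mode is omitted:
%dataset read --datasetId <dataset-id> --dataFrame df --mode interactive

# For very large data sets, add --mode batch instead:
%dataset read --datasetId <dataset-id> --dataFrame df --mode batch
```

Batch mode trades interactive responsiveness for the ability to process data sets that would otherwise exhaust the kernel's memory.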
To learn more about the different limitations for each notebook, visit the JupyterLab data access guide in the Data Science Workspace documentation. Now you should understand the basics of reading and writing data sets in JupyterLab. In the next video, we'll dive deeper into JupyterLab by building, training, and scoring our retail sales recipe using the Recipe Builder template. Thanks for watching, and see you in the next video.