Basic reading of data

With the new Experience Platform SDK, the maximum read size is 32 GB, with a maximum read time of 10 minutes.

If your read is taking too long or risks exceeding these limits, try one of the following filtering options to scope the data:

NOTE
The organization is set within the client_context.

Python

To read data in Python, please use the code sample below:

from platform_sdk.dataset_reader import DatasetReader
dataset_reader = DatasetReader(client_context, "{DATASET_ID}")
df = dataset_reader.limit(100).read()
df.head()

R

To read data in R, please use the code sample below:

DatasetReader <- psdk$dataset_reader$DatasetReader
dataset_reader <- DatasetReader(client_context, "{DATASET_ID}")
df <- dataset_reader$read()
df

Filter by offset and limit

Since filtering by batch ID is no longer supported, you need to use offset and limit to scope the data being read.

Python

df = dataset_reader.limit(100).offset(1).read()
df.head()

R

df <- dataset_reader$limit(100L)$offset(1L)$read()
df
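The offset/limit scoping above can be sketched in plain Python, with a list standing in for the remote dataset (the list and the helper below are illustrative, not part of the SDK):

```python
# Stand-in rows; a real read would come from the Experience Platform dataset.
rows = list(range(250))

def read_page(rows, limit, offset):
    # Mirrors dataset_reader.limit(limit).offset(offset).read():
    # skip `offset` rows, then return at most `limit` rows.
    return rows[offset:offset + limit]

first_page = read_page(rows, limit=100, offset=0)
second_page = read_page(rows, limit=100, offset=100)
```

Stepping the offset by the limit like this pages through the dataset without ever requesting more rows than one read can handle.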

Filter by date

Granularity of date filtering is now defined by the timestamp, rather than being set by the day.

Python

df = dataset_reader.where(
    dataset_reader['timestamp'].gt('2019-04-10 15:00:00')
    .And(dataset_reader['timestamp'].lt('2019-04-10 17:00:00'))
).read()
df.head()
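The gt/lt semantics can be illustrated in plain Python with the standard-library datetime module; the sample rows below are hypothetical stand-ins for dataset records, and this is a sketch of the filtering behavior, not the SDK itself:

```python
from datetime import datetime

# Hypothetical rows standing in for dataset records.
rows = [
    {"timestamp": "2019-04-10 14:30:00", "value": 1},
    {"timestamp": "2019-04-10 15:30:00", "value": 2},
    {"timestamp": "2019-04-10 16:45:00", "value": 3},
    {"timestamp": "2019-04-10 18:00:00", "value": 4},
]

FMT = "%Y-%m-%d %H:%M:%S"
start = datetime.strptime("2019-04-10 15:00:00", FMT)
end = datetime.strptime("2019-04-10 17:00:00", FMT)

# Strictly greater / strictly less, matching gt() and lt() above.
filtered = [r for r in rows
            if start < datetime.strptime(r["timestamp"], FMT) < end]
```

Because the comparison is made on the full timestamp, rows at 15:30 and 16:45 pass the filter while rows on the same day but outside the window do not.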