Basic reading of data
With the new Experience Platform SDK, the maximum read size is 32 GB, with a maximum read time of 10 minutes.
If a read is taking too long, try one of the filtering options described below, such as filtering by offset and limit or filtering by date.
The organization is set within the client_context.
Python
To read data in Python, please use the code sample below:
from platform_sdk.dataset_reader import DatasetReader
dataset_reader = DatasetReader(client_context, "{DATASET_ID}")
df = dataset_reader.limit(100).read()
df.head()
R
To read data in R, please use the code sample below:
DatasetReader <- psdk$dataset_reader$DatasetReader
dataset_reader <- DatasetReader(client_context, "{DATASET_ID}")
df <- dataset_reader$read()
df
Filter by offset and limit
Since filtering by batch ID is no longer supported, use offset and limit to scope how much data is read.
Python
df = dataset_reader.limit(100).offset(1).read()
df.head()
R
df <- dataset_reader$limit(100L)$offset(1L)$read()
df
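To stay under the read size and time limits, a large dataset can be read as a series of smaller pages by advancing the offset after each read. The following is a minimal sketch of that pattern, not part of the SDK: iter_pages and read_from_list are hypothetical names, and an in-memory list stands in for the dataset so the example runs on its own. It assumes that offset counts rows and that each page supports len(), as a pandas DataFrame does.

```python
from typing import Callable, Iterator, Sequence


def iter_pages(read_page: Callable[[int, int], Sequence],
               page_size: int,
               max_pages: int = 1000) -> Iterator[Sequence]:
    """Yield successive pages until an empty or short page signals the end."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    for _ in range(max_pages):  # hard cap so a misbehaving reader cannot loop forever
        page = read_page(page_size, offset)
        if len(page) == 0:
            return
        yield page
        if len(page) < page_size:
            return  # a short page means there is nothing left to read
        offset += len(page)


# Stand-in for a dataset so the sketch runs without the SDK.
rows = list(range(250))


def read_from_list(limit: int, offset: int) -> Sequence:
    return rows[offset:offset + limit]


pages = list(iter_pages(read_from_list, page_size=100))
page_sizes = [len(p) for p in pages]

# With the SDK, read_page might be written as follows (an assumption: this
# relies on limit/offset chaining behaving the same on every call, so
# building a fresh DatasetReader per page may be the safer choice):
#   lambda limit, offset: dataset_reader.limit(limit).offset(offset).read()
```

Passing the reader in as a callable keeps the paging logic independent of the SDK, so it can be checked against a list before it is pointed at a real dataset.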
Filter by date
Granularity of date filtering is now defined by the timestamp, rather than being set by the day.
Python
df = dataset_reader.where(
    dataset_reader['timestamp'].gt('2019-04-10 15:00:00')
    .And(dataset_reader['timestamp'].lt('2019-04-10 17:00:00'))
).read()
df.head()
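Because the filter takes timestamp strings, it can be convenient to compute the window bounds instead of typing them by hand. The sketch below uses only the Python standard library; window_bounds is a hypothetical helper, and the 'YYYY-MM-DD HH:MM:SS' format is inferred from the literals in the example above. The source does not say which time zone these strings are interpreted in, so treat that as something to verify.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Format inferred from the literals used in the date filter example.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def window_bounds(end: datetime, hours: float) -> Tuple[str, str]:
    """Return (start, end) timestamp strings for a window of `hours` ending at `end`."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    start = end - timedelta(hours=hours)
    return start.strftime(TIMESTAMP_FORMAT), end.strftime(TIMESTAMP_FORMAT)


start_ts, end_ts = window_bounds(datetime(2019, 4, 10, 17, 0, 0), hours=2)

# The strings can then replace the literals in the filter, for example:
#   dataset_reader['timestamp'].gt(start_ts).And(dataset_reader['timestamp'].lt(end_ts))
```

Here start_ts and end_ts reproduce the two-hour window used in the example above.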