Overview of Stitching

With many individuals interacting with your content across multiple channels and devices, it is increasingly important to connect unauthenticated events to authenticated ones. This allows for a more holistic approach to reporting and, ultimately, activation. This video gives a quick overview of the process of stitching.

Transcript

Hi everyone, this is Matt Thomas from the Adobe Analytics Product Management team. I wanted to give you an overview of a rather complex topic, identity stitching. This is sometimes referred to as cross-channel analytics, field-based or graph-based stitching, or sometimes even just stitching. Let’s dive into some of the details around what it is and what it is not.

This video assumes you have a basic understanding of how identities are configured and how they work in both AEP and CJA.

Stitching is often conflated with the joining of datasets that happens natively in CJA. Stitching is an optional process that takes a persistent or device ID and tries to elevate a given row of data to a higher-order person ID. It can do this by deriving a mapping table from a single dataset or from several datasets using the graph. When we don’t have a mapping for a given device ID, we use the device ID itself as the person ID, ensuring unauthenticated traffic makes it into your analysis.

This happens before data reaches CJA and remains compliant with privacy requests and data lifecycle management requests.

Stitching offers two algorithms for elevating to the person ID. The first is field-based stitching, which assumes that all identities needed to elevate to the desired person ID are found in the single dataset being stitched.

When it encounters shared devices in the data, field-based stitching uses a device-split approach, giving credit to the right person based on time, on a going-forward basis.

The second algorithm is graph-based stitching, which does not assume all identities needed to elevate to the desired person ID are found in the dataset being stitched. Instead, it uses relationships to identities from multiple sources to evaluate and elevate to the desired person ID.

When it encounters shared devices in the data, graph-based stitching uses a last-auth approach, giving credit to the last person who authenticated, regardless of whether an event carries a different person ID.
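To make the two shared-device behaviors concrete, here is a minimal Python sketch covering the events of a single device. It is a toy model, not Adobe’s implementation: the `Event` type, its fields, and the `device_split` and `last_auth` functions are hypothetical, and replay is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    timestamp: int            # ordering of events on the device (hypothetical field)
    device_id: str            # persistent ID, e.g. a cookie or device ID
    person_id: Optional[str]  # present only when the visitor authenticated


def device_split(events: list[Event]) -> list[str]:
    """Going forward, credit each event to whoever last authenticated on the
    device up to that point; earlier events keep the earlier person."""
    current: Optional[str] = None
    stitched = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.person_id is not None:
            current = event.person_id
        stitched.append(current if current is not None else event.device_id)
    return stitched


def last_auth(events: list[Event]) -> list[str]:
    """Credit every event on the device to the last person who authenticated,
    even events that carry a different person ID."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    authenticated = [e.person_id for e in ordered if e.person_id is not None]
    if not authenticated:
        return [e.device_id for e in ordered]  # never authenticated: keep the device ID
    return [authenticated[-1]] * len(ordered)


# A shared tablet: Alice logs in, later Bob logs in on the same device.
events = [
    Event(1, "tablet-1", None),
    Event(2, "tablet-1", "Alice"),
    Event(3, "tablet-1", None),
    Event(4, "tablet-1", "Bob"),
    Event(5, "tablet-1", None),
]
print(device_split(events))  # ['tablet-1', 'Alice', 'Alice', 'Bob', 'Bob']
print(last_auth(events))     # ['Bob', 'Bob', 'Bob', 'Bob', 'Bob']
```

Device split preserves Alice’s share of the history. Last-auth trades that away for a single, consistent owner of the device.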

Stitching determines the best person ID it can between two designated fields: the persistent ID and the person ID. We pass over this data a minimum of two times. We call the first pass live stitching: we take the device ID, and if we have a mapping for it, we use the mapped value as the person ID. If we don’t, we fall back to the value of the persistent ID. As you can see from this table, we don’t know who device 1234 belongs to until the fourth event. As such, we use 1234 as the person ID for the first three events. Then, once we know this device belongs to McKay, we use that value going forward, even when the event itself does not contain this information.
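The live pass can be sketched as a running device-to-person mapping with a fallback. This is a minimal illustration, not how CJA implements it: `live_stitch` is a hypothetical name, the rows model the table mentioned in the video, and the fifth row is an added hypothetical event that shows the going-forward behavior.

```python
from typing import Optional


def live_stitch(rows: list[tuple[str, Optional[str]]]) -> list[str]:
    """First pass: use the person ID mapped to a device if one is known so far,
    otherwise fall back to the persistent (device) ID."""
    mapping: dict[str, str] = {}  # device ID -> most recently seen person ID
    stitched = []
    for device_id, person_id in rows:
        if person_id is not None:
            mapping[device_id] = person_id  # learn the mapping as soon as it appears
        stitched.append(mapping.get(device_id, device_id))
    return stitched


# (device ID, person ID) in event order; the fifth row is hypothetical.
rows = [
    ("1234", None),
    ("1234", None),
    ("1234", None),
    ("1234", "McKay"),
    ("1234", None),
]
print(live_stitch(rows))  # ['1234', '1234', '1234', 'McKay', 'McKay']
```

The first three rows keep the device ID because nothing earlier in the stream links 1234 to a person.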

You’re probably thinking to yourself: what about earlier visits? How do we connect them to McKay? I did mention that there were two passes over the data. Let’s talk about that second pass.

We call the second pass of the data replay. It gives us a chance to attribute formerly anonymous events to the right person for a given time period. The maximum we can look back historically is 30 days. In this example, assuming McKay’s login happens within that 30-day window, we can go back and correct the data to plug in McKay as the person ID. Now McKay is the person ID for all of our rows of data, allowing for a more connected view of McKay’s journey through the various channels and devices. As we know, authentication, or the ability to know a person, is not limited to a single channel. So why not use the power of all your channels to determine which identities are connected to a single person? This is exactly what the AEP Identity Graph does today.

Many of you already using AEP applications like AJO and RTCDP are familiar with this graph, and it is already being populated and used. Now we can use it with CJA too. CJA still requires a single person ID, but we can use the graph as a lookup to get to the best person identifier you have. Let’s walk through an example.

Similar to field-based stitching, graph-based stitching determines the most accurate person ID or falls back to the persistent ID. Instead of using identities found in the dataset, it uses the power of the graph to enrich the data.

In this table, we use the device ID column as our persistent ID and use it to look up the customer ID. The mapping doesn’t come directly from this dataset, but the graph tells us that device 1234 is associated with McKay, so we can plug that in as our person ID. On the third event, we don’t know who 5678 belongs to, so we use 5678 as the person ID. But on the fourth event, that relationship becomes known, is added to the graph, and is used as the person ID ingested into CJA.
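The lookup can be pictured as a search through linked identities. The sketch below is a toy stand-in for the AEP Identity Graph, not its API. The `IdentityGraph` class, the `stitch` helper, and the email address are hypothetical, and the namespace labels (`ECID`, `Email`, `CRMID`) are used only for illustration. Breadth-first search is our choice for following multi-hop links; a real graph applies its own linking and resolution rules.

```python
from collections import defaultdict, deque
from typing import Optional

Identity = tuple[str, str]  # (namespace, value), e.g. ("ECID", "1234")


class IdentityGraph:
    """Toy identity graph: an undirected set of links between identities."""

    def __init__(self) -> None:
        self.links: dict[Identity, set[Identity]] = defaultdict(set)

    def link(self, a: Identity, b: Identity) -> None:
        self.links[a].add(b)
        self.links[b].add(a)

    def find(self, start: Identity, namespace: str) -> Optional[str]:
        """Breadth-first search for the nearest identity in `namespace`."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node[0] == namespace:
                return node[1]
            for neighbor in self.links.get(node, set()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return None


def stitch(graph: IdentityGraph, device_id: str) -> str:
    """Return the person ID for a device, falling back to the device ID itself."""
    person = graph.find(("ECID", device_id), "CRMID")
    return person if person is not None else device_id


graph = IdentityGraph()
graph.link(("ECID", "1234"), ("Email", "mckay@example.com"))
graph.link(("Email", "mckay@example.com"), ("CRMID", "McKay"))
print(stitch(graph, "1234"))  # McKay (two links away)
print(stitch(graph, "5678"))  # 5678 (no relationship known yet)

# The fourth event links 5678 to McKay, so the graph now resolves it.
graph.link(("ECID", "5678"), ("CRMID", "McKay"))
print(stitch(graph, "5678"))  # McKay
```

The search follows any number of links. As a result, device 1234 resolves to McKay even though it is connected only through an email identity.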

Again, like field-based stitching, we go back over the data to adjust any IDs based on new information housed in the graph.

Now that we know 5678 is associated with McKay, we can go back and change the ID to be ingested on event 3 to McKay.
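Replay works the same way whether the device-to-person mappings come from a single dataset or from the graph. Events inside the window are restated using everything known when replay runs. In this minimal sketch, the day numbers, the `replay` function, and the inclusive window boundary are assumptions chosen to mirror the example. The boundary choice also reproduces the day-by-day outcomes described later in the video.

```python
# (day, device ID) for each event; the day numbers are hypothetical.
events = [(1, "1234"), (2, "1234"), (10, "5678"), (12, "5678")]
# Live pass result: 5678 was still unknown when the third event arrived.
live = ["McKay", "McKay", "5678", "McKay"]
# Device-to-person mappings known when replay runs, whether derived from
# a single dataset (field-based) or looked up in the graph (graph-based).
known = {"1234": "McKay", "5678": "McKay"}


def replay(rows: list[tuple[int, str]], stitched: list[str],
           mappings: dict[str, str], run_day: int, window_days: int) -> list[str]:
    """Restate events inside the lookback window with the latest known mappings;
    events outside the window keep their previously stitched person ID."""
    restated = list(stitched)
    for i, (day, device_id) in enumerate(rows):
        # Assumption: the window covers [run_day - window_days, run_day].
        if run_day - window_days <= day <= run_day:
            restated[i] = mappings.get(device_id, device_id)  # fall back to the device ID
    return restated


print(replay(events, live, known, run_day=12, window_days=30))  # ['McKay', 'McKay', 'McKay', 'McKay']
print(replay(events, live, known, run_day=12, window_days=1))   # ['McKay', 'McKay', '5678', 'McKay']
```

With the short window, the third event falls outside the lookback and keeps the device ID, which is why the window length matters.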

This powerful process lets you jump from identities found in a dataset to entirely different identities that may be multiple relationships away, thanks to the graph.

Let’s talk about replay a little bit more in depth.

Replay is broken down into two components, frequency and window.

Frequency is how often we reprocess and restate the data based on any additional information we have gathered from a single dataset or from the graph. We offer only daily and weekly frequencies, depending on the window chosen. The replay window is the number of days of historical data that are reevaluated.

Our current replay window options are 1, 7, 14, and 30 days, depending on your entitlement.
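The frequency and window options can be captured in a small configuration sketch. The window-to-frequency pairing (daily for a 1-day window, weekly for the others) is inferred from the examples in this video rather than taken from product documentation. `REPLAY_OPTIONS` and `replay_schedule` are hypothetical names, and which windows you can actually select depends on your entitlement.

```python
from enum import Enum


class Frequency(Enum):
    DAILY = 1    # replay every day
    WEEKLY = 7   # replay every 7 days


# Window (days) -> replay frequency, as used in this video's examples (assumption).
REPLAY_OPTIONS = {1: Frequency.DAILY, 7: Frequency.WEEKLY, 14: Frequency.WEEKLY, 30: Frequency.WEEKLY}


def replay_schedule(window_days: int, horizon_days: int) -> list[int]:
    """Return the days (1-based) on which replay would run within horizon_days."""
    if window_days not in REPLAY_OPTIONS:
        raise ValueError(f"unsupported replay window: {window_days} days")
    step = REPLAY_OPTIONS[window_days].value
    return list(range(step, horizon_days + 1, step))


print(replay_schedule(1, 5))    # [1, 2, 3, 4, 5]
print(replay_schedule(14, 28))  # [7, 14, 21, 28]
```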

Let’s look at how these configurations play out in a specific scenario. These examples apply to both field-based and graph-based stitching. We will use the same visitor scenario for all of the examples, but apply different replay frequencies and windows to show the effect on the data.

This visitor comes to your site on day 1 and is anonymous. They also come back on days 2, 8, and 15.

Then on day 16 they become known and come back on day 28 as well.
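The scenario can be simulated to check which anonymous visits each configuration re-attributes. This is a hypothetical sketch with two assumptions: replay runs daily for the 1-day window and every 7 days otherwise, and a window covers the run day plus the `window_days` days before it. Under those assumptions, it reproduces the outcomes in the walkthrough that follows.

```python
ANONYMOUS_DAYS = [1, 2, 8, 15]  # visits before the visitor is known
AUTH_DAY = 16                   # first authenticated visit (day 28 is already known)
HORIZON = 28                    # last day of the scenario


def corrected_days(window_days: int) -> dict[int, list[int]]:
    """Map each replay run that changes data to the anonymous visit days it re-attributes."""
    step = 1 if window_days == 1 else 7  # daily for a 1-day window, weekly otherwise
    attributed: set[int] = set()
    changes: dict[int, list[int]] = {}
    for run_day in range(step, HORIZON + 1, step):
        if run_day < AUTH_DAY:
            continue  # nobody is known yet, so replay has nothing to add
        # Assumption: the window covers [run_day - window_days, run_day].
        fixed = [d for d in ANONYMOUS_DAYS
                 if run_day - window_days <= d <= run_day and d not in attributed]
        if fixed:
            changes[run_day] = fixed
            attributed.update(fixed)
    return changes


for window in (1, 7, 14, 30):
    print(window, corrected_days(window))
# 1 {16: [15]}
# 7 {21: [15]}
# 14 {21: [8, 15]}
# 30 {21: [1, 2, 8, 15]}
```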

With the 1-day window, we apply identity updates daily. As you can see, at the end of day 1 there are no changes to the data. At the end of day 2, we could correct day 1, but we have no additional information about this person, and that also holds true at the end of day 8 and the end of day 15. But on day 16, since we know who this person is, we can replay 1 day and attribute day 15 to the same person as day 16.

Day 28 has no changes.

With the 7-day window, assuming the same visitor scenario, we replay the last 7 days at a weekly frequency. On day 7, we review the day 1 and day 2 visits, and we don’t make any changes because we don’t have any updated information. On day 14, the replay runs again, and again there are no updates to the data. On day 21, we have the day 16 information, where we know who this visitor is, so we can now update day 15 with that information. On day 28, there are no changes.

Things get interesting with the 14-day lookback. While we run at the same weekly frequency, we replay a much longer window of time.

The first replay, on day 7, has no effect on the data, and neither does the second, on day 14.

Things change with the day 21 replay. With the much longer window, we can correct not only day 15 but also day 8, and get the right person information on both. On day 28, there are no changes.

The 30-day window, which is our longest, gives us a much better chance of making those anonymous events known. Again using a weekly frequency, on day 7 we don’t have any new information, and the same holds true for day 14. But by day 21, because we are looking back over 30 days of data, we can go beyond day 8: we can attribute days 15, 8, 2, and 1 all to the right person. On day 28, there are no changes.

Hopefully this has given you a better understanding of stitching, how it can improve the identities in your data, and how it helps provide much more accurate customer journey analysis. Thank you.
