Overview of Stitching
Last update: Thu Jan 18 2024 00:00:00 GMT+0000 (Coordinated Universal Time)
Topics:
- Stitching
Created for:
- Intermediate
- Experienced
- Admin
- Developer
With many individuals interacting with your content across multiple channels and devices, it becomes ever more important to connect unauthenticated events to authenticated ones. This allows for a more holistic approach to reporting and, ultimately, activation. This video gives a quick overview of the stitching process.
Transcript
Hi everyone, this is Matt Thomas from the Adobe Analytics Product Management team. I wanted to give you an overview of a rather complex topic: event stitching. This is sometimes referred to as cross-channel analytics, field-based stitching, or sometimes just stitching. Let's dive into some of the details around what it is and what it is not before jumping into Analysis Workspace. It is helpful to understand that person-based reporting is a goal of most organizations. Not only does it represent a person more accurately, it also helps to enrich activations that may happen downstream. Let's walk through an example. As you can see here, Corey is represented by several different kinds of identities across several different channels. These can be device-centric identities like ECID, or more person-based identities like email, CRM ID, or loyalty number. Traditionally, Corey has been perceived as many different people because he has multiple devices and reporting has been device-centric. Since most of us, including Corey, have multiple devices and touchpoints with any given brand, he needs to be represented in reporting more holistically. Customer Journey Analytics (CJA) brought this kind of holistic, person-based reporting by allowing you to combine data from various channels and sources. One key tenet of CJA is that each dataset contains a single field that is used as the person identifier. Any other dataset that is to be used together with it also needs to have that same person identifier in common. In some datasets, it's not possible to ensure that this is the case: the person ID may be missing on some events, and that's why stitching was introduced. This feature helps to ensure a person ID is present on every row by using the known person ID where one is available and falling back to the device ID where it is not. It also allows person identities to carry across multiple sessions and helps to connect unauthenticated events with authenticated ones.
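The per-row rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Adobe's implementation: the function name `resolve_stitched_id`, its parameters, and the in-memory `identity_map` dictionary (device ID to person ID) are hypothetical stand-ins for the identity table that stitching maintains.

```python
from typing import Dict, Optional


def resolve_stitched_id(device_id: str, person_id: Optional[str],
                        identity_map: Dict[str, str]) -> str:
    """Pick the value for a (hypothetical) Stitched ID column on one row."""
    # 1. Prefer a person ID that is present on the row itself.
    if person_id:
        return person_id
    # 2. Otherwise use a person ID previously linked to this device.
    if device_id in identity_map:
        return identity_map[device_id]
    # 3. Fall back to the device ID so every row still has an identifier.
    return device_id


print(resolve_stitched_id("ecid-1", None, {"ecid-1": "corey@email.com"}))
```

The fallback in step 3 is what guarantees that every row has some identifier, even when nobody has ever logged in on that device.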
This stitching process passes over the data at least twice. There is a third, optional pass that happens as a result of a privacy request. Let's dive into each of these passes and their results, starting with live stitching. As the name implies, this process happens as the data comes into Adobe Experience Platform (AEP). As you can see here, we have a source dataset, a stitched dataset, and an identity table. Let's walk through a few examples and see what the end result would be. The first event comes into the system on the source dataset. We replicate it into a stitched dataset and augment that dataset by adding a Stitched ID column. We don't know who the person is on this event, so we plug in the device ID that we have. The second event is very similar to the first: we don't know who the person is. By the third event, some sort of authentication or login has happened, so we know that this device ID is associated with this person ID. We take that information and add it to our identity table, so that for future events we can plug in that same person identifier. We replicate this event into our stitched dataset, and you can see that our Stitched ID column has corey@email.com. As more events come in, you'll see that with the fourth event, we didn't have Corey's information on the row itself, but we were able to reference our identity table and plug it into the Stitched ID column. On event five, a new row comes in with a new device ID and the same person ID. So now Corey has two different devices associated with him, and we add that to our identity table. We take our stitched dataset and again plug the person ID into the Stitched ID column. On the sixth event, we don't know who this is, so the Stitched ID is simply left as the device ID. At this point, the data represents four different people: three different device IDs and one email, corey@email.com.
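The live pass can be sketched as a single loop over events in arrival order. Everything here is hypothetical: the `Event` class, the `live_stitch` function, and the device IDs `ecid-A`, `ecid-B`, and `ecid-C`. The event sequence is loosely modeled on the walkthrough, with one extra unauthenticated event from Corey's second device so that the result shows four distinct identities. When a device is later seen with a different person ID, this sketch simply overwrites the mapping, which is a simplifying assumption rather than documented product behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    device_id: str            # device-centric ID, for example an ECID
    person_id: Optional[str]  # person ID (here an email) when authenticated


def live_stitch(events: List[Event], identity_table: Dict[str, str]) -> List[str]:
    """Return one Stitched ID per event, updating identity_table in place."""
    stitched: List[str] = []
    for event in events:
        if event.person_id:
            # Authenticated row: record device -> person (last mapping wins,
            # a simplifying assumption) and use the person ID directly.
            identity_table[event.device_id] = event.person_id
            stitched.append(event.person_id)
        else:
            # Unauthenticated row: use a known person for this device,
            # otherwise fall back to the device ID.
            stitched.append(identity_table.get(event.device_id, event.device_id))
    return stitched


# Hypothetical event sequence, loosely following the walkthrough.
events = [
    Event("ecid-A", None),
    Event("ecid-A", None),
    Event("ecid-A", "corey@email.com"),  # login: ecid-A -> Corey
    Event("ecid-A", None),               # stitched via the identity table
    Event("ecid-B", None),               # Corey's second device, not yet linked
    Event("ecid-B", "corey@email.com"),  # login on the second device
    Event("ecid-C", None),               # unknown device
]
identity_table: Dict[str, str] = {}
stitched_ids = live_stitch(events, identity_table)
print(stitched_ids)
print(len(set(stitched_ids)))  # 4 distinct identities
```

Because the loop only knows what has arrived so far, the first two `ecid-A` events and the first `ecid-B` event keep their device IDs, which is exactly the gap the replay pass addresses.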
We know that's not accurate, but it's the best we could do as the data flowed in. This is where the replay functionality is very powerful. With replay, we look back over the data, either over one day or over the course of seven days, to see whether we can true up the data and make sure Corey is attributed to the right events. Since we already have the source dataset and our identity table is built out, we rerun the stitched dataset: on event one, that device now appears in our identity table, so we can plug Corey into the Stitched ID column. We can now do the same with event two. By the end of this, what was previously represented as four different people in reporting is now two people, and that is accurate: the last event's device ID was never mapped to a person, so it correctly remains separate. Those are the two passes over the data that we always do. There is a third, optional one involving privacy, as I mentioned. In the privacy flow, we essentially have to undo everything we did, so we sometimes refer to this as unstitching. Again we have our source dataset, our identity table, and our stitched dataset. Depending on your configuration, whether you're using a source connector or the Web SDK, the result might differ a little, but this example follows the Web SDK scenario, where you have asked the system to forget corey@email.com. We remove that row from the source dataset, from our identity table, and from our stitched dataset. We also remove event five, and the identity table is updated accordingly. The last step is that Corey still appears in the Stitched ID column on other rows, so we need to undo spreading that identity into those events. We replace Corey with the appropriate device ID, and that is the end of the flow.
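The replay pass and the privacy "unstitching" flow can be sketched the same way. This is a simplified model, not Adobe's implementation: `replay`, `forget`, the `day` field, and `lookback_days` (a stand-in for the one-day or seven-day window) are all hypothetical, and the sketch does not model scheduling or the differences between a source connector and the Web SDK. The starting state is written out as a live pass would leave it for the walkthrough's events.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Event:
    day: int                  # hypothetical day index of the event
    device_id: str
    person_id: Optional[str]


def replay(events: List[Event], stitched: List[str], identity_table: Dict[str, str],
           today: int, lookback_days: int) -> List[str]:
    """Re-stitch unauthenticated events inside the lookback window."""
    result = list(stitched)
    for i, event in enumerate(events):
        if today - event.day < lookback_days and not event.person_id:
            result[i] = identity_table.get(event.device_id, event.device_id)
    return result


def forget(person: str, events: List[Event], stitched: List[str],
           identity_table: Dict[str, str]) -> Tuple[List[Event], List[str]]:
    """Sketch of unstitching one person ID; mutates identity_table in place."""
    # 1. Drop rows that carry the person ID itself, in source and stitched data.
    kept = [(e, s) for e, s in zip(events, stitched) if e.person_id != person]
    # 2. Drop identity-table entries that point at the person.
    for device in [d for d, p in identity_table.items() if p == person]:
        del identity_table[device]
    # 3. Undo the spread identity: revert remaining rows to their device ID.
    return [e for e, _ in kept], [e.device_id if s == person else s for e, s in kept]


# State a live pass would leave for the walkthrough's events (all on day 0).
EMAIL = "corey@email.com"
events = [Event(0, "ecid-A", None), Event(0, "ecid-A", None), Event(0, "ecid-A", EMAIL),
          Event(0, "ecid-A", None), Event(0, "ecid-B", None), Event(0, "ecid-B", EMAIL),
          Event(0, "ecid-C", None)]
live = ["ecid-A", "ecid-A", EMAIL, EMAIL, "ecid-B", EMAIL, "ecid-C"]
identity_table = {"ecid-A": EMAIL, "ecid-B": EMAIL}

replayed = replay(events, live, identity_table, today=0, lookback_days=1)
print(len(set(replayed)))  # 2: Corey and the unknown ecid-C device

remaining, unstitched = forget(EMAIL, events, replayed, identity_table)
print(unstitched)  # ['ecid-A', 'ecid-A', 'ecid-A', 'ecid-B', 'ecid-C']
```

Replay never changes rows that already carry a person ID; it only fills in unauthenticated rows whose device became known later. The `forget` step mirrors the order described in the transcript: delete the identifying rows, clean the identity table, then revert the propagated Stitched IDs.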
Now that you understand what stitching does, we can jump into Analysis Workspace and see the impact it has on reporting. Assuming you have already requested a stitched dataset, it has been created and backfilled, and you've established a connection and data view, let's look at some of the results. In the left panel, we have a connection to an unstitched dataset. In the right panel, we have a connection to a stitched dataset. Starting with the left panel, you can see in the first table that the connection was set up using ECID as the person ID, and it is found on every row. In the table below, you can see that roughly 2.4% of the events have an email set. In the final table in that panel, you can see the authentication rates for each month over the past 60 days, and they're all relatively low. Looking over at the right panel, the share of events with an email has jumped from 2.4% all the way up to 38.4%. This means we are now connecting more events to people instead of devices, which helps make sure we're taking a more holistic approach to analysis and activation. I hope this helped you understand event stitching a little better, and I hope you enjoyed it.
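The percentages compared in the two panels are simply the share of events whose identifier is a person ID. A minimal sketch of that arithmetic follows. The function `person_share`, the `looks_like_email` heuristic, and the toy columns are hypothetical and are not the data behind the 2.4% and 38.4% figures.

```python
from typing import Callable, Optional, Sequence


def person_share(ids: Sequence[Optional[str]], is_person: Callable[[str], bool]) -> float:
    """Percentage of events whose identifier is a person ID rather than a device ID."""
    if not ids:
        return 0.0
    hits = sum(1 for value in ids if value is not None and is_person(value))
    return 100.0 * hits / len(ids)


def looks_like_email(value: str) -> bool:
    # Hypothetical heuristic that only works for this toy data.
    return "@" in value


# Toy data: an email column before stitching and a Stitched ID column after replay.
email_column = [None, None, "corey@email.com", None, None, "corey@email.com", None]
stitched_column = ["corey@email.com"] * 6 + ["ecid-C"]

print(round(person_share(email_column, looks_like_email), 1))     # 28.6
print(round(person_share(stitched_column, looks_like_email), 1))  # 85.7
```

The event count stays the same in both cases; only the numerator changes, because stitching attributes more of the same events to a known person.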