Cloud 5 - Search optimization in Edge Delivery Services

Get your Lighthouse score to 100 by using query and search optimization when dealing with large numbers of records. Work with techniques to optimize the query index, enhance search functionality, and divide query indexes for the best performance.

Transcript
And welcome to the Cloud 5 interview series. My name is James Talbot, and I’m here with Varun Mitra, who is a cloud architect on the AEM engineering team. Today we’re going to talk about query optimization. Varun worked on a project for the CRN website, specifically on performance optimization for that site. The query index on that site held over a quarter of a million records, which makes search and retrieval quite challenging. So here to talk about the optimization techniques he used during this project is Varun Mitra. Very exciting, Varun.

Thank you very much, James. I’m going to go ahead and share my screen. Hello, everyone. My name is Varun Mitra, so let’s talk about performance optimization. As James mentioned, we did some performance optimization for CRN.com. To date, CRN.com is one of the largest EDS projects, with over 250,000 web pages. As you can see on the web page on the right-hand side, it uses a lot of Google ads, so that was the first point of optimization. Since Google ads are loaded from an external source, we had to make sure that space for each ad was reserved in the right place on the screen. This allowed us to avoid the cumulative layout shift that can occur when you insert a dynamic element. As you can see, when the page loads, a skeletal structure is visible first, and then the ads become available. Further, we delegated the ad loading logic to delayed.js, which allowed us to reduce the total blocking time. Another challenge we faced was the sheer size of the site: with over 250,000 web pages, you have that many records in your query index. A query index that large affects your search results, your retrieval, and your query traversal, and it also affects how content is displayed on your web pages.
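The skeleton placeholders described above work because each ad slot’s final size is reserved before the ad arrives. The following TypeScript is a minimal sketch of that idea, not the CRN.com implementation: `reserveAdSlot`, the `SlotLike` interface, the `ad-skeleton` CSS class, and the format names are hypothetical, and the sizes are the common 728x90 leaderboard and 300x250 medium rectangle formats. In a browser you would pass real slot elements (an `HTMLElement` should satisfy `SlotLike`), and the third-party ad script itself would be requested from delayed.js, which the standard EDS project template loads only after the page has rendered.

```typescript
// Sketch only: helper name, CSS class, and format list are hypothetical.
type AdSize = readonly [width: number, height: number];

// Minimal structural type so the helper can be exercised without a DOM;
// a real HTMLElement is compatible with it.
interface SlotLike {
  style: { minWidth: string; minHeight: string };
  classList: { add(token: string): void };
}

// Common display-ad dimensions in CSS pixels.
const AD_SIZES: Record<string, AdSize> = {
  leaderboard: [728, 90],
  mediumRectangle: [300, 250],
};

// Reserve the slot's final footprint up front so that inserting the ad later
// does not push surrounding content around (no cumulative layout shift).
function reserveAdSlot(slot: SlotLike, format: string): AdSize {
  const size = AD_SIZES[format];
  if (!size) {
    throw new Error(`Unknown ad format: ${format}`);
  }
  const [width, height] = size;
  slot.style.minWidth = `${width}px`;
  slot.style.minHeight = `${height}px`;
  // Hypothetical class that paints a grey skeleton until the ad renders.
  slot.classList.add("ad-skeleton");
  return size;
}
```

Giving every slot an explicit size lets the browser lay out the page once; the ad script can then load late without moving any content.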
For example, on our home page, all the data is fetched from the query index. So it’s not just the search functionality but the overall functioning of the site that is impacted when you have a very large query index. Now let’s talk about the steps we took to mitigate this and improve performance. For CRN.com, we did a lot of query and search optimization, following a three-tier approach. First, we split the query index across two files. We have a default query index that is auto-generated and auto-populated, and we restricted it to 20,000 records. To do that, we first previewed and published all the content, so that all the records populated the query index. Then we applied a filter on the default query index and identified records that were created, say, before 2020. We moved those records over to a storage query index, which served as an archive. This way, we limited the records in the default query index and kept all the archive data in the storage query index. This was a one-time operation, and once it was set up, we were good to go. Second, with the query index split across two files, we had to update the default search functionality, so we implemented a two-tier search: we looked for records first in the default query index and then searched the storage query index. Finally, we split the query indexes into different sheets based on article category. This is something I’ve talked about in an earlier video: we used Excel formulas to identify the records that belong to each category and split them into separate sheets. This is how we achieved query optimization for CRN.com.
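The three steps described above can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the project’s code. The following are all hypothetical:

- the `IndexRecord` shape
- the `published` field (Unix seconds)
- the 1 January 2020 cutoff
- title-only matching
- the `minResults` threshold that decides whether the archive is searched at all

On a real EDS site the records come from the query index JSON, where values may need parsing first. The category sheets on CRN.com were produced with Excel formulas rather than code; `groupByCategory` only mirrors that grouping.

```typescript
// Sketch only: record shape, field names, and thresholds are hypothetical.
interface IndexRecord {
  path: string;
  title: string;
  category: string;
  published: number; // Unix epoch seconds (assumed format)
}

// 1 January 2020, 00:00 UTC, in Unix seconds.
const ARCHIVE_CUTOFF = Date.UTC(2020, 0, 1) / 1000;

// Step 1 (one-time): older records go to the storage (archive) index,
// newer ones stay in the default index.
function splitByDate(records: IndexRecord[], cutoff = ARCHIVE_CUTOFF) {
  const current: IndexRecord[] = [];
  const archive: IndexRecord[] = [];
  for (const record of records) {
    (record.published < cutoff ? archive : current).push(record);
  }
  return { current, archive };
}

// Case-insensitive title match against one index.
function searchIndex(records: IndexRecord[], term: string): IndexRecord[] {
  const needle = term.toLowerCase();
  return records.filter((r) => r.title.toLowerCase().includes(needle));
}

// Step 2: two-tier search. The small default index is loaded and searched
// first; the storage index is loaded only if too few results were found
// (this fallback rule is an assumption made for the sketch).
async function twoTierSearch(
  term: string,
  loadDefault: () => Promise<IndexRecord[]>,
  loadStorage: () => Promise<IndexRecord[]>,
  minResults = 10,
): Promise<IndexRecord[]> {
  const results = searchIndex(await loadDefault(), term);
  if (results.length >= minResults) {
    return results;
  }
  return results.concat(searchIndex(await loadStorage(), term));
}

// Step 3: one "sheet" per article category, so a page such as the home page
// only has to fetch the category it displays.
function groupByCategory(records: IndexRecord[]): Map<string, IndexRecord[]> {
  const sheets = new Map<string, IndexRecord[]>();
  for (const record of records) {
    const sheet = sheets.get(record.category) ?? [];
    sheet.push(record);
    sheets.set(record.category, sheet);
  }
  return sheets;
}
```

The design choice in this sketch is that the storage index is fetched only when the default index does not yield enough matches. If you always want archive results appended after the current ones, drop the `minResults` check.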
Let me move on to the next slide and show you what the default and storage query indexes look like. If you look closely, you can see that the default query index holds about 19,000 records and the storage query index about 70,000 records. Once this was set up, we were able to keep the default query index within its limit, which gave us faster query retrieval and much faster search results.

Fantastic, Varun. This is a great illustration of how to optimize queries for the best performance. We appreciate it, and thank you very much.

Thank you, James.