Essential Tips and Best Practices for AEM Lucene Search

Discover how to enhance your digital presence and improve customer engagement with cutting-edge search features, including filters, facets, auto-suggest, NGram, and spell check. Learn from real-world demos and gain insights into optimising your search capabilities with AEM and Lucene. This webinar provides you with the opportunity to elevate your search experience and stay ahead in the digital landscape.

Transcript

Hello everyone.

Let’s wait for a couple of minutes to allow other attendees to join in.

All right. Maybe we can just give it a go and let the attendees go in slowly.

I hope everybody can see my screen. Hello everyone. Good morning and good afternoon. Welcome to today’s session on essential tips and best practices for AEM Lucene search. My name is Gaurav Kumar and I work in Adobe’s Field Engineering, where we focus on helping customers get as much value as possible from their Adobe solutions. I’ll be your host and facilitator today. Before starting, a few quick housekeeping items: this live webinar is a listen-only format, but do feel free to share any questions in the Q&A pod. This session is being recorded, and the slides and resources will be shared afterwards.

Before we dive into the actual content, while the final attendees continue to filter in, I wanted to let you know that we have several other Ultimate Success virtual sessions coming up this quarter, which you can see on your screen, and which are open for you to attend as well. For those who are interested, I will put the links in the session chat so you can have a look.

Additionally, we have a comprehensive on-demand library on Experience League with recordings of past sessions covering adoption and enablement, organizational readiness, and technical readiness topics. These are fantastic resources to continue your learning journey beyond today’s session.

Finally, before we dive into today’s technical content, I also want to take a brief moment to introduce Adobe Ultimate Success, the program that Vineet and I are part of as field engineers. What you see on this slide is the foundation of our Ultimate Success model, which combines proactive strategic leadership with responsive technical support to help customers maximize value and maintain stability with their Adobe solutions. On the proactive side, we work hand in hand with your team to align on a unified success plan, execute targeted accelerators, and support your roadmap through expert-led activities across technical health, strategic planning, and event readiness. At the same time, we are here to ensure a rapid response when needed. Our responsive model includes dedicated support resources and subject matter experts to monitor, prioritize, and resolve issues swiftly, whether it’s a P1 incident or ongoing incident analysis. Together, this approach ensures you are not only covered for today’s needs, but are continuously moving towards long-term success and value realization.

Now, let me introduce our distinguished presenter for today. With us, we have Vineet Kaushik, who is working as a technical solution manager at Adobe. Vineet has been instrumental in helping enterprises optimize their AEM implementations and has deep expertise in search architecture and performance tuning. He will be walking you through practical insights and best practices that you can apply to enhance your search capabilities. As for my role today, I’ll be your facilitator and technical support. While Vineet presents, I will be actively monitoring the chat to answer your technical questions in real time, so feel free to drop your queries in the chat at any point and I’ll respond as we go. All right, without further ado, let me hand it over to Vineet to kick off the presentation. Vineet, the floor is all yours.

Thank you, Gaurav.

Let me share my screen first.

All right.

Thanks, Gaurav.

Also, thanks, everyone, for joining in today’s session, which is going to be on AEM Lucene search.

I’m Vineet Kaushik. I’m a technical solution manager slash field engineer at Adobe. I’ve been with Adobe, on and off, for almost nine years, playing different roles in different teams.

Today’s agenda is to see what we are going to cover for AEM Lucene search. We’ll start with the basics, in terms of where exactly AEM search really fits in, and then, behind the scenes or under the hood, how you can define the Oak index definition essentially responsible for the search, or rather, for optimized search.

And then we’ll get to see some of the best features available out of the box. There are many, but we will be talking about a few, considering the time we have. And then there are the last three slides, the first being the workshop material, which means I have one slide where we have some material available for you to try your hands on.

And then there’s one slide on the JCR query cheat sheet, followed by some resources and further readings.

All right, let’s start with the basics and an overview of AEM search.

OK, so looking at this slide, these screenshots, I believe, must be somewhat familiar to most of the folks joining today, considering you do have some AEM background, technical or functional.

But yes, you see that in AEM, search is pretty much everywhere, right? In AEM author, whatever function you go to, you will end up using or leveraging search within AEM. And the same is the case if you have any website or portal powered by AEM; it doesn’t matter if it’s cloud or on-prem, sites or assets. Behind the scenes, it’s most of the time Lucene search, because that’s what comes out of the box.

One point I must highlight: this presentation is focused only on Lucene search. It’s not about any external search or any other type of search, for that matter.

So we’ll start with the basics. If you see my screen, this is a browser, and this is an AEM author environment. Notice that if I try to search something, more than a couple of things happen. The first thing is you get to see something being auto-suggested, as and when you try to search within AEM. Second, you get to see the total count of results the search returned.

Then I see some lazy loading happening, which means there is some sort of pagination as well. Third, I see that there are some filters here, which means I can further narrow down my search. And there are some facets as well, where you get to see counts against each type, right? So when we do a search, there are many features behind the scenes powering search in AEM. And to be precise, this is not just about AEM author; in general, the same applies to the AEM publisher, or any website or portal powered by AEM.

All right. So with that said, when I was searching on AEM author for the keyword mountain, let’s see at a very high level what happens under the hood, right? What is the one thing making that possible, essentially? At a very high level, the process starts from the application code. You may have some component or a servlet, or maybe some backend logic, which is firing a query, maybe written in XPath or SQL-2 or Query Builder. And then it is always translated by the Oak query engine into JCR-SQL2, right? Which means this is the language the Oak query engine uses for its internal processing.

And once that is done, the Oak query engine gets into a process to figure out which of the available Oak indexes are most suitable for the query we are targeting, right? The Oak query engine would try to find the most optimized Oak index, depending upon the node type we are searching for. If you are searching for, let’s say, assets, then dam:Asset is the node type we are searching against. If it was pages, then it has to be cq:Page or so, right? And in that process of identifying the correct, most optimized Oak index, it has to create a certain query plan, which means it checks not just the node restriction but also the cost against the available Oak indexes. For example, there may be multiple Oak indexes available for the one node type dam:Asset, right? But based upon the query, the Oak query engine is going to decide which Oak index is more suitable for the given query to execute and give you the result. And that’s how the result becomes available to the presentation layer, the application layer, to display. So overall, this is, at a very high level, what happens behind the scenes.
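As a rough sketch (not shown verbatim in the session), the mountain search might look like this once expressed in JCR-SQL2; the path is illustrative, and the comment is an annotation to strip before running. Prefixing the statement with explain, for example in the Query Performance tool or oak-run, returns the query plan instead of the results, so you can see which Oak index the engine picked.

```sql
/* Illustrative JCR-SQL2 full-text search on assets; the path is hypothetical. */
explain
SELECT * FROM [dam:Asset] AS a
WHERE ISDESCENDANTNODE(a, '/content/dam')
  AND CONTAINS(a.*, 'mountain')
```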

Cool.

With that said, let’s go a little deeper, just to highlight this section differentiating between the content and the index.

So the content: if you are a user, you know that there are various types of content available. There could be pages, there may be assets, there could be tags, their respective properties, metadata; all of that is pretty much nothing but content.

But when you wanna do a search for one given type, you may not wanna search against the entire repository, right? If you do that, it would be a very, very expensive process. Hence, the idea is to separate the index from that content.

So that, as and when a query requires certain tasks to be performed, instead of picking up the content directly from the content repository per se, it fetches it from the index section of the repository, right? If I can show you one example of it, I think that would give at least a high-level understanding.

So oops, all right.

So if you see here, this is crx-quickstart, where under the repository section we have the index folder with this whole bunch of Oak indexes, all right? This is the physical location where all these Oak indexes get saved.

However, when you look here in CRXDE under /oak:index, many times we make an innocent mistake in understanding: we think these are the Oak indexes, but that’s not true. These are not Oak indexes; rather, these are the Oak index definitions, which means these definitions hold a certain set of instructions, in the form of these flags and this structure. The definition is what drives the formation of the indexes for you, right? So there has to be a clear differentiation to understand, because many times, at least when I started learning the Oak indexes part, this was one of the most misunderstood concepts for me, where I thought these were the Oak indexes, but that’s not the case.

All right, moving on, we are going one level deeper now. Now we are gonna focus specifically on the building blocks of a Lucene Oak index definition.

Okay, so first thing first, the type, right? The type decides what type of index it really is. Is it a Lucene index, or a property index, or some other type of index? That’s decided by this type flag, which you will see. Followed by the node type, which means what type of node you are creating that particular Oak index for, maybe cq:Page or dam:Asset, et cetera, right? Followed by the path restriction.

I may be switching in between, to give you a little better understanding of Oak indexes. So there you will get to see some of these things: these are the index rules, and this is the node type supported, right?

Followed by the path restriction I was gonna cover, which is this one. This is where it is mentioned that this Oak index is gonna cover only this section of the repository.

Then it talks about the aggregate rules, which means you may have your content saved, right, in the jcr:content of your file type, maybe in the metadata, maybe even a little deeper.

So there you go. There are some expressions you see, right? They have certain meanings; these configurations guide what level of information is supposed to be indexed versus not. I’ll be covering those in a little more detail in the coming slides.

Last but not least, the most important aspect of any given Lucene Oak index: the index rules.

So if you see, in this section there are quite a few index rules. You see that this index rule is made for dam:Asset, cool, and then it has these many properties. And then we decide based upon certain properties, I sometimes call them flags, like nodeScopeIndex, for example, or propertyIndex, or useInSpellcheck. What are these? Why do we have this many properties? And what do these flags signify? This is one of the most core parts of the configuration in any given Oak index definition, so we will be covering that in a little more detail as well.

Now the next section. After knowing the search flow, how it works, and then knowing a little about the overall building blocks of a given Oak index definition, let’s look into how to define them, and then see what best practices you can apply to optimize such Lucene Oak indexes in AEM.

Okay, so as a developer or a technical person, as and when I try to think of an Oak index, it’s very intuitive that the first thing that comes to mind is the query, right, the one you wanna use for the search, rather than the Oak index itself. This is a very normal mindset. Hence, I’m starting with a query first, and then we’ll get into the respective Oak index definitions and how to optimize those.

Okay, so first thing first: this query comprises some important nuances to understand, because these nuances are helpful to know, especially when you are designing or optimizing your Oak index. The first one is the type: I select star from dam:Asset, which means this is the node type, so we call that a node type restriction on the query. Then we see that there’s one big path, which means we are trying to search for some content within this path; we call that a path restriction, right? We would be using these terminologies for the Oak index definition as well, so it’s important to know these basic things. Then you see the search is against a property, right? So we call that a property restriction. Followed by contains, which means it is trying to do some full-text search.

So here we are trying to search for the keyword December within this property sitting here. We call that a full-text property search.
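Putting those terminologies together, a hedged sketch of such a query could look like the following; the names, path, and values are illustrative, and the comments are annotations only, to strip before running:

```sql
SELECT * FROM [dam:Asset] AS a                     /* node type restriction */
WHERE ISDESCENDANTNODE(a, '/content/dam/my-site')  /* path restriction */
  AND a.[jcr:content/metadata/dc:format] = 'image/jpeg'              /* property restriction */
  AND CONTAINS(a.[jcr:content/metadata/dc:description], 'December')  /* full-text property restriction */
```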

Okay, so with these terminologies in mind, let’s move to the next section, which is about defining the index, right? The first thing I must mention: as and when you are going to create or define any custom Oak index, don’t do it from scratch. That’s the first best practice. Don’t do it from scratch, because there are many available out of the box; make a copy of one and build on top of it. That’s the very first best practice one must follow when trying to come up with a new Oak index definition.

We know that all these Oak indexes, of course, are supposed to sit in this location, /oak:index.

And within this, we decide, based upon this configuration, how the content is going to get indexed.

So there are a couple of fields to take care of. The first one is compatVersion, which means compatibility version. This is 2, the most recent one to the best of my knowledge; use that only. These are required fields, without which the Oak index definition cannot be complete. The second is type, where you say what type of index you want to create: is it a Lucene index, or an Oak property index, or some other type of index? Since we are focusing on Lucene, we’ll keep it as lucene.

And then you decide the async mechanism, right? Which means you decide, as and when any change is made to the content, how frequently it is gonna sync with the Oak index, which holds the latest data for the user to search against. So here we covered these sections. We’ll be covering the index rules in the coming slides.

But one quick tip, specifically for developers, which is very useful: when you have these Oak index definitions, many times you may wanna disable one index or another; you may not want it to be used. In such cases, setting type equal to disabled is very handy, especially if you are doing any sort of troubleshooting for a given Oak index.
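As a minimal sketch, assuming a hypothetical index name, the top level of such a custom definition under /oak:index could look like this .content.xml fragment:

```xml
<!-- Sketch of /oak:index/myCustomAssetIndex/.content.xml (the name is hypothetical).
     compatVersion, type, and async are the required top-level fields discussed above.
     While troubleshooting, type="disabled" switches the index off
     without deleting the definition. -->
<jcr:root xmlns:jcr="http://www.jcp.org/jcr/1.0"
          xmlns:oak="http://jackrabbit.apache.org/oak/ns/1.0"
    jcr:primaryType="oak:QueryIndexDefinition"
    compatVersion="{Long}2"
    type="lucene"
    async="async"/>
```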

Okay, moving on to the next section, the rules I was talking about. Like I said in the beginning, this is the core, the heart, of any Oak index definition.

So that’s where you define, for any given Oak index definition, what node type you are handling. For example, for this one, you are creating an Oak index for the node type dam:Asset, which means this Oak index would be responsible for doing the search only against dam:Asset type nodes, not any other. However, you can have more node type restrictions within the index rules of one given Oak index; technically it’s possible, but it is not always recommended. It’s better to have one node type rule per Oak index, so that your index is cleaner, in the sense that it’s taking care of only one type of node.

The other thing is, say you do have to keep multiple different node types within one given Oak index definition.

Then try to see whether that other node type restriction is related or not. If those are two absolutely different things, never ever have them within one set of index rules. It’s never recommended.
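A minimal sketch of an index rules block following that advice, with a single dam:Asset rule and a couple of illustrative property definitions (namespace declarations omitted for brevity):

```xml
<indexRules jcr:primaryType="nt:unstructured">
    <dam:Asset jcr:primaryType="nt:unstructured">
        <properties jcr:primaryType="nt:unstructured">
            <!-- exact-match filtering on a status property (illustrative name) -->
            <status jcr:primaryType="nt:unstructured"
                name="jcr:content/metadata/dam:status"
                propertyIndex="{Boolean}true"/>
            <!-- full-text search on the asset title -->
            <title jcr:primaryType="nt:unstructured"
                name="jcr:content/metadata/dc:title"
                analyzed="{Boolean}true"/>
        </properties>
    </dam:Asset>
</indexRules>
```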

Okay.

With the index rules discussed, we’ll talk about the next section, which is the most discussed and one of my favorites too: the full-text search.

This is used a lot across any application powered by Apache Lucene, and AEM is one of those.

So if you wish to search for some keyword, that’s what the full-text search is helpful for.

So before we really get into the nitty-gritty of the full-text search, let’s take a look at these two queries on the top first.

We see that we’re trying to search something from cq:Page type nodes, where it says contains 'weekend' as the keyword, right? So what’s happening here? First, it’s a contains, which means it’s a full-text search. That’s the first point to note. Second, when I’m trying to search 'weekend', I’m not searching it against any specific property. Rather, I kept it very broad, allowing it to search against a larger number of properties, not just one. And for that to happen, we have to have the nodeScopeIndex flag set to true, right? This is what we call the node restriction full-text search. I’ll repeat: node restriction full-text search. On the other hand, if you want the full-text search for this keyword 'weekend' against a specific property, which means you know against which property the full-text search should run, then it is always advisable to have the analyzed flag set to true.

What it essentially does is create tokens from a given property’s value.

We’ll discuss these in a little more detail with their respective use cases. So we’ll be touching more on these two flags in particular, analyzed and nodeScopeIndex.

Moving on, these are some examples we put in just to show you how they are mapped. For example, if you had this query, then what your Oak index definition has to be, right? It gives you a one-on-one view of the query and the Oak index.
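As a hedged sketch of that one-to-one mapping, here are two illustrative property definitions, each with the query shape it is meant to serve shown in the comment:

```xml
<!-- Serves: SELECT * FROM [cq:Page] AS p WHERE CONTAINS(p.*, 'weekend')
     A node restriction full-text search needs nodeScopeIndex on the property. -->
<title jcr:primaryType="nt:unstructured"
    name="jcr:content/jcr:title"
    analyzed="{Boolean}true"
    nodeScopeIndex="{Boolean}true"/>

<!-- Serves: SELECT * FROM [cq:Page] AS p
             WHERE CONTAINS(p.[jcr:content/jcr:description], 'weekend')
     A property restriction full-text search needs analyzed on that property. -->
<description jcr:primaryType="nt:unstructured"
    name="jcr:content/jcr:description"
    analyzed="{Boolean}true"/>
```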

When it’s about the node restriction search, where we don’t know which property we are searching against, like I mentioned in the previous slide, nodeScopeIndex should be set to true. So this is kind of the first need, right? There may be some cases where you really don’t want this broad level of full-text search to happen. Perhaps for security reasons, you don’t want some properties to be available for the node restriction full-text search; for example, any property holding confidential data, like an ID or an account number.

So for properties holding such sensitive data, you never want nodeScopeIndex set to true. That’s one tip.

Moving on to the second type of full-text search, the property restriction full-text search, where you are trying to search for a keyword or token, but this time against one property only, or maybe a list of properties, not the wide, broad full-text search the way it was in the previous slide, right? And for that to happen, and to happen really well, it’s not just good to have, you should have the analyzed flag set to true, because this is responsible for creating the tokens from the content of a given property. Here, it’s the comment property.

And it’s not just about creating the tokens; it’s also responsible for the normalization of those tokens, right? Maybe the data stored in these properties is not all lowercase, maybe it requires some stemming or some filtering. All of that happens if you have analyzed set to true.

And if you are doing the search against a path, it’s always recommendable to have this flag set to true as well: evaluatePathRestriction. This is really, really good; it improves the overall performance of your Oak index.

Okay, moving on, I do have a few use cases discussing the analyzed and nodeScopeIndex flags and how they behave when they are used together.

Before we even start comparing them, I just wanna call out, in bold words: it’s always a good practice to use both of these together for a better full-text search experience, right? But still, there may be cases where you don’t wanna use both of them together, so for clarity of the concept, I’m just gonna cover some of these use cases.

Let’s assume there is one property, jcr:title, maybe off a cq:Page or a dam:Asset, whatever.

This property is holding this data, and I have applied these settings, analyzed true and nodeScopeIndex true, to the Oak index definition where that particular node type is supported.

So what it’s gonna do is, since analyzed is true, the first thing is to go and create the tokens.

And then it’s gonna normalize those tokens. You see here, Digital with a capital D becomes digital with a small d.

All right, so when I try to do a full-text search, the first one, say a node restriction full-text search, it would be able to find the 'experience' token. Not just because the token is there, but because you have nodeScopeIndex also set to true.

Right, the second full-text search, where I’m doing a property restriction full-text search, anyway provides you the result, because we are looking against jcr:title and we know we have this token available. Okay, now let’s make a little change in the configuration. Say we want analyzed to be true, which means I want the tokenization and normalization of the tokens to be done, but I don’t want nodeScopeIndex to be set to true. If that is the case, then the node restriction full-text search is not gonna work.

This is one of the use cases I’ve seen working with many customers, where such a broad full-text search does not happen, without them realizing that nodeScopeIndex is not set to true for the given property, essentially.

Okay, now let’s look at the third use case as well, which is the settings absolutely the other way around: analyzed is false here, but I don’t mind keeping nodeScopeIndex set to true. What is the outcome? First, there will be no tokenization, because analyzed is set to false, so it’s not gonna tokenize the value of that given property, which is jcr:title here, right? And if I try the node restriction full-text search, it will try to search for 'experience' and it does not find the token. You might think the text is stored as is in the index and should match, but that’s not the case: it’s one single token, 'Digital Experience Platform', while we are searching for 'experience', hence they do not match.

And the same is the case with the property restriction full-text search against jcr:title, because it does not find any tokens.

In the last one, you see that both of these are set to false, and it’s absolutely expected that both types of full-text search may not return any result at all. Depending on whether you have some additional configuration for the given type or property on a given Oak index, they may still return results, but not within this particular scenario, where we are talking about just one property and an index responsible only for that one property. The idea was just to showcase the core crux of these two very, very important flags, right, responsible for having absolutely great full-text search in any application powered by Lucene indexes.

The next one says, okay, here the title is just 'Experience', and you may expect the property restriction search to work, right? Because the title is holding 'Experience' and I’m also searching for 'experience'. But it will fail, essentially because the cases do not match: here Experience is with a capital E, there experience is with a small e.

And why did it not normalize? Because it was never analyzed, hence it was never normalized, hence 'Experience' with a capital E is what got indexed.

If you try to search with the same settings with a capital E, yes, it would be able to find it, right? But that is not a token search of the kind created as part of the tokenization process; rather, it’s an exact term search. These terms are difficult to distinguish in the beginning, but the idea was to cover these two flags in a little detail, because it was kind of necessary, considering we have had many success accelerators coming up where this was one of the topics discussed and advised upon.

Okay, looking at the next section of the Oak index definition configuration: the aggregation.

If I go back here, all right. I was talking about this section of the Oak index definition where we have a lot of includes, right? What essentially are these responsible for? It’s really important. Many times I’ve seen support cases raised saying, hey, we have all the index rules in place, and for some reason the search result is not returning those pages or dam:Assets, right? So, see, for example, these properties: this one is sitting in the metadata of the asset, but on the other hand, this one is not in metadata; it’s right on the jcr:content.

The point here is, how is the Oak index gonna know how deep it should go to index content for a given term? In very simple language, that is controlled through the aggregates.

So for that, we have this slide, which is kind of a visual representation of the same, where we have, for example, this image, which has a metadata node with some properties, and that has one child node, xmpMM:History, which has one further child, named 1, and that one has properties called softwareAgent and author.

Okay, now, if I, as a user, wanna search for a term called David.

So where exactly is this 'David' term sitting? It’s sitting inside jcr:content, then metadata, then xmpMM:History, then node 1, which means I need to have a supporting aggregate expression as well.

And then you get to see this 'David' in the result. If you don’t have these aggregates in place, you may have all the required index rules and everything, but you will never get to this level to find your result.
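A hedged sketch of aggregate rules that would make that nested softwareAgent and author content reachable; the structure is modeled on the stock damAssetLucene definition, and the include paths are illustrative:

```xml
<aggregates jcr:primaryType="nt:unstructured">
    <dam:Asset jcr:primaryType="nt:unstructured">
        <!-- text on the metadata node itself -->
        <include0 jcr:primaryType="nt:unstructured" path="jcr:content/metadata"/>
        <!-- direct children such as xmpMM:History -->
        <include1 jcr:primaryType="nt:unstructured" path="jcr:content/metadata/*"/>
        <!-- grandchildren such as xmpMM:History/1, where softwareAgent and author live -->
        <include2 jcr:primaryType="nt:unstructured" path="jcr:content/metadata/*/*"/>
    </dam:Asset>
</aggregates>
```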

Okay, moving on to the next section, which is pretty important: evaluatePathRestriction.

I’ll not go much into the detail, but all I can say is this is one of the most underrated, if that’s the right term, settings to make. It is really helpful for improving the overall performance of your Oak index. And where is it used? Consider it as and when you have a path restriction search happening.

One example: if you have multiple micro websites sitting on the same AEM setup, say /content/site-a, /content/site-b, /content/site-c, you may not want one Oak index for all the microsites. You may want separate, microsite-specific Oak indexes. In that case, you can create multiple cq:Page node type Oak indexes, each for a given microsite. This is just an example, and it is really helpful.
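A sketch of the path-related settings for one such microsite-specific index; the site path is hypothetical:

```xml
<jcr:root xmlns:jcr="http://www.jcp.org/jcr/1.0"
          xmlns:oak="http://jackrabbit.apache.org/oak/ns/1.0"
    jcr:primaryType="oak:QueryIndexDefinition"
    compatVersion="{Long}2"
    type="lucene"
    async="async"
    includedPaths="[/content/site-a]"
    queryPaths="[/content/site-a]"
    evaluatePathRestriction="{Boolean}true"/>
```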

Moving on to another type of search. So far we have touched upon the full-text search, for the record. This one is the Lucene property restriction. It is the most performant way of searching the content.

Well, the most performant way, because it is not searching against a number of different properties or different nodes. It is just searching for a specific property, and that too an exact match.

Hence, it is very, very performant. So it’s always advisable to use this as much as possible, wherever it really makes sense.
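For example, a property definition like the sketch below (template path hypothetical) lets an exact-match query avoid the full-text machinery entirely:

```xml
<!-- Serves: SELECT * FROM [cq:Page] AS p
             WHERE p.[jcr:content/cq:template] = '/conf/my-site/settings/wcm/templates/page' -->
<template jcr:primaryType="nt:unstructured"
    name="jcr:content/cq:template"
    propertyIndex="{Boolean}true"/>
```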

Moving on to sorting. This is again important. You can have it turned on, and to turn it on, there are a couple of things you have to have. The first thing is that ordered has to be true.

And one of the common mistakes people make is not making such properties property indexed, which is required: if you want to do the sorting, that property has to have propertyIndex set to true. ordered, of course, has to be true, but propertyIndex as well.
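A sketch of a sortable property definition (the property choice is illustrative); propertyIndex makes it queryable and ordered lets the index serve the ORDER BY:

```xml
<!-- Serves: ... ORDER BY [jcr:content/cq:lastModified] DESC -->
<lastModified jcr:primaryType="nt:unstructured"
    name="jcr:content/cq:lastModified"
    propertyIndex="{Boolean}true"
    ordered="{Boolean}true"/>
```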

Okay, so considering all that, I put together this big slide, which covers all the best practices I’ve covered in the last 10 or 15 slides, along with the common mistakes one can make and how to avoid them. The next one is about performance. If you have to choose between the full-text search and the property search, and within full-text search the two different types we already know, then in most cases the property index is gonna excel; it is gonna produce the best performance, followed by the others. And the best practice is to use both flags together, unless there is a real reason not to.

Okay, I’ll pause for a moment and just do one quick recap. We covered the basics of the Lucene search.

We covered what the search flow is, right? Then we covered the Oak index definition building blocks. Then we covered some of the query terminologies and how those terminologies are used.

And then we looked into both types of full-text search, meaning the node restriction full-text search followed by the property restriction full-text search. And then we saw the property search itself. Those were the core features, right? Now we are gonna look into some of the other features as well, and most of them are pretty much available out of the box, but this is the full list. The core features we have covered are the full-text search, property search, and path restriction search. Beyond these, there could be a range query, right? Or binary extraction, if you wanna do some sort of search against a PDF, or the text in a PDF, via the Tika configuration; that’s again one way.

That’s the way to do the search against documents.

Other than that, there are other features as well that come pretty much out of the box, which are very, very useful for any customer. The first ones being the synonyms and the stop words, followed by stemming, normalization, and custom tokenization; then some performance areas where we can make optimizations if required, like boosting to improve the overall relevance of your results. And if you wanna give more of a UX experience to the user, in terms of providing auto-suggest or auto-complete, spell correction sort of things, or even the excerpt, that’s one of my favorites.

So you can do those as well. The idea is not to cover everything as part of this session; rather, we have picked a certain number of such common search capabilities to talk about, because they are used the most, and that’s where most of the mistakes are made in terms of configuring these capabilities.

Okay, I’ll start with the first one, that’s pagination, right? You see that pagination is of course a great user experience, where you don’t have to show the entire search result in one go; rather, you can have it in a specific number of pages. But the key takeaway here is that pagination does not actually require any kind of Oak index definition change. This is one of the features which really does not care how your Oak index definition is defined or optimized. But yes, it matters if, as part of the pagination, you have the sorting feature as well; then you have to make sure that the field you wanna sort on is property indexed.

That’s the first best practice. The second best practice concerns some very common flags: limit, guessTotal, and the offset. Try to keep them as minimal as possible so that you get a smaller result set in one go, right? Otherwise, from the Oak index definition standpoint, pagination needs no change.
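In AEM QueryBuilder terms, those knobs are the p.* parameters. A small sketch with illustrative predicate values follows; the # annotations are explanatory only, not QueryBuilder syntax:

```text
type=dam:Asset
path=/content/dam/my-site
fulltext=mountain
p.offset=0         # start of the current page
p.limit=20         # page size; keep it small
p.guessTotal=true  # estimate the total instead of counting every result
```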

Moving on to the next feature: filters and facets.

Okay, these two terms I have seen many times people using interchangeably, saying they are the same thing, but actually, at their core, they are different. For example, with filters you essentially do a search for a specific value, which means you are searching for whether the status is published or not published, right? But you do not care about how many of them are published versus how many are not. If you really care about that level of information, for example, you see here there are 370 images, 354 are web-enabled, and 16 of them are vector images, so you see these counts against each value, right? We call those facets. But if it’s a simple case where we say, filter based upon the activation status, or whether those are archived or not, yes or no, and you don’t want the counts against them, a filter is the way to do it. Otherwise, facets are the way to get that greater level of information. For that to happen, for a given property you have to provide the facet configuration, and at the same time, with the facet, you have to have the property index. This is a must to know.

Okay.

Like you see, I’ve written one point here: if propertyIndex is set to true, it makes the property searchable and filterable, like here. If facets is set to true, like here, it tells the Oak, I mean the Lucene, engine to compute the counts as well against the given values of that property.
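As a sketch, a facet-enabled property definition and, in the comment, the rep:facet query shape it serves (names illustrative):

```xml
<!-- Serves: SELECT [rep:facet(jcr:content/metadata/dc:format)] FROM [dam:Asset]
             WHERE ISDESCENDANTNODE('/content/dam')
     The result carries value-to-count pairs, e.g. image/jpeg -> 370. -->
<format jcr:primaryType="nt:unstructured"
    name="jcr:content/metadata/dc:format"
    propertyIndex="{Boolean}true"
    facets="{Boolean}true"/>
```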

Okay.

Moving on to our next feature, which is very useful for the end user because it really enhances the overall user experience: the auto-suggest, which we many times see in Google search as well.

So the Lucene Oak index, yes, supports that, right? Many times it may be a little problematic, in that performance may take a hit.

That happens if it is enabled for more properties than required, right? Which means you have to be very precise in deciding which properties to enable for the auto-suggest feature. And not just for auto-suggest; one common tip that just crossed my mind: if you have 20 properties or metadata properties for a given type, for example for cq:Page, you don’t really need an index rule for all those 20 properties.

This is one of the common mistakes we as developers make. It’s always wise to scan through all the properties and see: maybe I don’t need all 20 properties to be part of my Oak index definition, maybe three or four are good enough. If you do that, it’s good, because that’s how you don’t allow your Oak index to get bloated, right? If you wanna make it lean and concise, keep the number of properties within an Oak index as limited as possible.

That’s the first thing I wanna say in general. The second is about which properties are really candidates for the full-text search, because you may not wanna open up some property, let’s say the usage, right? Or a property with the name copyright. You never wanna open such properties for the full-text search, right? So after this session, maybe you just wanna go back and see how your Oak index definitions are looking, and have them checked.

Okay, coming back to this topic, which is auto-suggest. For that to happen, all you need to do is enable this property on a given property; when I say the property on a given property, I should rather say you should enable this flag on the property, so that makes a little more sense. Once it is enabled, you can have a JSON list created for you, which you as a developer can use on the presentation layer.
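The flag in question is useInSuggest on the property definition. A sketch, with the suggestion query shape in the comment (property name illustrative):

```xml
<!-- Suggestion query shape:
     SELECT [rep:suggest()] FROM [dam:Asset] WHERE SUGGEST('mou')
     returns completions such as 'mountain' for the UI to render. -->
<title jcr:primaryType="nt:unstructured"
    name="jcr:content/metadata/dc:title"
    analyzed="{Boolean}true"
    useInSuggest="{Boolean}true"/>
```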

Okay, that’s good to have, but there’s one more feature available.

It’s available out of the box, but not enabled out of the box. You have to enable it; that’s NGram, right? It’s another filter feature.

If time permits, I’ll try to cover that as well.

I still have some five minutes. If you have NGram enabled as well, I think that works very, very well with the auto-suggestion. It pretty much opens up the search spectrum for the end user.

It becomes very, very powerful for the end user to find the required assets.

Moving on, the next commonly used feature is the spell check.

It’s about when the user is typing in your website or portal, and you want to help them by suggesting, hey, did you mean this, not that. That you can also do, and again, this is purely for the user experience. Make sure you don’t enable it for any random or unwanted property; keep it limited to aspects like maybe the description or the title, not much beyond that.
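A sketch of a spellcheck-enabled property, with the did-you-mean query shape in the comment (names illustrative):

```xml
<!-- Spellcheck query shape:
     SELECT [rep:spellcheck()] FROM [cq:Page] WHERE SPELLCHECK('montain')
     returns candidates such as 'mountain'. -->
<title jcr:primaryType="nt:unstructured"
    name="jcr:content/jcr:title"
    analyzed="{Boolean}true"
    useInSpellcheck="{Boolean}true"/>
```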

Okay, moving on to the next feature: boosting. I think this is one of the most commonly used features out of the others I talked about in the previous couple of slides. If, as a business owner or as a developer, you really want certain results to appear at the top of the search, this is the way to have it. You can do boosting, which means you can add a boost to specific properties, because when you do a full-text node restriction search, you may be searching against, let’s say, 10 properties, right? Maybe jcr:title, maybe navTitle, maybe the description. But you want to make sure that when it is doing that search, matches on jcr:title rank first and then the other properties, right? If you want to maintain that level of priority, you can do it by setting this boost value. And do it on relevant properties only; don’t go applying it to too many properties.
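A sketch of boosting a title property so its matches rank higher in node-scope full-text searches; the 2.0 weight is illustrative:

```xml
<!-- jcr:title matches rank above other analyzed properties
     in node restriction full-text searches -->
<title jcr:primaryType="nt:unstructured"
    name="jcr:content/jcr:title"
    analyzed="{Boolean}true"
    nodeScopeIndex="{Boolean}true"
    boost="{Double}2.0"/>
```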

Synonyms is another part, where you want a list of words where, sorry, I’m really sorry, that’s the next one. Synonyms are where you want a different name for a given keyword to be matched as well. This is something you can do too.

Stop words are another feature, where you can decide which words or tokens you don’t want the search to consider, right? Like a, the, in, at, by, this and that. And many times there may be some business words you never want to be searchable; you can put those into the stop words as well.

Stemming is one big feature, the one I like most. If you want to broaden your full-text search, use stemming, because it brings the keyword down to its root form, and that enhances the search a lot. For example, if 'running' is tokenized and then indexed, it may become 'run', and hence it becomes searchable; it doesn’t matter if it’s running, run, runner, or ran, which means with this 'running' content you can even search with runner or ran. So this is pretty good.

It’s really good if you use it even with the NGram.
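Stop words, synonyms, and stemming are all configured on the index’s analyzer. Here is a hedged sketch of such an analyzer chain, modeled on Oak’s composition-based analyzer format; the word-list file names are illustrative, and the nt:file nodes (whose jcr:content is omitted here) would carry the actual lists:

```xml
<analyzers jcr:primaryType="nt:unstructured">
    <default jcr:primaryType="nt:unstructured">
        <tokenizer jcr:primaryType="nt:unstructured" name="Standard"/>
        <filters jcr:primaryType="nt:unstructured">
            <LowerCase jcr:primaryType="nt:unstructured"/>
            <!-- drop noise words listed one per line in stop.txt -->
            <Stop jcr:primaryType="nt:unstructured" words="stop.txt">
                <stop.txt jcr:primaryType="nt:file"/>
            </Stop>
            <!-- treat the terms listed in synonym.txt as equivalent -->
            <Synonym jcr:primaryType="nt:unstructured" synonyms="synonym.txt">
                <synonym.txt jcr:primaryType="nt:file"/>
            </Synonym>
            <!-- reduce tokens to their root form: running -> run -->
            <PorterStem jcr:primaryType="nt:unstructured"/>
        </filters>
    </default>
</analyzers>
```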

Okay, I’ll be stopping in the next minute or so; I know we have five minutes to go. NGram, this is really a cool feature. I would say it’s a wild beast sitting in the Lucene Oak index definition.

It is really powerful, and it may be problematic if it is not configured carefully. Okay, before that, what does it do, essentially? Say you have some token called 'polymer', right, and that is saved somewhere in the jcr:title, and somebody wants to search for it. Normally they would have to type the full word, right? So if you want a partial search to happen, we can create multiple tokens, the multiple ngrams, out of one keyword.

Before, what we have seen is that if there are multiple words, each becomes a token. But if you want to create multiple tokens out of one token, that’s where the NGram shines, right? Let’s say there’s 'polymer' and I have a configuration for sizes four to 10, which means, at bare minimum, create tokens with a length of four characters, up to a max of 10. You see that it ends up creating these many tokens, which means if the user does not type 'polymer' and all they type is 'poly', they should still be able to find it. Synonyms, of course, can be one good option as well, but then it’s about your imagination, how strong your synonym list can be. NGram is a really cool thing in that way. This is a topic in itself, but yes, just a cool feature; try to use it, but very wisely, because it does not behave well when you use it with a wildcard, which I have seen.
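Extending the analyzer sketch above, an NGram filter with the four-to-ten sizes from the polymer example might be added like this; the parameter names follow Lucene’s NGramFilterFactory and should be treated as an assumption to verify:

```xml
<filters jcr:primaryType="nt:unstructured">
    <LowerCase jcr:primaryType="nt:unstructured"/>
    <!-- 'polymer' is additionally indexed as poly, polym, polyme, ...,
         so typing just 'poly' matches without a wildcard -->
    <NGram jcr:primaryType="nt:unstructured"
        minGramSize="4"
        maxGramSize="10"/>
</filters>
```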

All right, I’ll finish very quickly. You see all these features I just talked about, and their respective performance ratings; most of them are either high or moderate, but you see the NGram goes down to low.

It can be problematic if it is not handled well.

Okay, the demo section I think I won’t be able to cover, since we are pretty much at the top of the hour. The JCR query cheat sheet I have in place is a pretty old one, but still, old is gold.

And then this is the workshop I was talking about. Feel free to browse through this GitHub repo and try your hands at it, followed by some resources and further readings for you to go through.

Yeah, I think I’m pretty much at the top of the hour.

Cool, thank you so much Vineet. That was really comprehensive and incredibly valuable content.

I know we did not have a lot of time, but we have a quick request. We have a quick three-question poll right now to get your feedback and help us shape future sessions. The poll should only take 30 seconds to complete. I’m launching the poll now; you should be able to see it on your screen. We would really appreciate it if you could fill it in.

We’ll wait for 30 seconds to do that.

All right, so I see there have not been, I think, any questions in the Q&A pod.

I hope things are really clear as it was a comprehensive presentation. Before we close, I’d like to extend a huge thank you to our presenter and thank you to all of you for joining us today. Have a wonderful rest of your day and stay ahead in the digital landscape. Goodbye everyone. Thank you. Thank you, bye.

Unlocking Powerful Search in Adobe Experience Manager

Adobe Experience Manager (AEM) leverages Lucene search to deliver fast, relevant results across content, assets, and metadata. This session explores how Lucene indexes work, how to configure them, and the best practices for maximizing search performance.

  • Lucene Search is Everywhere: Powers search in AEM author, publisher, and portals, handling auto-suggestions, filters, facets, and pagination.
  • Index Definitions Drive Performance: Customizing Oak index definitions is crucial for efficient, targeted search.
  • Best Practices Matter: Copy existing index definitions, limit indexed properties, and use the right flags for full-text and property searches.
  • Advanced Features Enhance UX: Facets, auto-suggest, spellcheck, boosting, and stemming can be enabled for richer search experiences.

Understanding these principles helps ensure stable, high-value search capabilities in AEM, supporting both technical and business goals.

Lucene Index Building Blocks

AEM Lucene index definitions are the foundation of search performance and accuracy. Key components include:

  • Type: Specifies the index kind (Lucene, property, etc.).
  • Node Type Restriction: Targets specific content types (e.g., dam:Asset, cq:Page).
  • Path Restriction: Limits indexing to defined repository paths for efficiency.
  • Aggregate Rules: Control the depth and scope of indexed content, ensuring relevant properties are searchable.
  • Index Rules: The core configuration; sets flags like nodeScopeIndex (broad full-text search) and analyzed (tokenization/normalization).

Careful configuration of these elements ensures that search queries are fast, relevant, and resource-efficient.

Optimizing Search Performance

Effective search optimization in AEM Lucene involves strategic configuration and adherence to best practices:

  • Start with Existing Indexes: Always copy and modify out-of-the-box definitions; never build from scratch.

  • Limit Indexed Properties: Only include necessary properties to keep indexes lean and performant.

  • Use Flags Wisely:

    • nodeScopeIndex true for broad full-text search
    • analyzed true for property-level tokenization
    • evaluatePathRestriction true for path-based queries
  • Property Indexing: Prefer property restriction searches for best performance; use full-text only when needed.

  • Sorting & Facets: Enable propertyIndex and ordered for sorting; set facets true for count-based filtering.

Applying these strategies leads to faster queries, reduced resource usage, and more relevant results.
