Tales from the front lines - how to avoid common pitfalls in AEM as a Cloud Service
Learn from the most common mistakes that others have made so that you can avoid them yourself! In this session, we will look at some of the common issues that our on-call and support engineers have seen and discuss best practices for ensuring that you can avoid these in your own Adobe Experience Manager as a Cloud Service implementation.
Continue the conversation in Experience League Communities.
Transcript
Hello everyone, good afternoon, and welcome to this session of the Developers Live conference: Tales from the Front Lines, How to Avoid Common Pitfalls in Adobe Experience Manager as a Cloud Service. My name is Ian Reeser, I’m a Senior Computer Scientist on the AEM Assets team, and with me today we have Jörg, who is a Senior SRE on our Foundations team. Today we’re going to be talking about some of the hard-earned lessons that we, or our customers, have learned in the migration to AEM as a Cloud Service. Jörg and I have been on the front lines as on-call engineers helping to keep the service going, and in preparing this session we’ve also talked to some of our consultants and support engineers about the things they’ve learned as well. The goal today is really for you to learn from mistakes that other customers have made, so that you can avoid those mistakes yourself and not have to go through the same painful learning process. With that, I’ll pass it over to Jörg to get us started. Thank you Ian, and welcome everyone. So let’s jump directly into the topic and deal with our first item: our good friends, the index definitions. Index definitions have been around for quite some time. You probably know them from AEM 6, where you basically never had any restrictions: you could drop your index definitions wherever you liked, define them however you saw fit, and even modify product indexes. The only concern was that on your next upgrade you had to take care of them and make them compatible. There was even some tooling around, like ensure-oak-index, which allowed you to deploy indexes during runtime and helped you overcome some inconveniences when adding or updating indexes.
This is a little bit different in Cloud Service, because there are now many more restrictions you need to follow, namely the naming of your indexes, the location where you put them, and how you customize them. It has also changed in that index definitions are no longer applied during runtime, so ensure-oak-index is no longer useful and can no longer be used for this at all. And as a goodie on top, which was also a hard-earned lesson, validation of these definitions has been added. In previous versions it was easy to deploy an index definition and only then discover it wasn’t working well, or was even misbehaving; now Cloud Manager validates that you adhere to at least some minimal, basic standards, so you don’t break your AEM instance. And to actually help you migrate to AEM as a Cloud Service, the Index Converter tool is available: it checks your existing indexes from AEM 6.x, makes some recommendations, does some rewriting, and gives you an index definition that is compatible with AEM as a Cloud Service. So this is a highly recommended tool when you’re migrating to AEM as a Cloud Service from an on-premise or Adobe Managed Services AEM instance. The next thing is the session affinity cookie. I’m not sure if you’ve ever heard of this, but maybe you’re familiar with the term session stickiness, which is pretty much the same thing. Basically, when we connect to our AEM instance, mostly the author instance, we go through a load balancer, the load balancer sends us to a more or less random AEM instance, and then we do our operation on the repository.
The problem now is that the whole thing is eventually consistent. That means it makes a difference whether I do a write on AEM instance 2 and then the subsequent read on AEM instance 2 or on AEM instance 3, because there is a sync delay of up to two seconds: when I write on AEM instance 2 and try to read the thing I just wrote on AEM instance 3, it might not be there yet. To overcome this problem, a session affinity cookie has been added, which just tells the load balancer, “Jörg always goes to AEM instance 2,” for as long as this cookie is valid. So why is it a problem, and is it a problem at all? Yes, it can be, for exactly the use case where you write something, expect to read it back, and it’s not there. Therefore you have to handle it whenever you do multiple requests to your AEM instance. Our AEM as a Cloud Service infrastructure sets this cookie by default on every request that does not already have one. But of course, if you’re firing random curl requests at your AEM as a Cloud Service instance, it’s probably not going to work consistently: every request just gets a new session affinity cookie. The guidance is: when you get such an affinity cookie, reuse it. That means if your first curl request gets such a cookie, then the subsequent curl requests should reuse that cookie, so your requests always end up on the same instance. The good thing is that the Adobe tooling already respects this: if you use AEM Upload or other tools and programs Adobe provides to access AEM as a Cloud Service, they all follow this and are not affected. But it is something you need to be aware of when you’re building any kind of automation or external connection to your AEM as a Cloud Service instance.
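The cookie-reuse guidance above can be sketched in a few lines of Node.js. This is a minimal illustration, not Adobe tooling: the cookie name `affinity` and the header shapes are assumptions here — check the actual `Set-Cookie` header your instance returns.

```javascript
// Sketch: reuse the session affinity cookie across requests to AEM as a
// Cloud Service, so the load balancer keeps routing us to the same
// instance. Cookie name ("affinity") is an assumption for illustration.

// Extract a named cookie ("name=value") from a Set-Cookie header value.
function extractCookie(setCookieHeader, name) {
  const match = setCookieHeader.match(new RegExp(`(?:^|, ?)${name}=([^;]+)`));
  return match ? `${name}=${match[1]}` : null;
}

// Build headers for a follow-up request: keep sending the affinity cookie
// captured from the first response, so reads see our earlier writes.
function withAffinity(headers, setCookieHeader) {
  const cookie = extractCookie(setCookieHeader, 'affinity');
  return cookie ? { ...headers, Cookie: cookie } : headers;
}
```

A first request captures the `Set-Cookie` response header; every subsequent request then passes its headers through `withAffinity` before sending.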
Another thing we learned is very important: functional integration and UI tests. Why are they important? First of all, it’s industry standard to write these types of tests. Unit tests have been around for quite some time in the AEM world, and we have some good frameworks there, but now integration tests are also considered first-class citizens in the world of AEM as a Cloud Service, because these tests are fully integrated into the deployment pipeline and are basically easy to create. There are some resources available, which we’ll share at the end of the session and which we’ve posted in the Experience League forum post, where you can get the links. These integration tests are fully integrated and always executed on the stage deployments, they allow you to test end-to-end functionality, and you have the full AEM as a Cloud Service environment at hand. That means you have an authoring cluster with multiple nodes, you have publish instances, you have the Asset Compute service there, so basically you can test everything you will have in production. Right now we support two test frameworks: first, the AEM integration test framework for functional end-to-end tests, and second, Selenium for UI tests. That means we provide the test drivers, the Docker images, whatever is necessary for you to run Selenium tests. It’s all there, all documented, and all integrated into the AEM as a Cloud Service pipeline and environment. The final thing from my side is the Dispatcher configuration. Dispatcher configuration is a little bit like index definitions: in the AEM 6.x world, you could do whatever you wanted.
You had an Apache HTTP server, you could use all the modules, operate it as you saw fit, create your configuration and deploy it however you liked, automate it in the way you were familiar with and which fit your operational processes, and use whatever configuration structure you were comfortable with. As with the index definitions, it’s the same story with the Dispatcher configuration: now you need to adhere to standards. The standard says you have to deploy your scripts into this structure, you have to name files like this, and the structure must be deployed in this way. An interesting thing is that we have sometimes seen cases where an incorrect Dispatcher configuration actually broke integration tests, and that’s the reason we ask you to pay special attention to the Dispatcher and its configuration. To make these things a little easier for you, we have developed the Dispatcher validation tool. Combined with the Dispatcher Docker image, it gives you the chance to validate your whole Dispatcher configuration locally: you can deploy your configuration to your local Dispatcher Docker image and test it with whatever tool you like, and you can even run it locally in front of your own locally deployed AEM instance. So my recommendation is really to start adopting and using the Dispatcher validation tools, because that’s a very easy one, and we use the same validation as part of the Cloud Service deployment pipeline. That means if the configuration passes your local tests, it will also pass the Cloud Manager pipelines. Okay. Ian, do you want to take over? Yeah. All right.
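The fixed structure mentioned above looks roughly like the layout below. This is a sketch based on the dispatcher module generated by the AEM Project Archetype; treat the exact folder names as illustrative and consult the dispatcher SDK documentation for the authoritative layout.

```
dispatcher/src/
├── conf.d/
│   ├── available_vhosts/
│   ├── enabled_vhosts/
│   ├── rewrites/
│   └── variables/
└── conf.dispatcher.d/
    ├── available_farms/
    ├── enabled_farms/
    ├── cache/
    ├── clientheaders/
    ├── filters/
    ├── renders/
    └── virtualhosts/
```

The `available_*`/`enabled_*` split is the convention the validator expects: configurations live in `available_*` and are activated via entries in `enabled_*`.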
So to talk about some of the lessons we’ve learned in terms of AEM Assets, it helps to start with some of the changes we’ve made to asset uploading and processing in AEM as a Cloud Service. If we go back to AEM 6.5 and older versions, we had a model of asset upload where a client, be that the AEM UI or maybe the desktop app, would upload the asset into AEM, which would sync it with the back-end cloud storage provider and then do any processing inside the JVM, usually inside a workflow. This created some scalability bottlenecks for us. The first is bandwidth limited by the size of the virtual machine. Say you’re running in AWS, for example: the size of the VM that AEM is running on constrains how much bandwidth you have to upload an asset into that virtual machine, and how quickly the virtual machine can then put that asset into back-end storage. The other piece, also limited by the VM size, is the amount of IO and CPU resources available for asset processing. If you upload a single asset, maybe you’ve got plenty of resources to process it efficiently. If, however, you’re doing something like an asset migration and you’re uploading a million assets, that can often overrun the resources available to that environment. We saw that some customers took quite a long time to onboard to AEM, or that we had to upsize their environments to the biggest machines we could possibly get just while we brought them on board. When we moved into the cloud, we said we really want to address these bottlenecks and create something more scalable, both for onboarding and for use cases throughout the lifecycle of an AEM customer, in part because in the cloud these AEM environments we run are very small, and we wanted to make sure we didn’t have to size up to some enormous AEM environment just for the purpose of asset processing.
And so the way we do that is by fundamentally changing how we do asset upload and processing in AEM as a Cloud Service. In the cloud, we start by requesting an upload: the client requests an upload from AEM just as it would otherwise, except it’s not actually sending the asset here, it’s just requesting an upload. AEM then reaches out to the binary cloud storage provider and generates a pre-signed URL that it returns to the client. The client then uploads the binary directly to that URL, so the binary never actually goes through AEM. You may remember that one of our first bottlenecks was the bandwidth between the client and AEM; since the asset is now no longer going through AEM, that bottleneck is no longer a problem. Once the asset has been fully uploaded, we call the complete-upload servlet in AEM to let AEM know that we’ve finished uploading. At this point, AEM evaluates processing profiles for the asset, which we’ll get to a little bit later, and then it requests processing from the Asset Compute service. You may remember that one of our other bottlenecks was processing power inside a single JVM. Well, in this case, we’ve implemented a microservice to do all of that asset processing for us: things like generating thumbnails, extracting metadata, and so on. Now, when the Asset Compute service gets this request, it reaches out again directly to the binary cloud storage provider to get the asset, generates the renditions, and puts them back into binary cloud storage. Again, no binaries pass through the AEM Java virtual machine. Once it has completed all of its processing, it notifies AEM that it’s done. And once we’ve gotten back all of the renditions, extracted metadata, and so on that needed to be processed in Asset Compute, AEM looks to see whether any post-processing workflows have been configured, and if they have been, it runs them at that time.
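The handshake described above can be sketched as the three-step sequence below. The servlet selector names (`initiateUpload`/`completeUpload`) come from AEM’s direct-binary upload protocol, but the request and response fields here are simplified assumptions — in practice you would let the `@adobe/aem-upload` library drive this rather than hand-rolling it.

```javascript
// Sketch of the direct-binary upload handshake, with the HTTP transport
// injected so the flow itself stays visible and testable.
async function directUpload(http, aemUrl, folderPath, file) {
  // Step 1: ask AEM for a pre-signed upload URL; no binary is sent here.
  const init = await http.post(`${aemUrl}${folderPath}.initiateUpload.json`, {
    fileName: file.name,
    fileSize: file.size,
  });

  // Step 2: PUT the binary straight to cloud storage -- it never passes
  // through the AEM JVM, which removes the bandwidth bottleneck.
  await http.put(init.uploadURI, file.data);

  // Step 3: tell AEM the binary is in place, so it can create the asset
  // and hand processing off to the Asset Compute service.
  return http.post(`${aemUrl}${folderPath}.completeUpload.json`, {
    fileName: file.name,
    uploadToken: init.uploadToken,
  });
}
```

A real client additionally handles multi-part uploads and retries, which is exactly the bookkeeping the library takes off your hands.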
So with that established, let’s take a look at some of the ways we’ve seen customers run into trouble under this new paradigm. The first is when they have failed to switch from the old create-asset servlet, or the old APIs that ran within the JVM, to the new AEM Upload tool that we provide. Now, you remember AEM 6.5: it’s quick and easy, the client uploads the asset into AEM, one step, and you’re good to go. In our new approach, however, you need to request your upload, wait for the signed URL to be returned, upload to that URL, and then complete the upload with AEM. Easy, right? You may be saying: do what now? This does not seem all that simple, there are a lot of steps, and I’m going to have to write a lot of code to manage all of this. And we heard you, so we built a tool for you. It’s called AEM Upload, an open-source Node.js utility that’s available to you via NPM. You can see some sample code over on the right here. It’s pretty straightforward: you’ll notice that this code doesn’t have you go through multiple steps, it just says, here’s my file, I want to upload it, and the library manages all of that for you: requesting an upload, uploading the file to the data store, and posting the complete message back to AEM. In case you’re not using Node and want to use this at the command line, maybe from a shell script, we’ve also integrated it into the AIO AEM command-line utility, so you can run it as a command-line tool as well. And if you use this, you can rest assured that you’re using the same code we use in the product: all of the Adobe clients, be that the AEM UI, be that Asset Link, et cetera, use this same Node library, and we actually have it embedded into those codebases as well. Our next point is that you should make sure to use post-processing workflows instead of workflow launchers.
Now, if we think about asset processing in older versions of AEM: your client uploads the asset to AEM, at which point a file gets created in AEM, and that triggers a workflow launcher listening for those file-creation events. That workflow launcher runs the DAM Update Asset workflow. Many customers have gone in and customized out-of-the-box steps in this workflow, or added their own steps to it. Then, when they move to AEM as a Cloud Service, they say: I’ll just leave everything as it is, I’ll migrate my content, and I’ll be good to go. Right? Well, the issue is that in AEM as a Cloud Service, our asset processing is not done in the DAM Update Asset workflow; it’s done in the Asset Compute service. So the problem we run into is that your file gets uploaded and your workflow gets kicked off while, at the same time, Asset Compute is out doing its thing. Perhaps you’re trying to modify a file at the same time that a rendition is coming back from Asset Compute and you run into conflicts, or perhaps your workflow step expects that all of your renditions have already been created or your metadata has been extracted, and that hasn’t happened yet. What we really want to be able to say is: I’m going to upload the asset into AEM, I’m going to go do my asset processing, and once that’s done, I want to run my custom workflow. And so what we’ve done is build out a different type of process: after Asset Compute comes back and says it’s done processing, we then run your custom code in a post-processing workflow. Now, a note for you: in your custom post-processing workflows, don’t forget to include the DAM Update Asset Workflow Completed Process step. The reason is that in the old model, we knew that when the DAM Update Asset workflow completed, we were done processing.
There was one point in time at which we would mark the asset as done, as completed, and in the AEM UI you’d see that little processing label disappear at that point. However, in AEM as a Cloud Service, since we’ve split the out-of-the-box processing that happens inside Asset Compute from your custom processing that happens in the post-processing workflow, we don’t want to mark the asset as completed until it’s done with all processing, including customer processing. The way we do that is by not marking it as processed in Asset Compute if a post-processing workflow has been configured; in those cases, we rely on you to include this step in your workflow to mark the asset as completed when you’re done. Now, if you do need a post-processing workflow, there are two ways you can configure it. The first and simplest way is through the folder properties: you can go into any folder’s properties inside the DAM, take a look at the Auto Start Workflow dropdown you see over on the right here, and set your workflow there. It will run on all assets in that folder and its subfolders. Again, we do require that you use the workflow step I previously mentioned, so if you’re not seeing your workflow here, it’s possible that it’s missing that step. Now, we do recognize that some customers have a more complex need for how they configure their workflows to run: maybe you need to use something like a regular expression to say you only want to include certain subfolders, or maybe you’re doing an automated migration, which we’ll get to in just a moment. In those cases, you may want to use the OSGi configuration for the Adobe CQ DAM Custom Workflow Runner. However, note that since this is an OSGi configuration, any time you want to modify it, you’ll have to do a code deployment, which is a little more complicated and not quite as responsive.
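As a sketch of what that OSGi configuration might look like, assuming the PID `com.adobe.cq.dam.processing.auto.impl.CustomDamWorkflowRunnerImpl` and its `postProcWorkflows` / `postProcWorkflowsByExpression` properties (verify both names against the current Adobe documentation; the workflow and folder paths below are placeholders):

```json
{
  "postProcWorkflows": [
    "/var/workflow/models/custom-asset-workflow"
  ],
  "postProcWorkflowsByExpression": [
    "/content/dam/special(/.*)?:/var/workflow/models/custom-asset-workflow"
  ]
}
```

The file would live in your project’s OSGi config folder (for example under `ui.config`) as `<PID>.cfg.json`, which is why changing it requires a code deployment.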
So for most customers, we’d recommend the folder properties, but there are use cases where you may want the OSGi configuration. Our next point is to make sure to migrate your asset processing workflows. The workflow that you ran in AEM 6.5 or earlier may not run as you’d expect, or as you’d want, in AEM as a Cloud Service. To start with, let’s think about how we request something like custom thumbnails in AEM as a Cloud Service. In AEM 6.5, you would go into the DAM Update Asset workflow, find the Create Thumbnail workflow step, and maybe add a custom configuration to it to say: I want a JPEG that’s 150 by 200. In AEM as a Cloud Service, we’re handing all this processing off to the Asset Compute service, which doesn’t run in the workflow, so instead you’d create a processing profile. You can see an example of what a processing profile looks like on your screen here: in this case, we’re saying, give me an FPO rendition (just a low-quality, full-size rendition), and maybe a custom thumbnail as well. Now, we understand that going through all of your old configurations and migrating them over to processing profiles takes work and time, and we want to make things as easy for you as we can. To that end, we created a workflow migration tool. This tool is available on GitHub; you can just go and download it. It’s a simple JAR file: drop it anywhere you like, run it via java -jar, and point it at the root of your Maven project, that is, your source code. It will go through and automatically migrate all of your workflows for you, and then generate a markdown report letting you know everything it has done. To give you an idea of what this tool is doing: it starts out by creating Maven projects for any artifacts it’s going to create.
It then disables any DAM-related workflow launchers. If it determines that any of the workflow models called by those launchers will still need to run in AEM as a Cloud Service, it configures the custom workflow runner to run them, and it transforms them by removing any steps that aren’t supported. For any steps that have corresponding workers that can run inside the Asset Compute service, such as thumbnail generation, it creates custom processing profiles. And at the end, it creates that migration report to let you know what it has done. With that, I’ll hand it back to Jörg to wrap us up. Okay, so our recommendation to you, when you are onboarding into the Cloud Service from an existing AEM project, be it on-premise or Adobe Managed Services, is to really look through the tools and browse the GitHub repositories we provide. We will post the whole list of links in the Experience League forum post as well, so you can check there. The point is: don’t try to do these changes all manually, because we have tooling. Why should you spend the time figuring out all the things we have already implemented? So please, please leverage the available tools. We built them based on the learnings we have made, to make your work easier when you plan to come to AEM as a Cloud Service. Next slide, please. And that’s it. Before we conclude, I want to point you to Experience League, where you can get the replays of these sessions and find a lot of courses, tutorials, and documentation, as well as the product-specific communities. You can find me on the Experience League AEM forum, and probably Ian as well. Okay, and that’s it. I will post the link into the chat, or maybe, Ian, can you do that? It’s actually already there, at the top of the session chat. Right. That’s cool. Yes. So now let’s check for any questions.
So I see that we have a handful of them. The first one that came in was: can we run the functional tests separately, through BrowserStack? Yes, I think you can run them through Selenium locally. You can configure your local Selenium to run against the Cloud Service; that’s totally possible and doable. It’s just that we provide this piece of infrastructure already at the stage pipeline run, so we include it there. But of course, you can run the tests on your own, as often as you like, against any environment you’d like to run them against. Domenico asks whether the affinity cookie handling is something you would only need if you write something programmatically on the author or publish instance and then do another operation during the same process, like a server-to-server call. Yes: you need to make sure you respect the affinity cookie only when you do multiple requests from outside of AEM into AEM. That means if you’re doing two, three, or however many calls to AEM, calling servlets, in that case you should do this just to be on the safe side. But if you’re just using the AEM UI for things we support out of the box, you would never need to do this; we implement it already in the AEM UI. So it’s really only in your custom scripts. Exactly. So Mira is asking whether this AEM Upload library would be used when onboarding a customer with a large asset repository. Not necessarily. In previous versions of AEM, we really put it on our partners and consultants to say: a customer’s got a million assets, you figure out how to load them into the system. In AEM as a Cloud Service, we’ve actually added bulk import functionality that makes it really easy to take a large asset repository and ingest it. So this would really only be necessary if, for example, you needed to connect some third-party way of pushing assets into AEM.
Or, in older versions of AEM, you may have been pulling those assets into AEM, which you can’t really do anymore. So maybe you have assets coming in from some third-party service, and you need to pull them in every so often and push them into the repository; it’s use cases like this where you’d want to use a tool like that. Okay, thanks. I think we are at the top of the hour. We will follow up with all remaining questions, including those from Nick, Abdul-Ashid, and Mir, and answer them in the forum. Okay, so thank you for joining the session, and I’d like to wish you a nice rest of the developer conference. Thank you for joining. Thanks, everybody. Thank you.