Troubleshooting AEM as a Cloud Service

Learn how to troubleshoot and debug the AEM SDK, AEM as a Cloud Service, and the build and deploy process.

Hey, everyone. My name is Kunwar Saluja, and welcome to this video. Today we’ll talk about how we can troubleshoot various aspects of AEM as a Cloud Service. After completing this video, you should have a general idea of how you can debug the AEM SDK locally, debug an AEM as a Cloud Service environment, and debug build and deployment failures for your Cloud Manager executions.
Debugging AEM SDK. As we know, the AEM SDK is the primary development environment used by developers. It also supports multiple ways to debug AEM and the deployed applications. We’ll go through some common debugging tools and consoles that should help with troubleshooting. Logs generated by the AEM SDK can provide key insights when debugging AEM applications. They are available under the crx-quickstart/logs folder. Logs act as the frontline of debugging AEM applications and can help resolve complex issues with AEM, but they depend on adequate logging levels being set in the deployed AEM application. Adobe recommends keeping the logging levels of local development and the AEM as a Cloud Service development environment as similar as possible, as this normalizes log visibility and avoids discrepancies between the two environments. If default logging is insufficient for local development, custom logging can be configured via the OSGi console.
The AEM SDK also has an OSGi web console that provides a variety of information and introspection into the local AEM runtime, which is useful for understanding how your application is recognized and functions within AEM. AEM provides many OSGi consoles, each providing key insight into a different aspect of AEM, such as bundle information, configuration status, and more. OSGi consoles can be accessed from within AEM under Tools, Operations, Web Console, or directly through the /system/console URL. OSGi consoles provide details in different debugging scenarios, such as validating that an OSGi bundle is present, validating that an OSGi bundle is active, determining whether an OSGi bundle has unsatisfied imports preventing it from starting, and identifying whether OSGi property values are bound to any active OSGi configurations. Other areas of the console help with functions such as servlet resolution, resource resolution, package dependency resolution, and more, which can ease the debugging effort in AEM. Another commonly used tool is CRXDE Lite.
CRXDE Lite is a web-based interface for interacting with the AEM content repository. CRXDE Lite provides complete visibility into the content repository, including nodes, properties, property values, and permissions, which can be helpful in troubleshooting various types of issues related to code and content. For any content or permissions-related issues on your local environment, this tool comes in very handy. CRXDE Lite can be accessed within AEM under Tools, General, CRXDE Lite, or directly at the URL /crx/de/index.jsp.
The Dispatcher Tools are now part of the AEM SDK. They provide a containerized Apache Web Server environment that can be used to simulate the AEM as a Cloud Service Dispatcher setup locally. The latest AEM SDK can be downloaded from the Software Distribution portal. Setting up the Dispatcher Tools and viewing the logs and the cached content can be vital in ensuring end-to-end AEM application functionality and verifying that all the security configurations are correct. As the Dispatcher Tools run in a containerized environment, you can access the logs and the cache using the Docker shell. It’s pretty easy to set up and can ease the debugging effort for content delivery on the publish service through the Dispatcher. So in summary: install Docker, validate and run your Dispatcher configuration through the local Dispatcher SDK, and test end-to-end delivery. In case of any problems, dig deeper using the Dispatcher log and the content cache generated for your configuration to troubleshoot the problem at hand. I’ll share a reference link at the end of the session that should help you with the Dispatcher SDK.
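To make the OSGi console checks from this section more concrete (is a bundle present, is it active), here is a minimal Python sketch. It assumes a payload shaped like the JSON the web console serves for its bundle list, that is, a top-level "data" list whose entries carry "symbolicName" and "state". That shape, the sample bundle names, and the helper name inactive_bundles are illustrative assumptions, so verify them against your own SDK before relying on this.

```python
import json

# Assumption: this sample mimics the bundle list JSON of the local OSGi web
# console (a "data" list with "symbolicName" and "state" per bundle).
# The com.example.* bundle names are hypothetical.
SAMPLE = """
{"data": [
  {"symbolicName": "org.apache.felix.framework", "state": "Active"},
  {"symbolicName": "com.example.core", "state": "Installed"},
  {"symbolicName": "com.example.fragment", "state": "Fragment"}
]}
"""


def inactive_bundles(payload: dict) -> list[tuple[str, str]]:
    """Return (symbolicName, state) for bundles that are neither active nor fragments."""
    ok_states = {"Active", "Fragment"}  # fragment bundles never become Active
    return [
        (bundle.get("symbolicName", "?"), bundle.get("state", "?"))
        for bundle in payload.get("data", [])
        if bundle.get("state") not in ok_states
    ]


if __name__ == "__main__":
    for name, state in inactive_bundles(json.loads(SAMPLE)):
        print(f"{name}: {state}")
```

A bundle that shows up in this list is a good candidate for a closer look in the console itself, for example to see whether it has unsatisfied imports.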
Debugging AEM as a Cloud Service environment. As we know, AEM as a Cloud Service is now the cloud-native way to leverage AEM applications.
It runs on a self-service, scalable cloud infrastructure, which requires AEM developers to understand and debug various facets of AEM as a Cloud Service, from the build phase to the deployment phase, and to get all the details of running AEM applications. As with the AEM SDK, logs provide details on how your application is functioning in AEM as a Cloud Service, and also provide insights into issues with deployment. All the logs generated by the author or publish services are exposed via Cloud Manager and are available for download.
All log activity for a given environment’s AEM service is consolidated into a single log file, even if different pods within the service generate the log statements. The AEM author service provides AEM runtime server logs that include the error, access, and request logs. The AEM error log is the Java error log for AEM, similar to the one in the local SDK under the crx-quickstart folder.
The AEM access log records all the HTTP requests to the AEM service. The AEM request log records all the HTTP requests made to the AEM service and the corresponding HTTP responses. For the AEM publish service, in addition to the error, access, and request logs, the Apache Web Server and Dispatcher logs are also available for download. These include the HTTPD access, HTTPD error, and AEM Dispatcher logs. The HTTPD access log records all the HTTP requests made to the AEM service’s Apache Web Server and Dispatcher. The HTTPD error log contains all the messages from the Apache Web Server and helps with debugging supported Apache modules such as mod_rewrite, mod_security, et cetera.
The AEM Dispatcher log contains all the messages from the Dispatcher module, including filtering and serving-from-cache messages. Adobe recommends setting the logging level for the Dispatcher log in the dev environment to debug or trace so that, if an issue arises, we can do an in-depth analysis and understand its cause. AEM as a Cloud Service supports custom logging but currently does not support custom log files, so all custom or project-specific logs should be configured to write to error.log, which is then available in Cloud Manager. Adobe Cloud Manager also supports accessing AEM as a Cloud Service logs via the Cloud Manager plugin for the Adobe I/O CLI. With this, you can download and tail the logs for all your environments from the command line. You can also use Cloud Manager environment variables to parameterize the log level.
With this, you can allow log levels to change dynamically, and this can be done using the aio CLI plugin.
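As a rough illustration of parameterizing the log level, the Python sketch below generates the JSON body of a Sling logger OSGi configuration whose level is read from an environment variable. The logger name com.example.wknd, the variable name LOG_LEVEL, the INFO default, and the helper name logger_config are all hypothetical, and the placeholder form `$[env:NAME;default=VALUE]` should be checked against the current Adobe documentation before use.

```python
import json


def logger_config(logger_names, env_var="LOG_LEVEL", default="INFO"):
    """Build a Sling logger configuration whose level comes from an environment variable.

    Assumptions: the variable name and default are illustrative, and the
    $[env:...;default=...] placeholder is resolved by AEM at runtime.
    """
    return {
        # Java packages or classes this logger applies to
        "org.apache.sling.commons.log.names": list(logger_names),
        # Level is not hard-coded, so it can be changed without a code change
        "org.apache.sling.commons.log.level": f"$[env:{env_var};default={default}]",
        # Custom log files are not supported, so write to the error log
        "org.apache.sling.commons.log.file": "logs/error.log",
    }


if __name__ == "__main__":
    print(json.dumps(logger_config(["com.example.wknd"]), indent=2))
```

The printed JSON would typically live in the project as a logger configuration file. The point of the indirection is that changing the variable's value, rather than the code, is what switches the level.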
Things to know about environment variables: they are limited in number, there is no UI for them, and they can only be set using the Cloud Manager API or the aio CLI plugin. Customers who have Splunk accounts have the option to request that their AEM as a Cloud Service logs be forwarded to the appropriate index. The logging data is equivalent to what is available through the Cloud Manager logs, but customers may find it convenient to leverage the query features available in the Splunk product. Splunk forwarding for sandbox program environments is not supported. To enable Splunk forwarding, customers should submit a support request that includes the Splunk HEC endpoint, the Splunk index, the Splunk port, and the Splunk HEC token, to enable this integration between the customer’s Splunk account and AEM as a Cloud Service. Each AEM as a Cloud Service environment has its own Developer Console.
It exposes information similar to that in the OSGi console of the local AEM SDK. Access to the Developer Console can be enabled by assigning the user the Cloud Manager Developer product profile, along with the AEM Users or AEM Administrators product profile, in the Admin Console. CRXDE Lite is also available in your AEM as a Cloud Service dev environment. It provides direct access to the JCR, which is really helpful in debugging content and access control related issues. For debugging OSGi configurations, it is recommended to use the Developer Console instead of CRXDE Lite. Note that content paths such as /apps and /libs are visible in CRXDE Lite but are immutable, meaning they cannot be changed at runtime by any user. These locations in the JCR can only be modified via code deployment. You can also use CRX Package Manager, available for all AEM author environments, to extract content in the form of packages and analyze it offline. Let’s do a quick demo to showcase the Developer Console. As mentioned earlier, access to the Developer Console is through the Cloud Manager UI.
So once we log into the Cloud Manager UI, you can select any of your AEM environments and access the Developer Console. You should be prompted to log in using your Adobe credentials. Once you’re logged in, you can view all the available tabs in the Developer Console. Using the pod drop-down, you can select any of the pods from your author service or publish service. You can capture and download status information about bundles, components, configurations, Oak indexes, Sling jobs, and more. You can also use the package resolution or the servlet resolution tab to capture information on how your package or servlet is being resolved in AEM. The Queries tab takes you back into AEM to help you analyze query performance. The Integrations tab provides you with a local development token or service credentials that you can use to authenticate to AEM from third-party applications.
Debugging build and deployment failures. As we know, both build and deployment activities for AEM as a Cloud Service are done through Cloud Manager pipeline executions, but failures may occur during steps in the build process, which might require action to resolve. We’ll go through some common failures in this deployment cycle and how you can best approach them. The first step in a Cloud Manager execution is the validation step, which simply ensures that the basic Cloud Manager configurations are valid. There could be a scenario where, once you start the execution, it fails in the validation step, indicating that the pipeline execution cannot be started because the environment is in an invalid state, which is reflected in the Cloud Manager UI. The reason for this is that the target environment of the pipeline is in a transitional state and cannot accept new builds. This could include the environment being created or deleted. Wait for the state to resolve and then retry the execution.
As we know, deployment pipeline executions are tied to an environment and a git branch, so there could be situations where the pipeline execution cannot be started because the environment or the git branch the pipeline is configured for has been deleted.
In that case, edit the pipeline configuration and reconfigure the target environment or branch before retrying the execution. The next step in Cloud Manager is build and unit testing.
This step performs a Maven build, that is, a mvn clean package command, for the project checked out from the pipeline’s git branch. Errors identified in this phase should be reproducible by building the project locally and can be fixed in the local dev environment. Once fixed, we can re-run the pipeline and get past this step. Building locally won’t identify issues arising from unavailable or unreachable Maven dependencies, for example when the project code uses a private internal Maven repository that is not accessible to Cloud Manager. It also won’t identify issues with unsupported Maven plugins in the project.
Next is the code scanning phase, which performs static code analysis using a mix of Java and AEM-specific best practices. Code scanning results in a build failure if critical security vulnerabilities exist in the code. Lesser violations can be overridden, but it is recommended that they be fixed. Note that code scanning is imperfect and can result in false positives, which should be analyzed further. To resolve code scanning issues, review the summary and download the CSV report provided by Cloud Manager under the download details section, and take action accordingly. All the activity in the code scanning phase is logged and is available for download from the Cloud Manager UI.
Next is the build image phase, which is responsible for combining the built code artifacts created in the build and unit testing phase with the AEM release to form a single deployable artifact. While any code build and compilation issues are found during the build and unit testing phase, there could be configuration or structural issues that are only identified when attempting to combine the custom build artifacts with the AEM release. So there could be situations where the pipeline execution fails in the build image step, which could be due to malformed repoinit scripts, usage of mismatched Core Components versions, and more.
These issues are logged in the build image log, which is available for download from Cloud Manager. For a malformed repoinit script, we need to ensure that the directives in the script are defined correctly. This can be reproduced on your local AEM SDK and should be reflected in the error log. The other common case is the usage of a Core Components version that is greater than the deployed version. AEM as a Cloud Service automatically includes the latest Core Components version in every release, meaning that after an AEM as a Cloud Service environment is automatically or manually updated, the latest version of Core Components is deployed to it. To prevent this failure, whenever an update for the AEM as a Cloud Service environment is available, it should be included as part of the next build and deployment. Always ensure that the updates are included after incrementing the Core Components version in the application’s code base.
The deployment step is responsible for taking the code artifact generated in the build image step, starting up new AEM author and publish services using it, and, upon success, deploying it in the cloud. Note that the log available via the download log button in the deploy step is not the AEM error log and does not contain detailed information pertaining to your application startup. It only contains logs for the process of deploying the artifact. The AEM error log available for download in Cloud Manager contains all the information about the startup and shutdown of the AEM services, which may be relevant to any issues in the deployment. Let’s discuss a few common reasons why the execution fails in the deploy step. There are situations where the Cloud Manager pipeline holds an older version of AEM than what is deployed on the target environment. This may cause the execution to fail due to known product or infrastructure issues that might already be fixed in the latest AEM release.
The best course forward is that, if the target environment has an update available option, select the update from the environment actions and then re-run the build. There could also be situations where the code running during the startup of the newly deployed AEM service takes so long that Cloud Manager times out before the deployment can complete successfully. In these cases, the deployment may eventually succeed, even though Cloud Manager reported the status as failed. This may be caused by content traversals or delays in startup time due to the OSGi life cycle of the custom bundles being deployed. The best course there is to review the implementation for code that runs early in the OSGi bundle life cycle and to review the AEM error log for the AEM author and publish services around the time of the failure, looking for log messages that indicate custom code running. Although most code and configuration violations are caught earlier in the build cycle, it is possible for an incompatible configuration or some code to go undetected until it executes during deployment. If this happens, we need to review the AEM error log for the AEM author and publish services around the time of the failure, as shown in the Cloud Manager UI. The best bet is to review the log for any errors thrown by the Java classes provided by the custom application. If any issues are found, resolve them, push the fixed code, and re-run the pipeline.
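Reviewing the error log "around the time of the failure" for errors from custom classes can be scripted once the log has been downloaded. The Python sketch below is one possible approach, not an Adobe tool. It assumes each log entry starts with a dd.MM.yyyy HH:mm:ss.SSS timestamp, optionally followed by a bracketed pod name, then a level such as `*ERROR*`. The package prefix com.example., the sample lines, and the ten-minute window are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed entry shape: "dd.MM.yyyy HH:mm:ss.SSS [optional-pod] *LEVEL* rest of line"
LINE = re.compile(
    r"^(?P<ts>\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?:\[[^\]]*\]\s+)?\*(?P<level>[A-Z]+)\*\s+(?P<rest>.*)$"
)


def errors_near(lines, failure_time, package_prefix, window=timedelta(minutes=10)):
    """Return ERROR entries within `window` of `failure_time` that mention `package_prefix`."""
    hits = []
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # stack-trace continuation lines are skipped in this sketch
        ts = datetime.strptime(match.group("ts"), "%d.%m.%Y %H:%M:%S.%f")
        if (
            match.group("level") == "ERROR"
            and abs(ts - failure_time) <= window
            and package_prefix in match.group("rest")
        ):
            hits.append(line.rstrip("\n"))
    return hits


# Hypothetical sample entries, for illustration only
SAMPLE = [
    "22.06.2020 18:33:30.120 *ERROR* [FelixStartLevel] com.example.core.StartupJob Traversal took too long",
    "22.06.2020 18:33:31.000 *INFO* [FelixStartLevel] com.example.core.StartupJob done",
    "22.06.2020 21:00:00.000 *ERROR* [qtp1-2] com.example.core.Other late error",
    "22.06.2020 18:34:00.500 [cm-p1-e2-aem-author-abc] *ERROR* [qtp1-3] org.apache.sling.Foo not ours",
]

if __name__ == "__main__":
    for hit in errors_near(SAMPLE, datetime(2020, 6, 22, 18, 35), "com.example."):
        print(hit)
```

In practice you would pass the lines of the downloaded author or publish error log and the failure time shown in the Cloud Manager UI, then widen or narrow the window as needed.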
Help and resources. If the above troubleshooting approaches do not resolve the issue at hand, customers and partners can reach out to Adobe Support for guidance. This can be done via the Admin Console by accessing the Support tab and creating a support ticket with all the details of the problem, so the respective teams can help qualify it and get to a resolution. Quick note: if you are a member of multiple Adobe orgs, ensure the correct org is selected in the org switcher prior to creating the case. Tutorials, documentation, and discussion threads related to AEM as a Cloud Service are available on Experience League and the forums, with tons of information to help partners and customers learn more and grow with AEM as a Cloud Service.
Thank you for watching this video.