Sizing Adobe Commerce Cloud for a large number of API requests in a headless implementation
Adobe Commerce Cloud is used as the eCommerce platform for 7-Eleven Australia across multiple channels. In this presentation we share our experience of running Adobe Commerce Cloud with a high number of resource-consuming API interactions, based on the example of personalised offers built with a combination of Adobe Campaign and Adobe Commerce. We cover estimating and sizing the infrastructure for the expected demand, testing approaches, and interpreting the monitoring data that is available.
Transcript
Hopefully everyone is enjoying day two of Adobe Developers Live and having a great time participating in the many interesting sessions. My name is Maxim Bybakov and I'm a Director of Software Engineering at Balance Internet, based in Melbourne, Australia. Today I would like to talk about how to size Adobe Commerce Cloud for a large number of API requests in a headless implementation. The presentation is based on a summary and an example of the headless implementation we delivered for 7-Eleven Australia together with Adobe services. And while the presentation focuses mainly on Adobe Commerce Cloud, similar approaches and patterns can be applied to on-premises implementations as well.

The implementation for 7-Eleven connects Adobe Commerce to AEM through the CIF connector, and to the mobile application via an Enterprise Service Bus. What is specific about 7-Eleven is promotion based on a number of personalised and product-combination offers for different types of customers. Offers can be country-wide, state-wide, or store-specific, can be used quite heavily by customers, and were implemented to increase customer conversion. External loyalty systems such as Adobe Campaign are also in place. The integration involves a number of cart-related API calls, which are mostly non-cacheable because they are session dependent and rely on external systems to retrieve information about the cart, prices, and stock (a minimal sketch of such calls follows below). There are a number of heavy cart operations, like adding or removing products from the cart, as well as calls that initialise the customer session, and the heavy checkout operations, which mostly involve recalculating totals, processing payment methods, running fraud checks, and creating the order. Some of these API calls are quite tiny and light, with execution times of no more than 200-300 milliseconds. Other calls can be quite lengthy, especially those associated with payments, fraud assessment, or the calculation of a particular promotion. The longer calls are not that CPU or memory intensive; they mostly sit idle, waiting for responses from third-party applications.

So the focus of today's talk is the estimation of the infrastructure based on how customers and applications interact with Adobe Commerce, and on the infrastructure itself. We will also talk about the metrics and methods we can use to validate the configuration, look at the specific settings that may or may not affect the system, and see whether we can align with non-functional requirements. There are also a number of critical success factors that live in the application itself: make sure your application is optimised and developed to cater for its requirements, and there are a number of caching methodologies that help you scale the application. These types of factors we won't be looking at in this presentation.
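To make the cart-related calls described above concrete, here is a minimal sketch using Adobe Commerce's standard guest-cart REST endpoints; the store URL and SKU are hypothetical placeholders. Each call depends on the session and on live price, stock, and promotion data, which is why these responses cannot be served from a shared cache.

```python
# Minimal sketch of the session-dependent, non-cacheable cart calls
# discussed above, using Adobe Commerce's standard guest-cart REST API.
# The base URL and SKU are hypothetical placeholders.
import requests

BASE = "https://store.example.com/rest/V1"

# Create a guest cart; the response body is the masked cart ID as a JSON string.
cart_id = requests.post(f"{BASE}/guest-carts", timeout=60).json()

# Add a product to the cart; this triggers price, stock and promotion
# logic, so the response is unique to this session.
payload = {"cartItem": {"sku": "EXAMPLE-SKU", "qty": 1, "quote_id": cart_id}}
requests.post(f"{BASE}/guest-carts/{cart_id}/items", json=payload, timeout=60)

# Collect totals; one of the heavier operations, as it recalculates
# promotions, taxes and shipping for the current cart state.
totals = requests.get(f"{BASE}/guest-carts/{cart_id}/totals", timeout=60).json()
print(totals["grand_total"])
```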
So let's talk about what is actually responsible for concurrent execution, and what can become a bottleneck that limits the throughput of your application.

One of the most significant factors is the maximum number of PHP processes that can run and execute concurrently. Another, obviously, is the set of services your application relies on: MySQL, Redis, Elasticsearch, or any external services called through synchronous integrations. What we found as an interesting outcome is that the number of PHP processes was one of the bottlenecks, and it requires quite careful calculation and estimation to make sure your API calls execute with low latency, complete quickly, and don't hold up the interaction between the client and the application.

The maximum number of concurrent PHP processes is controlled by a setting called pm.max_children, a parameter defined in the PHP-FPM configuration. Unfortunately, that setting cannot be managed by the developer directly, but it can be changed by raising a support ticket with the Magento Cloud team. On Adobe Commerce Cloud the value is two multiplied by the number of vCPUs on the front-end instances. For example, if you are running on a split infrastructure and your front-end instances have 48 vCPUs, then each instance can run 96 concurrent PHP processes at the same time. What does that mean? If your application sends 96 calls at the same time, they will all be processed simultaneously. If you send 97 calls, the extra call will be queued and processed with a slight delay. The parameter can be changed depending on your RAM and CPU utilisation, and it is a very useful input when calculating the size of your infrastructure, as well as the resulting utilisation of other services such as MySQL or Redis. It's useful to assess this parameter against your monitoring systems, such as New Relic, which comes out of the box with Magento Cloud.

If we look at the problem on a graph, we can see what happens when you pass the threshold and exceed the allowed number of child processes. PHP-FPM supports a queuing system, which is responsible for queuing all the requests coming to PHP. An API call travels from the browser or application to Fastly, from Fastly to nginx, and then to PHP-FPM, where your child workers live. This is where the magic happens: if you have enough max child workers, requests are processed straight away; if there are not enough, they are queued, and the setting called listen.backlog defines the maximum number of queued connections.
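For reference, the relevant directives live in the FPM pool configuration and look roughly like this. The values below are purely illustrative; on Adobe Commerce Cloud this file is managed by Adobe and is changed through a support ticket rather than edited directly.

```ini
; Illustrative php-fpm pool configuration (example values only; on
; Adobe Commerce Cloud this is managed by Adobe, not the developer).
[www]
pm = dynamic

; Maximum number of PHP worker processes that may run concurrently.
; On Adobe Commerce Cloud this is typically 2 x the vCPUs of the
; front-end instance, e.g. 96 workers on a 48-vCPU node.
pm.max_children = 96

; In dynamic mode, how many workers start immediately and how many
; idle workers are kept warm between those bounds.
pm.start_servers = 4
pm.min_spare_servers = 2
pm.max_spare_servers = 8

; Maximum number of pending connections allowed to wait in the listen
; queue once all workers are busy; connections beyond this are refused.
listen.backlog = 511
```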
So why is queuing actually a good thing? Because queuing allows you to absorb spikes above your usual traffic: if you don't have enough capacity to process a call, it is queued and executed as soon as a PHP-FPM worker becomes available. But why is queuing not so good for high-load applications? Suppose the queue gets long enough; then there are two problems. The first is that the application receives the response with a delay, which in some cases can be quite significant, depending on the length of the queue. The second is that services like Fastly, nginx, and even the client's browser have internal time limits. For example, the usual time limit on nginx and Fastly is around 60 seconds. If that limit is exceeded, the response back to the browser or application is a timeout: unfortunately, we could not process your request at the moment. Potentially your application might crash, and the user sees a rather unpleasant error.

Let's see how this actually affects the application. On the right-hand side here, we used two tools to assess the impact. One was New Relic, to see how a particular request is processed and how long it takes. The other was a client-side tool that records the timing of each request, so we could compare the two views and see the gaps between them. When there are enough workers, there is no queuing: every request executes straight after the previous one, and on the New Relic side requests are processed quickly, with no spikes. Everything is fine. Now look at the left-hand side, which shows what happens when PHP-FPM doesn't have enough workers to process all the traffic coming into the infrastructure. From the New Relic perspective, nothing appears to have changed; the processing time of each request is still the same. From the application's perspective, however, there are significant gaps between the calls. For example, between the call that retrieves the cart and the call that updates the cart there is a gap of five seconds, and before the subsequent call there is another gap of almost ten seconds. That genuinely affects your application and your customers, yet it is not truly reflected in New Relic, and it's a good showcase of how PHP-FPM queuing affects the application.

There are two methods available on Adobe Commerce Cloud for identifying that requests are actually being queued. The first is the PHP-FPM status page, which you can request to have enabled; on that page you can see statistics on how your PHP-FPM pool is doing. You will see a number of metrics and settings there, but two are particularly important. One is "listen queue", which shows how many requests are currently sitting in the queue waiting for a PHP-FPM worker; if you have enough workers, this value should always be zero. The other valuable one is a historical metric called "max children reached", which shows how many times your PHP-FPM pool ran out of workers. If that number is greater than zero, it's an indication that the number of PHP-FPM workers needs to be increased. The second method is the Fastly logs, which are available in New Relic Logs. There is a value called "time elapsed" that shows your long-running requests. That's how you can find out whether your system has this problem, and potentially identify which requests take the longest or are involved in the queuing.
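As a sketch of the first method: the FPM status page can expose its metrics as JSON, which makes it easy to poll the two indicators above. The status URL below is a hypothetical placeholder for wherever the page is exposed on your environment.

```python
# Sketch: poll the PHP-FPM status page (JSON mode) and flag the two
# indicators discussed above. The URL is a hypothetical placeholder.
import time
import requests

STATUS_URL = "http://localhost/status?json"

while True:
    stats = requests.get(STATUS_URL, timeout=5).json()

    # "listen queue": requests currently waiting for a free worker.
    # With enough workers this should always be zero.
    if stats["listen queue"] > 0:
        print(f"queueing now: {stats['listen queue']} request(s) waiting")

    # "max children reached": how many times FPM ran out of workers since
    # it started. Anything above zero suggests pm.max_children is too low.
    if stats["max children reached"] > 0:
        print(f"pool exhausted {stats['max children reached']} time(s)")

    time.sleep(10)
```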
PHP-FPM can be configured in a few ways: the process manager can be dynamic, on demand, or static. Here we'll be talking about dynamic and static. By default the infrastructure is configured in dynamic mode, and that is the most popular option because it gives you a balanced configuration if you have other systems running on the same instance, for example a database or some other applications; it helps you save CPU and memory. What you may see from time to time in the logs, though, is that the PHP-FPM process manager needs to scale up because it doesn't have enough running workers, or has already hit its limit. The other option is pm = static. Static lets you configure for the best performance: you have a fixed number of workers sitting there in a warmed-up state, ready to consume the traffic.

Let's see the difference between dynamic and static at a large scale of requests. Suppose we have a dynamic configuration ready to consume a thousand requests, and a static configuration also ready to consume a thousand requests. One of the important differences is how many worker processes are running at any given time. If the process manager is configured with a minimum of one spare server, or two start servers, your PHP-FPM pool is effectively sitting in a cold, non-warmed-up state; when traffic arrives it needs time to scale up, which can cause quite a bit of latency. If you look at the graph here, when around a thousand calls arrive at the same time the static configuration handles the traffic in a timely manner, whereas the dynamic configuration needs quite a bit of time to scale up, and in some cases the added latency can be a second or even more.

There are a number of simulation models you can use to estimate your queue and calculate your infrastructure size, and I would suggest doing a little study of queuing theory. Consider two major boundary scenarios. One is the pessimistic scenario, where all users generate the highest load and arrive on the website at the same time. The opposite is the optimistic scenario, with a continuous uniform probability distribution, where requests of each type are spread uniformly over time. Assessing these basic scenarios helps you understand the theoretical minimum number of PHP processes required to handle all the traffic, with or without queuing (a small sketch of both bounds follows the load-test scenario below).

Let's have a look at how to confirm the size of your infrastructure in a load-test scenario. In a load test we model a number of factors that simulate the way a customer interacts with the application. For example, we have an actor who is a guest (unauthenticated) buyer: they land on the home page, stay there for a random period of time, move to a category page, stay for some period, and move on to subsequent pages. That defines how long a particular user spends on the website and the time associated with each step.
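Here is a small Python sketch of those two bounds, applied to the worked example that follows. The per-type call counts per session are hypothetical assumptions chosen to reproduce the totals quoted in the talk.

```python
# Sketch: theoretical minimum PHP-FPM workers for the load-test scenario.
# The calls-per-session counts are assumptions chosen to reproduce the
# totals quoted in the worked example below.
import math

USERS = 5_000            # sessions in the test population
SESSION_SECONDS = 180    # duration of one scripted session

# (average execution time in ms, calls per session) -- counts are assumed
CALL_TYPES = [
    (250, 10),   # light calls, < 500 ms
    (1_000, 7),  # medium calls, < 1 s
    (3_000, 3),  # heavy calls (payment, fraud, promotions), < 3 s
]

# Total worker-occupancy time across all sessions.
busy_ms_per_session = sum(t * n for t, n in CALL_TYPES)           # 18,500 ms
total_busy_ms = busy_ms_per_session * USERS                       # 92,500,000 ms

# Optimistic bound: requests spread uniformly over each session, so the
# average concurrency is the total busy time divided by the session length.
uniform_workers = math.ceil(total_busy_ms / (SESSION_SECONDS * 1_000))  # 514

# Pessimistic bound: every user fires a call at the same instant, so each
# session needs its own worker to avoid queueing entirely.
burst_workers = USERS                                             # 5,000

# Safety coefficient on the optimistic bound, then translate to vCPUs
# using the Cloud rule of thumb of two workers per vCPU.
sized_workers = uniform_workers * 2                               # 1,028
required_vcpus = sized_workers // 2                               # 514

print(uniform_workers, burst_workers, sized_workers, required_vcpus)
```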
Now, as an example, let's calculate how many processes, executors, or workers you need for that scenario. Say there are three types of API calls: one type where the call takes less than 500 milliseconds, another that takes less than one second, and a third that takes less than three seconds. For the calculation we used 250 milliseconds for the first type, and one second and three seconds for the other two. With a session duration of 180 seconds, we assign a number of calls of each type to the session. Taking a population of 5,000 users, each with a 180-second session executing that number of calls, the total execution time comes to 92,500,000 milliseconds. To process all of this concurrently, without delaying any call, the minimum number of workers is that total divided by the session length: 92,500,000 ms ÷ 180,000 ms ≈ 514 workers. If we want to be really on the safe side, we apply a coefficient and multiply by two, which gives us 1,028 PHP workers able to process all the requests. Since each vCPU provides two workers, that translates into the number of vCPUs required: the figure you need to take into account when sizing your Adobe Commerce Cloud infrastructure, counted in total across all the front-end nodes.

The load simulation itself can be run with a number of different tools. Randomisation needs to be put in place, and you need to make sure your load generators are completely separate from each other, with no shared network or hardware dependencies. You also need a set of tools to help you analyse the metrics: New Relic APM, New Relic Logs, or load-testing software that ships with its own monitoring.

Another thing to consider as an improvement for a large-scale API implementation is the API platform architecture presented by Nishan and Igor earlier today. That architecture gives you the ability to process a large number of API calls at the same time and is highly scalable. There are already services implemented and running on that architecture: Product Recommendations, and Adobe Live Search is another good example of a successful implementation; both remove the load from the merchant's infrastructure.

So, to summarise what we have just talked about, there are a few key things to consider for maximising throughput: make sure your application is optimised; make sure you have a sufficient caching strategy across all levels; configure your PHP-FPM based on your application's use case; be ready for both normal traffic and traffic spikes, taking queuing into account; and rely on the tools available to you, New Relic APM, New Relic Logs, and the infrastructure metrics, which are your best friends on this journey. And of course, we are all looking forward to Adobe Commerce auto-scaling infrastructure being released soon.

Hopefully everyone was able to follow along in this compressed time. If you have any questions, or want to clarify or discuss anything further, feel free to reach out to me on Twitter, or I can answer some questions right now if we have time. Have a great day everyone, enjoy Adobe Developers Live, and stay safe.