Handling a high-load project that Magento Cloud couldn’t

Unexpected things are bound to happen after go-lives. Recently we encountered an rpm spike right after launch that even Magento Cloud engineers weren’t able to fix. Below is a description of the issue and the various solutions that led to a remarkable reduction in web transaction time.

Background

The client is a Canadian store with 34k products, 1 website, 2 store views, and a multi-warehouse setup. After go-live, something came up that no one had told us about. Our projects usually average anywhere from 63-75 rpm (requests per minute) to 279 rpm, with 5320 rpm average on our biggest project so far.

For this client, it hit 5720 rpm on average (with spikes up to 11.1k rpm and 2500 – 3xxx online users during business hours, without any promo campaigns). And this happened way past our business hours, no less!
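To put these figures in per-second terms, a quick conversion helps. The snippet below is only arithmetic on the numbers quoted above; the variable and function names are ours, chosen for illustration.

```python
def rpm_to_rps(rpm: float) -> float:
    """Convert requests per minute to requests per second."""
    return rpm / 60.0


average_rpm = 5720        # average load reported for this client
peak_rpm = 11_100         # reported spike (11.1k rpm)
previous_max_rpm = 5320   # average on the biggest earlier project

average_rps = rpm_to_rps(average_rpm)
peak_rps = rpm_to_rps(peak_rpm)
# Relative increase of this client's average over the previous record
increase_pct = (average_rpm / previous_max_rpm - 1) * 100

print(f"average: {average_rps:.1f} req/s")
print(f"peak: {peak_rps:.1f} req/s")
print(f"increase over previous record: {increase_pct:.1f}%")
```

In other words, the average alone is roughly 95 requests every second, and the spikes are close to double that.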

There was no “snap of a finger” solution, and we had to cut off default Magento functions. Developers observed and iterated. Some actions did not give any results, some made things worse and resulted in load drops, but they all gave information on what to do next.

Timeline of events

  1. The site becomes slow or unavailable. Magento increases the infrastructure to the maximum, which doesn’t help.
  2. Magento considers the ElasticSuite + ElasticSearch combo to be at fault, since ElasticSuite is a 3rd party module.
  3. Based on Magento’s recommendation, the other team working on the project disables ElasticSuite and enables ElasticSearch per the configs given by Magento.
  4. Scandiweb’s recovery team joins. 
  5. As it turns out, the given configs were incomplete, and the other team has caused a fallback to MySQL search. To update env.php configs, one should use Variables inside Magento Cloud.
  6. Yet, contrary to our expectations, switching to the ElasticSearch engine doesn’t help and even increases the load by a few percent.
  7. In parallel, we are inspecting traffic to cut off ideas of DDoS and bots. 
  8. Also, we are looking into other fallback options with different search engines.
  9. The other team and the client are blaming modules (GTM, related products, etc.) and randomly switching them off.
  10. With the latest stats in hand, we regroup and prepare patches for generic errors, search plugin updates, removing catalog_product_entity and cataloginventory_stock_item requests from catalog and search pages, adding a limit to search results, switching off Magento synonyms, and so on.
  11. Enabling the previously disabled modules one by one. 
  12. Enabling ElasticSuite full functionality.
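Step 7 above (inspecting traffic to rule out DDoS and bots) usually comes down to aggregating access logs by minute, client IP, and user agent. The following is a minimal sketch of that idea, not the tooling used on this project: it assumes a combined-log-style line format, and the sample lines, IP addresses, and the `summarize` helper are all made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line format (combined-log style):
# IP - - [timestamp] "request" status size "referer" "user agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count requests per minute, per client IP, and per user agent."""
    per_minute, per_ip, per_agent = Counter(), Counter(), Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        ts = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
        per_minute[ts.strftime("%Y-%m-%d %H:%M")] += 1
        per_ip[match["ip"]] += 1
        per_agent[match["agent"]] += 1
    return per_minute, per_ip, per_agent


# Hypothetical sample data, not taken from the real project logs
sample = [
    '203.0.113.5 - - [01/Jan/2021:10:00:01 +0000] '
    '"GET /catalogsearch/result/?q=a HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [01/Jan/2021:10:00:02 +0000] '
    '"GET /catalogsearch/result/?q=b HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [01/Jan/2021:10:01:10 +0000] '
    '"GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

per_minute, per_ip, per_agent = summarize(sample)
print("busiest minute:", per_minute.most_common(1))
print("top client IP:", per_ip.most_common(1))
print("top user agent:", per_agent.most_common(1))
```

If a handful of IPs or a single user agent accounts for most of the requests in the busiest minutes, bots or an attack become plausible; if the load is spread across many ordinary clients, as it appears to have been here, the traffic is more likely genuine.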

The site is now running at its fastest speed since launch. The client gave positive feedback on ElasticSuite and search term performance.

In conclusion, it seems that the problem was the rpm spike (with a breaking point at around 7000 rpm, after which responses no longer went up proportionally) and Magento Cloud’s inability to handle it. This led to the need to look for even further improvements within Magento.


Is your Magento 2 store slow? Do you need assistance in optimizing your website’s speed? Let us help you! Reach out to info@scandiweb.com and we’ll come up with a fix!

