Elasticsearch: No alive nodes found in your cluster (fails repeatedly, breaks the inventory index, all products disappear, customers can't place orders!)

I know this isn’t a help forum for ElasticSearch, but as there is already a recent (locked) post about ElasticSearch failing, I’m posting in case this becomes a more common question. If we can find a solution, it will help others out.


Summary of Issue

I’m running Magento 2.4.1 and ElasticSearch crashes randomly. When it crashes:
(1) the Inventory Index and Catalog Search Indexes get broken
(2) on the frontend, multiple product category pages show “We can’t find products matching the selection” even though there should be products in there
(3) customers can’t complete their purchases and end up retrying multiple times, leaving unpaid orders on the system
(4) the search box shows no results


Restarting ElasticSearch

It looks like ElasticSearch is the cause. To reset it and get everything working, I do this (renamed usernames to match Craig’s installation instructions):

(logged in as user magento)
su craig
sudo systemctl restart elasticsearch (takes a couple of minutes)
exit (user Craig)
(as user magento)
php bin/magento indexer:status
php bin/magento indexer:reset inventory
php bin/magento indexer:reindex

Then everything works again for a random amount of time, anywhere from 5 hours to 7 days, before it falls over again.
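
The manual steps above could be wrapped in one function so they are run the same way each time. This is only a sketch under assumptions: the default Magento path is hypothetical, and it assumes a single user that can both use sudo and run bin/magento, which is not how the two-user setup above works.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the manual recovery steps from this post in one function.
# The default Magento root below is a hypothetical path, not taken from the post.

# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# recover_search [magento_root]: restart Elasticsearch, then rebuild the indexes.
recover_search() {
  local root="${1:-/var/www/html/magento}"   # hypothetical default path
  run sudo systemctl restart elasticsearch || return 1
  run php "$root/bin/magento" indexer:status
  run php "$root/bin/magento" indexer:reset inventory || return 1
  run php "$root/bin/magento" indexer:reindex
}
```

Calling recover_search with DRY_RUN=1 set prints the four commands without executing them, which is a safe way to check the path before running it for real.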


Magento system.log

I have narrowed it down: either ElasticSearch is the problem, or some other problem is killing ElasticSearch!
main.CRITICAL: No alive nodes found in your cluster [] []


Check of ElasticSearch logs

su craig
sudo journalctl -u elasticsearch
gives…
systemd[1]: elasticsearch.service: Main process exited, code=killed, status=9/KILL
systemd[1]: elasticsearch.service: Failed with result 'signal'.

sudo nano /var/log/elasticsearch/elasticsearch.log
opens the log file, but there are no entries in it since the date ElasticSearch was installed (2 months ago)
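
A status=9/KILL with nothing written to ElasticSearch's own log means the process was killed from outside rather than shutting itself down. One common source of that is the kernel's out-of-memory killer, although nothing in the logs above confirms that this is what happened here. A minimal sketch for checking, assuming the kernel log is still available for the time of the crash (the function name is made up for illustration):

```shell
# find_oom_kills: read kernel log text on stdin and print the lines that look
# like OOM-killer activity. The patterns match the usual Linux kernel wording.
find_oom_kills() {
  grep -iE 'out of memory|oom-kill|killed process'
}
```

It can be fed from either kernel log source:

sudo journalctl -k --no-pager | find_oom_kills
sudo dmesg -T | find_oom_kills

If a line naming a java process shows up at the same timestamp as the systemd message, memory pressure is the likely cause. If nothing shows up, something else sent the signal.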


Server specifications

Ionos Dedicated Cloud Server XL
4 vCore
8GB RAM
160GB SSD
Set up exactly as per Craig’s amazing Installing Magento 2.4 from scratch video (I’ve installed 2.4.1, but everything else is the same)
php.ini memory_limit set at 8GB
Magento root .htaccess memory_limit is set at 4GB
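
The PHP memory_limit values above are per-process ceilings rather than reservations, but together with ElasticSearch's JVM heap they can add up to more than the 8GB installed. A small sketch for seeing what heap the JVM is allowed, assuming the package default config location /etc/elasticsearch/jvm.options (the function name is hypothetical):

```shell
# heap_flags FILE: print the JVM heap flags (-Xms / -Xmx) set in a jvm.options file.
# Commented-out lines are ignored because the pattern anchors on the leading dash.
heap_flags() {
  grep -E '^[[:space:]]*-Xm[sx]' "$1"
}
```

Usage, alongside a look at what is actually free:

heap_flags /etc/elasticsearch/jvm.options
free -h

If heap_flags prints nothing, the heap may be set somewhere else (for example a file under jvm.options.d or the ES_JAVA_OPTS variable) or left at its default.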


Tried alternative Search Engine

Is ElasticSearch used for more than just the Search Box?

I installed Algolia Search, which replaces ElasticSearch and runs its own search from their servers (we used this very successfully in the M1.9 store, and were going to do this anyway). Of course, I was hoping that this would mean no more ElasticSearch failures, but it’s just happened again today.


Questions!!!

  1. What would cause these errors?
    systemd[1]: elasticsearch.service: Main process exited, code=killed, status=9/KILL
    systemd[1]: elasticsearch.service: Failed with result 'signal'.
  2. How do I diagnose this further?
  3. Can I set it up to restart automatically after failing (and ideally reindex everything and also email me to let me know!)?
  4. Should I just give up, go back to Magento 1.9 and bury my head in the sand?
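
For the restart part of question 3, systemd can bring a service back by itself through a drop-in override. A minimal sketch, assuming the unit is named elasticsearch.service as in the journal output above. The function name, the file name restart.conf and the 30 second delay are arbitrary choices for illustration:

```shell
# print_restart_dropin: emit a systemd drop-in that restarts the service after a
# failure such as the status=9/KILL above. RestartSec=30 is an arbitrary delay.
print_restart_dropin() {
  cat <<'EOF'
[Service]
Restart=on-failure
RestartSec=30
EOF
}
```

To install it:

sudo mkdir -p /etc/systemd/system/elasticsearch.service.d
print_restart_dropin | sudo tee /etc/systemd/system/elasticsearch.service.d/restart.conf
sudo systemctl daemon-reload

Restart=on-failure covers a process killed by a signal, so it should apply to this crash. It only restarts ElasticSearch, though. The reindex and the email would still need something separate, and the restart hides the underlying cause rather than fixing it.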

Looking forward to some suggestions and discussion on this. My concern is that the MSI module in 2.4.1 is fragile, and that this will lead to more people experiencing the same issue.