A process may fail to start automatically after a maintenance activity.
The following alert may appear:
Event : Process <Process Name> of service <Service Name> has failed to restart after multiple attempts
Example:
Process isi_healthcheck of service isi_healthcheck has failed to restart after multiple attempts
Usually, the process restarts automatically.
If it does not, manually restarting the process typically resolves the issue.
Verify whether the process is running:
- Open an SSH connection to any node in the cluster and log on using the "root" account.
- Check whether the process is now running on the node that reported the error:
# isi_for_array -n <Node Number> 'ps auxwww | grep -i <Process Name> | grep -v grep'
Where <Node Number> is the number of the node that reported the process as not running.
Example:
# isi_for_array -n1 'ps auxwww | grep isi_healthcheck | grep -v grep'
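The check above can be wrapped in a small helper so that the result is reported explicitly instead of being inferred from empty output. This is only a sketch: the function name process_state is hypothetical and not part of OneFS, and it assumes the ps output is passed to it on standard input.

```shell
# Hypothetical helper (not an OneFS command): reads ps-style output on stdin
# and prints RUNNING if any line mentions the given process name, ignoring
# lines that belong to the grep command itself.
process_state() {
    name="$1"
    # -i: case-insensitive, -F: treat the name as a fixed string
    if grep -iF -- "$name" | grep -v grep >/dev/null; then
        echo "RUNNING"
    else
        echo "NOT RUNNING"
    fi
}

# Assumed usage on a cluster node, as root:
#   isi_for_array -n1 'ps auxwww' | process_state isi_healthcheck
```

If the helper prints NOT RUNNING, the command from the example produced no matching lines for that node.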
If the process IS running:
- Get the event group ID of the alert using the following command:
# isi event groups list
- Clear the alert, using the group ID from the previous step:
# isi event groups modify --resolved=yes --ignore=yes --id=<Group ID>
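To avoid mistyping the group ID, the clear command can be assembled and reviewed before it is run. The following is a minimal sketch: resolve_cmd is a hypothetical helper, and the check that the ID consists only of digits is an assumption about the ID format, not something stated in this article. The helper only prints the command; it does not execute it.

```shell
# Hypothetical helper (not an OneFS command): prints the command that would
# clear an event group, after a basic sanity check on the ID.
resolve_cmd() {
    id="$1"
    case "$id" in
        ''|*[!0-9]*)
            # Assumption: event group IDs are plain integers.
            echo "error: group ID must be a number, got '$id'" >&2
            return 1
            ;;
    esac
    echo "isi event groups modify --resolved=yes --ignore=yes --id=$id"
}

# Example: print the command for a placeholder group ID, then review it
# before running it manually.
resolve_cmd 42
```

The ID 42 is a placeholder; use the ID reported by isi event groups list for the alert in question.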
If the process is NOT running:
- Contact Isilon Support to restart the process manually and for further troubleshooting.
Warning: Serious problems might occur if services are restarted incorrectly or on the wrong node.
A video demonstrating these steps is available under the following title:
Isilon: How to Check if a Process is Running on a Node After a Maintenance or Node Offline Event