News: Downtime on 14/01/2024 - Cause and Our Follow-Up

Published: 14/01/2024

Dear JimatHosting user,


On 14/01/2024:


 


10.30am


Our vendor received an update request from CloudLinux (our security provider) to patch a certain vulnerability. This also involved a kernel update, which required a reboot at a later time.


Our vendor asked for our permission first, and we gave the green light.


 


11.00am


The update was rolled out to all 15 servers. On 14 of them, the update ran normally and the reboot also went normally.


 


12.00 noon


On our last server, an unexpected issue cropped up during the reboot: a hard disk was not detected.


As our attempts to connect through IPMI were not working, we dispatched our engineer to the data center.


 


12.30pm


Upon arriving, our engineer found that the server was not up.


They shut the server down for 5 minutes and then attempted to start it again.


They also carried out the usual slot-out, slot-in of the hard disk to make sure its connection was working correctly.


None of the methods above worked.


 


2.00pm


Our engineer now had to restore that particular hard disk, and also restore the GRUB bootloader.


 


3.00pm


The hard disk restoration was completed. However, the settings for all cPanel accounts had been reset, so our users could log in to cPanel but no website was up.


This prompted further checks.


 


4.00pm


Our engineer noticed that the /home folders for all our users were jumbled up: some folders that were supposed to be in /home5 had ended up in /home6.


This meant that our fstab was not mapping the mounts correctly.
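
For readers who manage their own servers, the following is a minimal sketch of how a mount mix-up of this kind could be spotted. It is an illustration only, not the procedure our engineer followed. It assumes fstab-style text as input; the helper names, the `expected` mapping, and the UUID values in the sample are all hypothetical.

```python
def parse_fstab(text):
    """Return {mount_point: device_spec} parsed from fstab-style text."""
    mounts = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed entry, ignore it in this sketch
        spec, mount_point = fields[0], fields[1]
        mounts[mount_point] = spec
    return mounts


def find_mismatches(fstab_text, expected):
    """List (mount_point, expected_spec, actual_spec) where fstab disagrees."""
    actual = parse_fstab(fstab_text)
    return [
        (mount_point, want, actual.get(mount_point))
        for mount_point, want in sorted(expected.items())
        if actual.get(mount_point) != want
    ]


# Hypothetical sample: /home6 points at the wrong partition.
SAMPLE_FSTAB = """
# device        mount   type  options   dump pass
UUID=aaaa-0005  /home5  ext4  defaults  0    2
UUID=bbbb-0005  /home6  ext4  defaults  0    2
"""

# Hypothetical record of which partition belongs at which mount point.
EXPECTED = {"/home5": "UUID=aaaa-0005", "/home6": "UUID=cccc-0006"}

if __name__ == "__main__":
    for mount_point, want, got in find_mismatches(SAMPLE_FSTAB, EXPECTED):
        print(f"{mount_point}: expected {want}, found {got}")
```

Identifying partitions by UUID or label rather than by device name is a common way to keep such mappings stable across reboots.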


 


5.30pm


After all the fixes, the server was back up.


 


 


Our Follow-Up Solution


We believe this is the second time this issue has happened on the same server within 1 month, which is not acceptable. We are in the middle of purchasing a new server and will migrate all our clients to it within 3-4 weeks.


Meanwhile, we will monitor the server closely.