Server Failure

Over the weekend I had one of my active servers go into a strange state and take all its virtual machines down with it.  Well, no big deal, that’s why I use Xen and DRBD.  If the active machine dies, all data is current up until the failure and all virtual machines can start on the backup.  The problem, though, is that I don’t have any type of fencing.  Fencing isolates a server so that it never tries to access any type of shared disk, claim virtual IPs that it should not use, serve bad data, etc.  This is needed when a failure is not clean.  Well, this failure was far from clean.  ssh was dead to everything (other than the backup), mail would accept connections but not show a banner, web was dead but pingable, and the database was up just fine.

Well, this means I could not safely bring up the secondary as active without risking corrupted data and all the other badness described above.  In this situation I chose to leave services down until physical access was available to safely fence the server.
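The reasoning behind that choice can be sketched in a few lines of Python.  This is only an illustration of the rule, not what any real cluster manager runs: the PeerStatus class, its fields, and the decide function are all hypothetical names made up for the example.

```python
from dataclasses import dataclass


@dataclass
class PeerStatus:
    """Hypothetical snapshot of what the backup can observe about the primary."""
    responds_to_ping: bool
    services_healthy: bool
    fenced: bool  # True only once power or network is verifiably cut


def decide(peer: PeerStatus) -> str:
    """Return the action the backup should take (illustrative rule only)."""
    if peer.responds_to_ping and peer.services_healthy:
        # Primary looks fine, nothing to do.
        return "stay-secondary"
    if peer.fenced:
        # Primary can no longer touch shared disk or virtual IPs,
        # so promoting the backup cannot cause split-brain.
        return "promote"
    # Primary is misbehaving but not isolated: promoting now risks
    # two machines writing the same data or claiming the same IPs.
    return "wait-for-fencing"


if __name__ == "__main__":
    # Roughly the weekend's situation: pingable, services broken, no fencing.
    print(decide(PeerStatus(responds_to_ping=True,
                            services_healthy=False,
                            fenced=False)))
```

The point of the sketch is that an unclean failure without fencing always lands in the last branch, which is why services stayed down.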

I think now I might do an RFQ for either a new switch that lets me down ports or a power system with switchable outlets so I can STONITH.

I think I made the right choice.  Most of my customers are dairy based, and they had proofs this last week, which is a big time of the year.  It’s like earnings reports for Wall Street: a big deal.  Well, those proofs were all tied up in a dead server.

Everything is back up now.  You can read the ‘official’ notice at www.mlds-networks.com
 

