*** This situation often occurs when a node crashes while processing a transaction that has not yet been replicated to the other nodes.

Pre-flight checks

  1. Make a backup on each node
    • mysqldump --all-databases > backup.sql
  2. If you use haproxy for load balancing, comment out the broken node's entry in the haproxy configuration file and restart the service so traffic stops going to that node
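If haproxy is in the mix, taking the broken node out of rotation looks roughly like this (the backend and server names below are made up; match them to your own haproxy.cfg):

```
backend galera_cluster
    balance leastconn
    server node1 192.168.1.11:3306 check
    # server node2 192.168.1.12:3306 check   # broken node, commented out
    server node3 192.168.1.13:3306 check
```

After editing, restart (or reload) haproxy so the change takes effect.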

Attempt 1 – Correct the grastate.dat file

  1. Verify that the cluster UUID in the /var/lib/mysql/grastate.dat file on the problem node matches the rest of the cluster, and correct it if it doesn't. You can get this UUID from the other nodes (look for wsrep_cluster_state_uuid in the output) by running:
    • show global status like 'wsrep_%';
  2. Restart MySQL
    • service mysql restart
  3. If this doesn't work, you can also try moving the grastate.dat file to /tmp and restarting MySQL
    • mv /var/lib/mysql/grastate.dat /tmp/
    • service mysql restart
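As a sketch of the above, assuming the standard datadir /var/lib/mysql (the extract_uuid helper and the UUID value below are illustrative, not part of Galera or MySQL):

```shell
#!/bin/sh
# extract_uuid is a small helper introduced for this sketch: it pulls the
# value column out of `SHOW GLOBAL STATUS` output.
extract_uuid() {
    awk '/wsrep_cluster_state_uuid/ {print $2}'
}

# On a healthy node you would run something like:
#   mysql -N -e "SHOW GLOBAL STATUS LIKE 'wsrep_%';" | extract_uuid

# A repaired grastate.dat then looks roughly like this (the UUID is a
# placeholder; use the value reported by a healthy node):
cat > /tmp/grastate.dat.example <<'EOF'
# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
EOF
```

With seqno set to -1, the node should ask the cluster for its position instead of trusting its local state.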

Attempt 2 – Remove ib_logfiles

  1. Move the ib_logfiles to /tmp
    • mv /var/lib/mysql/ib_logfile* /tmp/
  2. Try to restart MySQL
    • service mysql restart
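The two steps above can be wrapped so the redo logs are easy to restore if the restart still fails. stash_ib_logfiles is a throwaway helper written for this sketch, not a MySQL tool, and /var/lib/mysql is assumed to be your datadir:

```shell
#!/bin/sh
# Move the InnoDB redo logs aside; InnoDB recreates them on the next start.
stash_ib_logfiles() {
    datadir=$1
    dest=$2
    for f in "$datadir"/ib_logfile*; do
        [ -e "$f" ] && mv "$f" "$dest/"
    done
}

# Usage:
#   stash_ib_logfiles /var/lib/mysql /tmp
#   service mysql restart
#   # if the restart fails, move the files back from /tmp
```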

Attempt 3 – Hail Mary

  1. Move the mysql directory to mysql.old
    • mv /var/lib/mysql /var/lib/mysql.old
  2. Restart MySQL
    • service mysql restart
      • The service will likely fail to start, but if you log into the other nodes you will see that the node has joined the cluster. You will also see /var/lib/mysql on the problem node growing in size as the state transfer copies data over.
      • Once the size of that directory roughly matches the others, you will be able to start MySQL successfully again.
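One way to watch the transfer is to poll the datadir's size and compare it against a healthy node. datadir_size is a helper introduced for this sketch, not a MySQL tool:

```shell
#!/bin/sh
# Print the size (in KB) of a directory tree, e.g. /var/lib/mysql.
datadir_size() {
    du -sk "$1" | awk '{print $1}'
}

# Usage: read the size off a healthy node first with `du -sk /var/lib/mysql`,
# then on the rebuilding node poll until it catches up, e.g.:
#   while [ "$(datadir_size /var/lib/mysql)" -lt "$DONOR_SIZE" ]; do
#       sleep 10
#   done
```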
