*** Galera replication is synchronous, so the whole cluster can only run as fast as its slowest node: when one node falls behind, flow control pauses replication on every node.
Step 1 – Detect the node slowing down the cluster
- On each node, check how many flow control pause events that node has sent (lower is better)
- SHOW STATUS LIKE 'wsrep_flow_control_sent';
- On each node, check the average length of the receive queue since the last status query (lower is better)
- SHOW STATUS LIKE 'wsrep_local_recv_queue_avg';
- A node that returns a value much higher than 0.0 cannot apply write-sets as fast as it receives them, and can trigger replication throttling for the whole cluster.
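The two checks in Step 1 can be combined into a single statement and run on each node in turn. The node whose values stand out from its peers is the likely bottleneck. This is a sketch that assumes a MySQL-compatible Galera server (for example MariaDB Galera Cluster or Percona XtraDB Cluster), which accepts a WHERE clause on SHOW STATUS:

```sql
-- Run this on every node in the cluster and compare the results side by side.
-- wsrep_flow_control_sent:    flow control pause events sent by this node (lower is better)
-- wsrep_local_recv_queue_avg: average receive queue length (values near 0.0 are healthy)
SHOW GLOBAL STATUS
WHERE Variable_name IN ('wsrep_flow_control_sent', 'wsrep_local_recv_queue_avg');
```

GLOBAL is written out explicitly because a plain SHOW STATUS defaults to session scope. The wsrep_* variables are global, so both forms return the same values here.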
Step 2 – Fix the node slowing down the cluster
- To improve performance immediately, comment out the slow node in the HAProxy configuration (if HAProxy is the load balancer) and reload HAProxy so the node stops receiving client traffic.
- Find out what is different about this node and fix it accordingly.
- Disk IO