Galera’s big gotcha for MySQL users

Introduction - What the heck is Galera?

Galera is a synchronous multi-master cluster database plugin. Galera allows you to set up a cluster of MySQL servers, all of which can be written to without concerns about consistency across nodes - even in the event of crashes. To be specific, when I talk about Galera in this post, what I'm actually referring to is a combination of three components:

A MySQL server (in my case I'm using MariaDB, a MySQL fork)
Codership's patch implementing the wsrep API on MySQL
Galera wsrep provider/replication plugin

I'm using MariaDB because the MariaDB developers have truly embraced Galera, making it very easy to get started using it. The MariaDB 10.0.x branch offers the option of a special package named "MariaDB Galera Cluster", which includes all three components. As of MariaDB 10.1 all three components are included by default (though the Galera features are not enabled in the default configuration). So getting started is as simple as installing the MariaDB package and doing a little configuration.

Why use it?

So what are the benefits of Galera over running a plain old MySQL instance?

Running a multi-master cluster means both read and write loads can be spread across multiple instances
The cluster can be scaled up for reads by adding additional nodes to deal with increased load (writes don't scale in the same way though, as they still need to be applied to every node)
Multiple nodes add a level of redundancy, meaning your database is more resilient to problems such as hardware failures

Galera isn't without its drawbacks though. The main drawback is obviously the increased cost of resources for running a cluster, compared to those required to run and maintain a single instance. However, these can be reduced by making use of cost-effective cloud hosting platforms (this approach also has the added bonus of enabling you to easily spin up new nodes for your cluster when the workload increases).

Another potential drawback is the increased likelihood of transaction deadlocks, although these additional deadlocks are not necessarily indicative of a major problem — they are a result of Galera's transaction strategy. This will become clearer as I get into the way Galera works under the hood.

However, while we knew we should expect more deadlocks, the biggest stumbling block our team came across when we began using Galera was related to Galera's transaction strategy — specifically, why (and when) deadlocks can happen. It had us scratching our heads for a while and it's what I want to focus on in this post.

The problem

After a little while testing Galera out we began to notice something odd. Certain transactions in our application weren't being completed correctly. Moreover, it wasn't so much that they weren't being completed correctly — sometimes the first few changes made in a transaction would be missing, while the changes later on had been applied!

Although the problem only seemed to happen intermittently, it was a pretty serious issue, so I decided to focus on getting to the bottom of it. After much brow-furrowing, many cups of tea and extensive poring over documentation, I learned that our problem was caused by assumptions our code was making about the way MySQL behaves in transactions.

Glass half full, glass half empty

To understand the cause of the problem you have to understand the difference between the way Galera deals with transactions, locking and conflicts, and the way they're dealt with by plain old MySQL.

When a transaction requires a lock on a certain row, a plain old MySQL server employs what is known as pessimistic locking. This means that the server will assume that, unless steps are taken to avoid it, some other client will surely also try to modify the row in question and cause a conflict. As such, if a lock is required by a transaction, MySQL prevents any other client from messing with that row until such time as the transaction is committed (or rolled back) and the lock is released. At that point the second transaction can obtain the lock, make its changes, and so on.

Galera transactions also work like this but only on the local node. That's the important thing. When it comes to other nodes in the cluster, Galera uses optimistic locking. This means that no checks are made with other nodes on the cluster when local locks are attained. Galera assumes that, once the transaction is committed and the other nodes are informed, everything will be okay and no conflicts will arise.

A quick description of how commits work in Galera

Obviously that's not always going to be true, and Galera deals with this possibility by employing a combination of a cluster-wide transaction order sequence (so that all nodes agree on the order transactions are applied in) and write-set certifications.

Once the client asks for its transaction to be committed by sending COMMIT;, Galera springs into action. It compiles all the changes made by the transaction into a write-set. This write-set is then replicated to all the other nodes in the cluster, and together they agree on where this transaction goes in the transaction order sequence. Each individual node can then "certify" the write-set. This certification determines whether or not the write-set conflicts with any other transactions. Because the certification process is deterministic (as every node agrees on the transaction order sequence mentioned above), the nodes don't need to talk to each other in order to agree on the outcome of certification — indeed the certification process happens asynchronously on each node. As soon as the client's local node has certified the write-set, the client receives a positive result for the commit.

As I understand it there are two ways conflicts can arise between transactions during this certification stage. The first is if another replicated transaction, placed earlier in the incoming queue (and therefore already applied), presents locking conflicts. The other is if a local, as yet uncommitted transaction presents locking conflicts.

In the first instance, replicated transactions that appear later in the queue will be rejected by all nodes (remember — all the nodes will reach the same conclusion, since the process is deterministic). When that happens, the transaction's local node fails to certify the write-set and informs the client of a deadlock and the client's transaction is rolled back. This is known as a local certification failure.

The second instance is a little different. When attempting to reconcile conflicts between a local, uncommitted transaction and an incoming write-set, the incoming write-set will always win. The currently uncommitted transaction on the local node is rolled back and the client is informed of the rollback at the next available opportunity. This is known as a brute-force abort.

Back to the problem at hand

So what does this mean for our application? In a nutshell, it means that the application must be prepared to experience a deadlock at any time. Let's look at some example scenarios.

Example #1: An earlier transaction conflicts with our own

Our transaction changes move ahead without any problems and now we're ready to commit our work:

> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

> UPDATE users SET active = 0 WHERE id = 5446;
Query OK, 1 row affected (0.00 sec)

> DELETE FROM subscriptions WHERE user_id = 5446;
Query OK, 1 row affected (0.00 sec)

So now we attempt to commit. Unfortunately enough for us, another transaction with an earlier transaction order sequence number has been committed by another node on the cluster and it also modifies row 5446 in the users table. This means the certification of our write-set fails and we are rolled back.

> COMMIT;
Deadlock found when trying to get lock; try restarting transaction

It's important to realise that this would never happen on plain old MySQL. On plain old MySQL, once you reach the point of sending COMMIT; you already have all the locks required to ensure there are no conflicts with your changes. This is not the case with Galera.

Example #2: An incoming write-set trumps our uncommitted transaction

We are part of the way through our transaction. We've obtained a lock on a row we're modifying and are about to move onto the next stage:

> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

> UPDATE orders SET status = 3 WHERE id = 123;
Query OK, 1 row affected (0.00 sec)

At this point, unbeknown to us, another incoming transaction that modified row 123 in the orders table has passed certification. Don't forget — the incoming transaction (i.e. a transaction that would precede ours in the transaction order sequence) always wins. Accordingly, at the next available opportunity, our client will inform us that we've hit a deadlock.

> SELECT status FROM shipments WHERE order_id = 123;
Deadlock found when trying to get lock; try restarting transaction

As you can see, in this case the "next available opportunity" was as a response to a SELECT statement. Again, this would never happen in plain-old MySQL as no new locks are required for a plain SELECT statement. Nevertheless, being prepared for this sort of outcome is de rigueur when working with Galera and also applies to pretty much any other statement you wouldn't expect to have issues with in plain old MySQL (e.g. SAVEPOINT sp1;, RELEASE sp1;, etc).

So what was our problem?

As you may have guessed, the problems my team experienced were as a result of not correctly handling all the possible deadlock responses from Galera.

Earlier in this article I described a situation we noticed whereby the initial portion of a transaction appeared not to have been applied, but subsequent parts were. Here's an example of how that could happen:

> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

> UPDATE orders SET status = 3 WHERE id = 123;
Query OK, 1 row affected (0.00 sec)

> SELECT status FROM shipments WHERE order_id = 123;
Deadlock found when trying to get lock; try restarting transaction

> UPDATE shipments SET status = 3 WHERE order_id = 123;
Query OK, 1 row affected (0.00 sec)

> COMMIT;
Query OK, 0 rows affected (0.00 sec)

As you can see, our transaction failed halfway through, but as our code was making assumptions about when it could expect to receive a deadlock it was ignoring this failure and carrying on as if it were still in a transaction. As such, the changes to the orders table were never applied, while the subsequent changes to the shipments table were applied immediately as they were no longer part of an explicit transaction.

Final Thoughts

While it took some serious investigation to get to the bottom of our problem, I do think it was worth it. Once you get your head around Galera's idiosyncrasies it's really not a big problem to deal with them and, so far, we've found that Galera's benefits have easily outweighed the initial pain we experienced.

In my next article, I'll introduce Galvanize, a class which demonstrates exactly how one can safely and consistently use PHP to handle the deadlocks caused by Galera.

References:

What is MariaDB Galera Cluster?
MariaDB Documentation
Understanding Multi-node writing conflict metrics in Percona XtraDB Cluster and Galera
Jay Janssen / Percona, November 2012
Certification-Based Replication
Codership, February 2015
Dealing With Multi-Master Conflicts
Codership, October 2014
Special thanks to Jay Janssen and Karthik P.R. for their contributions

There are 2 comments:

Edward Hibbert commented:

Monday, 9th October, 2017 at 5:04pm

Thanks for this clear explanation. I had also failed to appreciate that we could get a deadlock error from a SELECT, abandoning the transaction so far, and was merrily continuing an operation resulting in us doing half the actions in a transaction. Ouch. This article confirmed was I was starting to suspect.
Hugh Nolan commented:

Monday, 13th December, 2021 at 9:44am

Thanks very much for this very clear explanation. Luckily we found this article before we started cutting code so you have probably saved us many hours/days of head scratching.