Solve Phantom Read in MySQL

A solution for write skew when “creating” data


The combination of MySQL and its storage engine InnoDB is arguably the most widely used relational database today, and Repeatable Read is its most commonly used isolation level.

However, compared to PostgreSQL, InnoDB has several problems that cannot be solved elegantly at the Repeatable Read level.

  1. Lost updates
  2. Phantom reads

Lost updates in PostgreSQL can be completely solved without extra hacks. As for phantom reads, there are some small tricks that can be used, such as range types and other mechanisms.

With MySQL, however, developers still have to be careful to identify the pitfalls and deal with them properly. In my previous article, we introduced three ways to handle lost updates. These approaches provide a more flexible solution to lost updates and are suitable for a variety of scenarios.

In this article, we will further explore how to properly solve the write skew caused by phantom reads.

There are many kinds of scenarios that result in phantom reads, but in general, they all follow the same pattern.

  1. Search a specific range.
  2. Do something according to the results of that range (create, update, delete).
  3. The operation directly affects the original range results.

If the operation is only an update or a delete, the most straightforward way to avoid write skew is to use an exclusive lock. If you add FOR UPDATE to the initial SELECT, two concurrent transactions are forced to run one after the other, which effectively avoids the write skew in the race condition.

However, in the case of a create, the solution is not so intuitive. There is no corresponding row to lock in the SELECT, because the row is only created later. So how do we solve it?

Before introducing the solution, let’s use a practical example to describe the problem caused by phantom reads.

There is a meeting room system that lets users reserve a meeting room. When a user has successfully reserved a room, a new corresponding record is added to the table as follows.

booking

user    room_id    start_time          end_time
A       123        2022-05-01 10:00    2022-05-01 11:00

The above table records that user A reserved meeting room 123 for one hour on 5/1 at 10 am.

The behavior of this system is similar to the following pseudo-code.

count = `SELECT COUNT(*) FROM booking
WHERE room_id = 123 AND
start_time < '2022-05-01 11:00' AND
end_time > '2022-05-01 10:00'`

if count == 0:
    `INSERT INTO booking (user, room_id, start_time, end_time)
    VALUES ('A', 123, '2022-05-01 10:00', '2022-05-01 11:00')`
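The WHERE clause above is the usual interval-overlap test: an existing reservation conflicts with the requested slot when it starts before the requested end and ends after the requested start. As a small sketch of the same condition in Python (the function name overlaps is only illustrative and not part of the system described here):

```python
from datetime import datetime

def overlaps(start_a, end_a, start_b, end_b):
    """Return True when the intervals [start_a, end_a) and [start_b, end_b) overlap."""
    return start_a < end_b and end_a > start_b

ten = datetime(2022, 5, 1, 10)
eleven = datetime(2022, 5, 1, 11)
noon = datetime(2022, 5, 1, 12)

print(overlaps(ten, eleven, ten, noon))     # True: 10-11 lies inside 10-12
print(overlaps(ten, eleven, eleven, noon))  # False: back-to-back slots do not conflict
```

Because the comparisons are strict, a reservation that ends at 11:00 does not block one that starts at 11:00.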

When the user is sure that the meeting room is unoccupied for the corresponding time slot, the user can insert an entry as a reservation, and the next user will not have a time conflict. Doesn’t that seem perfect?

The problem occurs when two users want to occupy the same time slot in the same meeting room at the same time. They can both pass the first SELECT validation, so they can both insert a reservation, and a conflict occurs. Such a situation cannot be solved by adding a lock, because there is no row to lock at the start.
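To make the interleaving concrete, here is a toy simulation. It is only a sketch: a Python list stands in for the table, and the order of the steps is scripted by hand rather than produced by real concurrent MySQL transactions. The helper names are made up for this illustration.

```python
bookings = []  # stands in for the reservation table

def count_conflicts(room_id, start, end):
    """Step 1 of the pseudo-code: the SELECT COUNT(*) overlap check."""
    return sum(
        1 for b in bookings
        if b["room_id"] == room_id and b["start"] < end and b["end"] > start
    )

def insert(user, room_id, start, end):
    """Step 2 of the pseudo-code: the INSERT."""
    bookings.append({"user": user, "room_id": room_id, "start": start, "end": end})

# Both users run the check before either one has inserted anything.
count_a = count_conflicts(123, "2022-05-01 10:00", "2022-05-01 11:00")
count_b = count_conflicts(123, "2022-05-01 10:00", "2022-05-01 11:00")

if count_a == 0:
    insert("A", 123, "2022-05-01 10:00", "2022-05-01 11:00")
if count_b == 0:
    insert("B", 123, "2022-05-01 10:00", "2022-05-01 11:00")

print(len(bookings))  # 2: the same slot has been reserved twice
```

Each check is correct at the moment it runs; the conflict only appears once both inserts have happened.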

Since there is no way to turn a simultaneous operation into a sequential one through a lock, we simply let one of them fail. To do so, we need to add some constraints, e.g. unique constraints, to the table.

One approach is to create a unique index on the room_id and start_time columns, so that the second person trying to reserve the same time slot will fail.

The problem is solved if we restrict every reservation to a fixed one-hour slot, so that two conflicting reservations always share the same start_time.

But if the meeting room can be booked for more than an hour, another problem arises.

  1. User A reserves 5/1 from 10:00 to 12:00
  2. User B reserves 5/1 from 11:00 to 12:00

When both User A and User B are operating at the same time, this unique constraint clearly cannot be effective, because the two reservations have different start times, and the conflict over the meeting room remains.
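The unique-index idea and its limitation can be sketched as follows. This is a rough illustration that uses Python’s built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the snippet runs anywhere); the table name booking follows the INSERT in the pseudo-code, while the index name uq_room_start and the reserve helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE booking ("
    " user TEXT, room_id INTEGER, start_time TEXT, end_time TEXT)"
)
# In MySQL this would be a unique index on the same two columns.
conn.execute("CREATE UNIQUE INDEX uq_room_start ON booking (room_id, start_time)")

def reserve(user, room_id, start, end):
    """Try to insert a reservation; return False if the unique index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO booking (user, room_id, start_time, end_time)"
                " VALUES (?, ?, ?, ?)",
                (user, room_id, start, end),
            )
        return True
    except sqlite3.IntegrityError:
        return False

a_long = reserve("A", 123, "2022-05-01 10:00", "2022-05-01 12:00")
b_same_start = reserve("B", 123, "2022-05-01 10:00", "2022-05-01 11:00")
b_later_start = reserve("B", 123, "2022-05-01 11:00", "2022-05-01 12:00")

print(a_long)         # True
print(b_same_start)   # False: same room and start_time, rejected by the index
print(b_later_start)  # True: overlaps A's reservation, but the index cannot see it
```

The index only compares start times, so it catches identical slots and nothing else.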

To solve such phantom reads, the developer must use some tricks to expose the conflicts hidden inside the same table.

One way is to create a new table and pre-fill it with data to act as a coordinator for simultaneous operations. In the case of this meeting room system, we can create a table time_slots that lists all time slots in advance as follows.

time_slots
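As a sketch of what pre-filling could look like, the snippet below generates one row per hour for a room and a day. Everything about the layout is an assumption for illustration: the columns (room_id, start_time, end_time), the one-hour granularity, and the helper name hourly_slots are not taken from the table above.

```python
from datetime import datetime, timedelta

def hourly_slots(room_id, day, hours=24):
    """Build (room_id, start_time, end_time) rows, one per hour of the given day."""
    start = datetime(day.year, day.month, day.day)
    return [
        (room_id, start + timedelta(hours=h), start + timedelta(hours=h + 1))
        for h in range(hours)
    ]

rows = hourly_slots(123, datetime(2022, 5, 1))
print(len(rows))  # 24
print(rows[10])   # the 10:00 to 11:00 slot for room 123
```

With a DB-API driver for MySQL, rows like these could then be loaded with something like cursor.executemany("INSERT INTO time_slots (room_id, start_time, end_time) VALUES (%s, %s, %s)", rows), again assuming that column layout.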

When the meeting room is to be reserved, we execute a SELECT not only on the original booking table but also on time_slots, and there we can add FOR UPDATE because the rows already exist. It’s worth noting that the new SELECT FOR UPDATE is executed before the original SELECT, so that the original SELECT runs only after the lock has been acquired.

In that case, when the requested time slots of two simultaneous users overlap, one of them is blocked by the exclusive lock and the two run one after the other. The latter then fails directly, because it sees the reservation committed by the former.
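Put together, the flow might look like the sketch below. It is not a definitive implementation: conn is assumed to be any DB-API connection to MySQL with autocommit off and the %s parameter style, the table name booking follows the INSERT in the pseudo-code, the time_slots columns (room_id, start_time, end_time) are assumed, and the function name reserve is hypothetical.

```python
def reserve(conn, user, room_id, start, end):
    """Reserve a room, serializing overlapping requests through time_slots."""
    cur = conn.cursor()
    try:
        # 1. Lock the pre-filled slot rows first. A concurrent transaction
        #    asking for an overlapping range blocks here until we finish.
        cur.execute(
            "SELECT 1 FROM time_slots"
            " WHERE room_id = %s AND start_time < %s AND end_time > %s"
            " FOR UPDATE",
            (room_id, end, start),
        )
        cur.fetchall()
        # 2. Only now run the original overlap check on booking.
        cur.execute(
            "SELECT COUNT(*) FROM booking"
            " WHERE room_id = %s AND start_time < %s AND end_time > %s",
            (room_id, end, start),
        )
        (count,) = cur.fetchone()
        if count > 0:
            conn.rollback()  # ends the transaction and releases the slot locks
            return False
        # 3. No conflict: create the reservation.
        cur.execute(
            "INSERT INTO booking (user, room_id, start_time, end_time)"
            " VALUES (%s, %s, %s, %s)",
            (user, room_id, start, end),
        )
        conn.commit()  # ends the transaction and releases the slot locks
        return True
    except Exception:
        conn.rollback()
        raise
```

The locks are held until commit or rollback, so the second transaction cannot run its own check until the first one has either inserted its row or given up.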

I have to say such a solution is tricky and not intuitive. However, in order not to sacrifice performance when using MySQL, the isolation level is usually not set to Serializable, which means the implementation has to take on extra complexity in exchange for performance.

In fact, using FOR UPDATE for synchronization in such a scenario does affect performance as well. If booking is a table that may suffer phantom reads in all contexts, then running only the transactions on booking at the Serializable level is a feasible solution.
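One way to do that per transaction is MySQL’s SET TRANSACTION ISOLATION LEVEL SERIALIZABLE statement, which, when issued without GLOBAL or SESSION, applies only to the next transaction in the session. The helper below is a minimal sketch under assumptions: conn is a DB-API connection to MySQL with autocommit off and no transaction currently open, and the names run_serializable and work are hypothetical.

```python
def run_serializable(conn, work):
    """Run work(cursor) in a single transaction at the Serializable level."""
    cur = conn.cursor()
    # Must be issued before the transaction starts; affects the next one only.
    cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
    try:
        result = work(cur)  # e.g. the overlap check followed by the INSERT
        conn.commit()
        return result
    except Exception:
        conn.rollback()
        raise
```

Callers should expect that one of two conflicting transactions may be rolled back by the database with an error, so the rollback path (and possibly a retry) is part of normal operation rather than an edge case.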

When using a database, we must know its capabilities and understand the situations it cannot solve on its own, so that we know which behaviors are potential risks when designing and developing.

In addition, how to properly manage those risks is also an important topic. Although the use cases are not exactly the same for everyone, the patterns are similar, and learning how to solve each pattern will help you deal with similar situations quickly in the future.

This article provides a solution for write skew when “creating” data, while the previous article is about solving write skew when “updating” data. These should cover most of the situations that you might encounter. If anyone has encountered other kinds of MySQL race conditions, please feel free to discuss them with me as well.
