
Multiple users reading and writing to same table - advice


enveetee

Recommended Posts

Hi

 

I come from a desktop (vb.net) background and have used ordinary text files for my databases. I use file locking to prevent other processes from writing to the same file simultaneously.

 

Now I am moving over to PHP/MySQL - what precautions should I take to stop simultaneous processes from writing to a table at the same time?

 

Do I need to lock the table before writing to it?

 

Does MySQL deal with this automatically and just block my process until the table becomes free?

 

(I am talking about a single table here, not multiple dependent tables - I know that is another issue)

 

Thanks

 

 

Nigel

Link to comment
Share on other sites

Make sure you're using the InnoDB engine on that table.

 

Use locking in a transaction and store a timestamp or version identifier in the table so you know when a row has changed.

1. Begin transaction.

2. Get the timestamp/version and compare to the value you expect it to be. If changed then fail and have the user deal with it.

3. Update the data.

4. Commit the transaction.

 

START TRANSACTION;

/* get the latest modification time to compare with the value you already have (from a previous read) */
SELECT timestamp FROM table WHERE id = whatever FOR UPDATE;

/* if there are no changes then save your data */
UPDATE table SET data = value... WHERE id = whatever;
/* (assuming timestamp has an ON UPDATE CURRENT_TIMESTAMP) */

COMMIT;
Or in case of failure,

START TRANSACTION;

/* get the latest modification time to compare with the value you already have (from a previous read) */
SELECT timestamp FROM table WHERE id = whatever FOR UPDATE;

/* timestamp does not match. rollback, then deal with the conflict however you want */
ROLLBACK;
(you could actually COMMIT for the failure case, since you haven't made any changes, but might as well do the correct behavior and ROLLBACK)
Link to comment
Share on other sites

It depends on what exactly you're wanting to handle.

 

MySQL handles the database file locking for you automatically, so you don't need to worry about two people doing simultaneous inserts/updates/etc.

 

If you want to do something like prevent someone from changing a record while another person has it open in your admin interface, then you'll have to handle that on your own through some means.

Link to comment
Share on other sites

@requinix: This makes no sense (besides the fact that it's pure speculation). It seems you want to implement optimistic locking, but then you use SELECT ... FOR UPDATE, which is pessimistic locking. You can't have both. The timestamp is also a rather poor version identifier. Who says there will never be two queries within one second?

 

@enveetee: As kicken already said, this depends entirely on the concrete scenario. There is no one-size-fits-all solution, just different strategies for different requirements.

 

You do not have to worry about low-level data corruption. An INSERT or UPDATE query itself is atomic, so you won't end up with something like “half-of-the-data” or “two-datasets-mixed-into-each-other”. However, you do need to worry about multiple queries which depend on each other. For example, a typical mistake people make is this: They want to enforce unique usernames, so they first check if the name already exists, and if it doesn't, they insert a new row. But what if a simultaneous process has inserted the same name after the check? Then the name is used twice. Solving this doesn't require locking, though. You add a UNIQUE constraint to the name column, try to insert the name, and if this fails due to a constraint violation, you know that the name is already taken.
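A minimal sketch of that constraint-based approach with PDO (using an in-memory SQLite database purely so the example is self-contained; the table name and function are made up, and with MySQL the same constraint violation arrives as SQLSTATE 23000 as well):

```php
<?php
// Sketch: enforce unique usernames with a UNIQUE constraint instead of a
// check-then-insert sequence (which is a race condition). In-memory SQLite
// for illustration; the same pattern applies to MySQL/InnoDB.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE)');

function registerUser(PDO $pdo, string $name): bool {
    try {
        $stmt = $pdo->prepare('INSERT INTO users (name) VALUES (?)');
        $stmt->execute([$name]);
        return true;                       // name was free, row inserted
    } catch (PDOException $e) {
        if ($e->getCode() == '23000') {    // SQLSTATE 23000: constraint violation
            return false;                  // name already taken
        }
        throw $e;                          // some other error, rethrow
    }
}

var_dump(registerUser($pdo, 'nigel'));   // bool(true)
var_dump(registerUser($pdo, 'nigel'));   // bool(false)
```

Note that the check and the insert happen in one atomic statement, so no second process can sneak in between them.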

 

Can you give us a concrete example of what you're trying to do and why you think this might require locking?

Link to comment
Share on other sites

It seems you want to implement optimistic locking, but then you use SELECT ... FOR UPDATE, which is pessimistic locking. You can't have both.

I don't see why not. Two race conditions, two locks:

1. SELECT/UPDATE race condition solved with a pessimistic lock. I could use an optimistic lock and an UPDATE...WHERE timestamp=X but I'd rather not.

2. "User A loaded page, user B loaded and saved, user A saved" race condition solved with an optimistic lock. Can't use a pessimistic lock between page loads.

 

The timestamp is also a rather poor version identifier. Who says there will never be two queries within one second?

Fair enough. Version number then, and version++ on save. [edit] Or avoid most of this discussion and use proper versioning with history. [/edit]
Link to comment
Share on other sites

There is no race condition if you do the version check within the UPDATE query. You can and should do that.

 

The canonical implementation of optimistic locking is this:

  • When the page is loaded, SELECT version FROM entries WHERE id = <some_id>, save in $current_version
  • When the content should be saved, UPDATE entries SET data = <new_data>, version = version + 1 WHERE id = <some_id> AND version = <current_version>
  • If the UPDATE hasn't affected any rows, then the data was changed in between (either the version was incremented, or the row was deleted)
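A sketch of those three steps in PHP with PDO (an in-memory SQLite database is used purely so the example is self-contained; the `entries` table and column names are made up, and the same pattern applies unchanged to MySQL/InnoDB):

```php
<?php
// Sketch of the optimistic-locking steps above: remember the version on
// load, then update only if the version is still the same.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE entries (id INTEGER PRIMARY KEY, data TEXT, version INTEGER NOT NULL DEFAULT 0)');
$pdo->exec("INSERT INTO entries (id, data) VALUES (1, 'original')");

// 1. On page load: remember which version we are editing.
$currentVersion = $pdo->query('SELECT version FROM entries WHERE id = 1')->fetchColumn();

// 2. On save: only update if nobody changed the row in between.
$stmt = $pdo->prepare(
    'UPDATE entries SET data = ?, version = version + 1 WHERE id = ? AND version = ?'
);
$stmt->execute(['edited', 1, $currentVersion]);

// 3. Zero affected rows means the version was bumped (or the row deleted)
//    by someone else in between.
if ($stmt->rowCount() === 0) {
    echo "Conflict: entry was modified by someone else.\n";
} else {
    echo "Saved.\n";
}
```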
Link to comment
Share on other sites


@Jacques1 - thanks for the reply. Currently my app is a desktop application written in a number of BASIC languages ranging from BBC BASIC up to VB.NET; there is very old legacy code in it.

 

My tables are flat text files modified by random access (not important). To prevent concurrent write access, I open the file for WRITING and lock it so no other process has access - do my INSERT or REPLACE, then close the file. This is all done as quickly and efficiently as possible; no file is ever left open, which would lock the entire system. If a file is already open, the next process waits 500ms or so for the current process to complete, then tries again. It works well for up to 10 users or so.

 

As I am new to PHP/MySQL, I am currently bringing my desktop mindset to this new development world, thus the question: connect - update/replace - disconnect - do I have to worry about locking?

 

Your answer about atomic queries helps me a great deal.

 

I know transactions across multiple tables are a different matter but I am not there yet!

 

Thanks

 

Nigel

Link to comment
Share on other sites

When you do a single query with no external dependencies, you don't need a lock. For example, you can easily insert dozens of new rows at the same time. MySQL will take care of this. So, no, you don't need low-level write locks like you needed them for plaintext files.

 

But be aware that conflicts may also occur at a higher level. Often the data you want to insert or use for an update actually depends on a previous query (even if it's the same table). For example, you may first load the data into PHP, process it there and then do an update. In this case, you do have to worry about conflicts, because MySQL doesn't see the full picture. It just sees two separate queries, not the complex read-process-update procedure behind them.

 

A few common pitfalls you need to watch out for:

  • Uniqueness checks within PHP rarely work out (see my user registration example).
  • Counters within PHP are problematic (e.g. log-in limits).
  • If users can edit larger pieces of data (like a blog post), you'll quickly run into conflicts, even if it's just one user! Imagine the following scenario: The user opens the data in multiple browser tabs and edits it. If the user saves one tab and then the other, the first update will be lost entirely, because the second update is based on the previous state of the data.
  • Storing session data in the database is problematic, especially when you access it with Ajax. Even professional PHP frameworks struggle with this, and the homegrown session handlers you'll find on the Internet are almost always broken.
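For the counter pitfall in particular, the fix is to let the database do the arithmetic in a single atomic UPDATE, rather than reading the value into PHP, incrementing it there and writing it back. A sketch (in-memory SQLite so it is self-contained; the `login_attempts` table is made up, and the same SQL works on MySQL):

```php
<?php
// Sketch: increment a login-failure counter atomically in the database.
// A read-increment-write sequence in PHP would be a race condition: two
// simultaneous requests could both read 0 and both write 1.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE login_attempts (user_id INTEGER PRIMARY KEY, failures INTEGER NOT NULL DEFAULT 0)');
$pdo->exec('INSERT INTO login_attempts (user_id) VALUES (42)');

// Atomic: even if two requests run this simultaneously, both increments count.
$pdo->prepare('UPDATE login_attempts SET failures = failures + 1 WHERE user_id = ?')
    ->execute([42]);

echo $pdo->query('SELECT failures FROM login_attempts WHERE user_id = 42')->fetchColumn(), "\n"; // 1
```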
Link to comment
Share on other sites

@Jacques1 - thanks for that, it has put my mind at rest but has now got me thinking...

 

How do you stop a user from changing a record which is already in use/being edited?

 

In my desktop application, I applied a 'lock flag' to the record: lock the table, check if the record has a lock flag, and if it does not, apply a lock flag and then return the record to the user. Any other user will see the lock flag and be unable to write to the record.

 

Is this how it works in MySQL?

 

Thanks

Link to comment
Share on other sites

You can implement a “locked” flag in MySQL, yes. What's important is that getting the current status and obtaining the lock must be done in a single atomic step. For example, you might do an UPDATE query to change the status to “locked” and then check the number of affected rows. If there are no affected rows, then the entry was already locked, and the user may not proceed. If there is 1 affected row, then the lock has been obtained successfully.
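A sketch of that atomic lock-grab with PDO (in-memory SQLite so the example is self-contained; the `entries` table, the `locked_by` column and the `tryLock()` helper are all made up for illustration):

```php
<?php
// Sketch: obtain an application-level "locked" flag in one atomic UPDATE
// and inspect the affected-row count, as described above.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE entries (id INTEGER PRIMARY KEY, locked_by INTEGER)');
$pdo->exec('INSERT INTO entries (id, locked_by) VALUES (1, NULL)');

function tryLock(PDO $pdo, int $entryId, int $userId): bool {
    // Status check and lock acquisition happen in a single atomic statement:
    $stmt = $pdo->prepare('UPDATE entries SET locked_by = ? WHERE id = ? AND locked_by IS NULL');
    $stmt->execute([$userId, $entryId]);
    return $stmt->rowCount() === 1;   // 1 row = lock obtained, 0 = already locked
}

var_dump(tryLock($pdo, 1, 7));   // bool(true)  - first user gets the lock
var_dump(tryLock($pdo, 1, 8));   // bool(false) - second user is refused
```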

 

Do note, however, that this kind of locking is problematic:

  • The entries will be locked even if the editor doesn't change any data.
  • In a web application, it's fairly difficult to determine whether a user is still active. So an entry may be locked even after the user has already left the PC.
  • Unlocking the entry may fail, in which case it will be locked permanently until you change the status by hand.

An alternative approach is to let anybody access the entry and postpone the check until the user has actually changed the data (optimistic locking). So user A can open the page even if user B has already opened it. But when user A tries to save the changes, the application checks if there were other changes in between. In that case the changes will be rejected, merged or whatever. Of course this also comes with problems:

  • If the records are edited very often by different people, then the changes will often be rejected.
  • Merging changes may be difficult.

The best approach might be a combination: You do lock the row, but you don't actually enforce the lock. You merely show a symbol and let the user decide if they want to change the data nonetheless. The data itself is protected through optimistic locking.

Link to comment
Share on other sites

@Jacques1 - many thanks for your answer, brilliant. I have another (well 1000's actually) question if you have time.

 

My app (PHP/MySQL) needs credentials to connect to the MySQL database. Where/how do I store those to ensure they are secure?

 

Currently I have a file in a ../includes directory called connect.php; it contains my connection details. Is this safe? I am led to believe that without access to the server these files are never 'served up', only executed.

 

Thanks in advance

Link to comment
Share on other sites

The credentials should be stored outside of the document root.

 

While it's generally OK to rely on .php files being executed rather than served, this kind of protection is fragile and may break due to a misconfiguration. If the webserver isn't properly set up (even if it's just for a few minutes), it may very well serve the plain source code with all the credentials in it.

 

Keeping internal stuff out of the document root is also a matter of clean design. The docroot is the public API of your website. Why on earth should users be able to execute some connect.php script? That's clearly none of their business. They should only see the actual pages, everything else is a “404 Not Found”.
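One common layout for this (all paths and names here are purely illustrative): the credentials live in a file above the document root, and scripts inside the docroot pull them in by a relative path.

```php
<?php
// /var/www/config/database.php  -- OUTSIDE the document root
// (hypothetical path; adjust to your own layout)
return [
    'dsn'      => 'mysql:host=localhost;dbname=myapp;charset=utf8mb4',
    'username' => 'myapp_user',
    'password' => 'change-me',
];
```

```php
<?php
// /var/www/public/index.php  -- inside the document root
$config = require __DIR__ . '/../config/database.php';
$pdo = new PDO($config['dsn'], $config['username'], $config['password']);
```

Even if the webserver is misconfigured and serves PHP files as plain text, the credentials file is simply not reachable by any URL.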

Link to comment
Share on other sites
