Preventing near simultaneous queries from returning duplicate same results

SwitzerBaden · March 16, 2010

Ahoi Freaks,

MySQL 5.0.51a-3

I have a rather large "job" table (1.8+ million entries) that has a primary auto_incremented key (id) and a unique key (name). This table is accessed via a web-service from mulitple nodes requesting jobs in rapid fire succession. Ideally it would work like this:

1. web-service makes the SELECT for node

2. web-service updates the status of the row from "ready" to "inprocess"

3. node crunches away

4. node returns results to web-service

5. web-service stores results, updates the job status back to "ready"

As the table grows, I notice that the web-service can take long enough to select and update an assigned job from ready to improcess to allow multiple nodes to select the same "next available" job...

I realize there is no way to execute a simultaneous SELECT and UPDATE, so what would be the best practice to prevent duplicate results from near simultaneous queries?

Try to update a row as improcess that matches my select criteria before it is selected, then use the id of the affected row to select? Any comments, ideas... tasty brain food recipes?

PFMaBiSmAd · March 16, 2010

It would take a specific example of your queries to directly help, but two possibilities are -

1) If the SELECT/UPDATE queries are on the same table, you can usually just perform an UPDATE query with an appropriately crafted WHERE clause and then test to make sure that the row was actually updated, or

2) You would need to lock the table(s) to insure that the values did not change between queries.

SwitzerBaden · March 16, 2010

Thanks for the reply!

Yes, this what the direction I was leaning as well since it is the same table. I would first update the row, setting the "status" to the id of the node, then just select the row that was updated. I will post if this solved the issue.

SwitzerBaden · March 16, 2010

Yes, this seems to be working as expected, thanks again! To recap:

first selecting a row, then update the selected row to prevent it being selected again causes problems with the same row being selected with large tables and/or high traffic requests. To prevent this:

First: UPDATE a single row that matches the desired WHERE clause (LIMIT 1). In my case I updated the status from "ready" to unique node_id that made the request

Second: use mysql_affected_rows() to verify that a row was updated

Third: SELECT the row that was UPDATED, WHERE status = node_id

BAM, works like a charm.

Sign In

Preventing near simultaneous queries from returning duplicate same results

Recommended Posts

SwitzerBaden

Link to comment

Share on other sites

PFMaBiSmAd

Link to comment

Share on other sites

SwitzerBaden

Link to comment

Share on other sites

SwitzerBaden

Link to comment

Share on other sites

Archived

Browse

Activity

Important Information