Jump to content

What determines the order of insert processing sequence?


greenace92

Recommended Posts

Awful thread title but what I'm asking about I think pertains to race condition. Well in particular I'm concerned about targetted efforts to bring down a server or in general how to handle so many requests.

 

Say you have a basic form, has three fields, name, email, message.

 

Let's say 10 people are in the same room, all 10 have filled out the form, all 10 submit at the same time. What part of the server/applicatin determines who gets inserted first? Do I specify scenarios that handle queue's to prevent potential data loss?

 

I'm thinking that I should cache inserts somewhere in case of a failed connection or waiting due to too many people submitting.

 

I'm running a LAMP server with MySQL as database. I'm not sure if I should look at apache.config file or php.ini or both... Should I have things in mind. How can I test my server to see how many concurrent connections/inserts it can support, etc...

 

At the moment I'm keeping track of the client's ip (I realize they could bounce ip's), then the time of each insert to limit. This pertains more to public demonstration web applications, as opposed to one that uses registered users. I could also implement a recapctha which would slow the person/attacker/script down. I'm not sure about the script part with a re-captcha.

 

I'd appreciate your thougths or any good reading material (the manual ha).

 

Thanks for your time.

Link to comment
Share on other sites

The first answer is, FIFO.  The first person to hit submit will be handled first.  Hypothetically, yes there could be simultaneous requests.  Could there be an overlap?

 

A properly designed table in a properly designed database already takes care of this.  

 

All tables should have a primary key.  The primary key is guaranteed unique.  If you're not sure, then declare the PK for your table to be an unsigned INT, and use the mysql AUTO_INCREMENT feature when you set up the table, and concurrency will be handled for you by the database.

 

As to how many inserts you can do, there are load testing tools you can find by googling.  Just to start, there is a way to do a simple form post using Apache Bench:  http://craigwickesser.com/2015/01/post-data-to-load-test-with-apachebench/

 

If you're good with javascript and familiar with Node/NPM then loadtest is similar and easy to whip up a quick form POST by passing a little json string as an argument.

 

There's jmeter, siege, and really many many tools out there you can explore.

 

There's also quite a lot to performance tuning a LAMP environment.  I will say, that for maximum concurrency, Apache is really not considered one of the best options these days, as Nginx combined with php-fpm has become a lot more popular when people are trying to maximize web performance with php/mysql.  You also have to understand your available resources and how they are allocated depending on the available hardware and architecture you have to throw at your problem.  On a monolithic server, when you have to share memory, IO etc, there are contention issues and all sorts of problems that can popup.  Every HTTP connection requires memory, your scripts require memory, the database requires memory, and when persisting to disk, IOPS.

 

Entire books have been devoted to this subject, and there are no simple answers that don't also require an understanding of the application and how and why it was deployed.

Link to comment
Share on other sites

Wow thank you so much gizmola, great post.

 

I do create my databases/tables entirely with PHPMyAdmin and whenever I set an id column, I use Index (this seems to have solved the no index used error) and set auto-increment. This "overlap", I don't know if that's physically possible, but I'm concerned about it. If two people hit enter at the same time, down to the microsecond (that small?) I'm not sure if it's possible to overlap. Maybe one will always be there first depending on connection, computer, os, etc...

 

I know that LAMP is sort of old, I had asked a question if LAMP was outdated and eventually I believe I'm looking to setup a LEMP stack with NGINX and Postgresql.

 

I looked up php-fpm, I see this "FastCGI Process Manager" what's a simple example of what that would be used for?

I haven't had any web-applications made public yet, that actually has users, so I don't have any experience/problems with load.

I've also considered implementing secondary overflow cloud servers to handle load if there were any problems like that.

 

I haven't worked with Node/NPM but I use javascript quite a bit.

 

I think I just got my first taste of JSON with implementing Google's re-captcha on a website.

 

Thank you for the link and the new words for me to learn.

 

I rent a cheap single-core-10GB-storage-1GB-ram VPS, so I guess I'll have to see what specs I can get out of it.
 

Link to comment
Share on other sites

Whoa. You're afraid of 10 requests? Aren't you aware that LAMP powers huge applications with thousands of concurrent requests? That's what a webserver is for!

 

If this is your very first (personal?) website, you should actually be more worried about having too few users.

Link to comment
Share on other sites

Hey Jacques1! Good to see you on phpfreaks haha.

 

Well I haven't actually had any users yet on any of my sites. I haven't had any up for the longest time, revisions, revisions, revisions.

 

I'm concerned about things... the number 10 was just an easy to get across point I guess.

One of my current projects is a calendar that I built with PHP/Javascript and I am putting it on my portfolio/projects in a way where it's usable.

 

I don't know if this is a bad design but each tile or day has a unique id and when one tile is clicked, this calls an AJAX request to refresh the entries for this day and attach them to the content panel which shows entries for the selected date. When a person writes on the same panel and hits save, this saves the entry, and again pings for new entries, adding the newest added entry.

 

Since this is a demo, one thing I did was to not allow text to be entered at will, rather each post only writes a specific set of words like "sample" or "demo". I then thought, I should limit the number of times someone can hit post. They could just keep hitting post over and over. With each post, the inputs/textarea are cleared. But in this case, instead of placeholders, the placeholders went directly as values so a person could actually keep hitting enter over and over.

So what I'm trying to do at the moment, is to grab their ip when submitting, storing that ip in a table and when a person submits, check to see if that ip has posted, has it been 5 minutes since they last posted, and each ip can only post 4 times. 4 is a random number. I mention this in the projects page where people can see my work.

 

I don't know... I guess I'm paranoid because I don't know what's going on. Blind

Link to comment
Share on other sites

I don't want to compare it to a DDOS attack. In this particular instance, a public form submission without registering an account or captcha seems like a vulnerability to me. I also implemented the fixed-input so that I don't get any "funny guys" typing dumb things over and over again for others to see. I'm supposed to look like I know what I'm doing instead of being spoon fed knowledge. :sweat:

 

So the limiter which limits a person from posting too many times in a row is a server load safety net I guess, I don't know. As I said I haven't really had any users before. I have to try that load test to bench mark what my server can handle as far as traffic goes and concurrent submissions and whatever. I've seen things in I think both the Apache config and PHP.ini that specify things like "max connections" and "keep alive" I'll have to look into those further.

Link to comment
Share on other sites

Nobody above the age of 12 will manually resubmit a form in the hope of overloading your server. This just doesn't work. DoS attacks are automated and can be as simple as a flood of GET requests. There's not much you can do about that except hiring a company to overpower the attacker (which costs money).

 

Also, one IP address does not equal one person! Attackers typically have large pools of IP addresses (think of botnets), legitimate users may share their IP address with thousands of other people. So if a single big proxy server or VPN ends up on your blacklist, you've pretty much DoS'ed yourself.

 

If you want to protect your form against spam, use a CAPTCHA like everybody else. Besides that, you should relax and try to approach this more rationally. Don't jump to countermeasures when you don't even know what you're trying to prevent.

Edited by Jacques1
  • Like 1
Link to comment
Share on other sites

Nobody above the age of 12 will manually resubmit a form in the hope of overloading your server. This just doesn't work. DoS attacks are automated and can be as simple as a flood of GET requests. There's not much you can do about that except hiring a company to overpower the attacker (which costs money).

 

Also, one IP address does not equal one person! Attackers typically have large pools of IP addresses (think of botnets), legitimate users may share their IP address with thousands of other people. So if a single big proxy server or VPN ends up on your blacklist, you've pretty much DoS'ed yourself.

 

If you want to protect your form against spam, use a CAPTCHA like everybody else. Besides that, you should relax and try to approach this more rationally. Don't jump to countermeasures when you don't even know what you're trying to prevent.

 

Dang. Well said again. The form part I figured a script that requested a submit/post request over and over again with the same data. But the captcha kills the argument. So I guess I will just come up with an overlay-pop up then where the form can be of sufficient size to have the re-captcha. I'll probably implement the time thing else where. Not sure for what.

Link to comment
Share on other sites

I tend to agree with Jacques on this front.

 

You are clearly very interested in web development, and are doing a lot of thinking about your design and looking into how things work.  

 

Just to go back to the original question, ultimately apache WILL serialize/queue the requests.   They will be processed and there will not be a problem when you're talking about inserts.  

 

In your app, what could possibly happen is that

 

person A submits

person B submits at same moment.

 

person A sees person B's entry and wonders what happened.

 

 

But that is the nature of the design of your calendar app, and really, for a proof of concept is not worth worrying about.

 

Rather than fret over design, you should concentrate on building things that work.  In the process you will learn many things.

 

When you have a question in regards to how something works,, certainly it's good to ask questions, and even better to create small tests that explore the areas you are interested in.

 

 

Try to avoid overthinking things, and concentrate on expanding your capabilities, and confront concerns when they crop up directly rather than trying to create what you think is the perfect application.  

Link to comment
Share on other sites

person A submits

person B submits at same moment.

 

person A sees person B's entry and wonders what happened.

 

That right there. What am I supposed to do there? That doesn't seem acceptable to me, like "Too bad person B".

 

I want to implement something that monitors the inputs and makes sure that no-one's post is lost.

 

I am seeing the value in dealing with the most urget/necessary matters first before other hypothetical what-if scenarios.

 

Thanks for the help, everyone.

Link to comment
Share on other sites

I think you misunderstand gizmola's post (as well as the previous explanations).

 

Nothing gets lost. If 10 people submit the form at the same time, you get 10 new records (I'm excluding errors, bugs and server crashes here). There is no problem. The requests will not overwrite each other or anything like that.

 

The only effect of concurrent submissions is that a user might see both his own entries and the new entries of other users. But that's the whole point of a shared calendar, isn't it? If you don't want this, don't use a shared calendar. Or lock it while it's being edited by one user (probably a bad idea).

 

Again, there is no problem. Concurrency can become an issue if multiple persons edit the same resource (e. g. change the text of an entry), but that's not the case here.

Link to comment
Share on other sites

I think you misunderstand gizmola's post (as well as the previous explanations).

 

Nothing gets lost. If 10 people submit the form at the same time, you get 10 new records (I'm excluding errors, bugs and server crashes here). There is no problem. The requests will not overwrite each other or anything like that.

 

The only effect of concurrent submissions is that a user might see both his own entries and the new entries of other users. But that's the whole point of a shared calendar, isn't it? If you don't want this, don't use a shared calendar. Or lock it while it's being edited by one user (probably a bad idea).

 

Again, there is no problem. Concurrency can become an issue if multiple persons edit the same resource (e. g. change the text of an entry), but that's not the case here.

 

I understand. Sorry that took so long. I didn't mean to imply that I cared about people seeing other people's posts. This is a demo with fixed input and also if I was to have users, the calendar would be governed by session id/user name. But my concern which you guys have addressed was the concern of I think "race-condition", well somebody's post get's skipped or ignored.

 

Thanks for the clarification.

Link to comment
Share on other sites

That's not really how a race condition happens. A race condition is going to be a concern mostly when considering something using controlled, unique indices - consider user signup where you've got a system that runs a select query with the requested username supplied by a user, then inserts that username if the select query doesn't return a result. Basically, in this case user A and user B request the same username at almost the same time, both select queries will report that the username is available (ie, not already in the database table). Then user A's insert query is run and suddenly that username isn't available to user B despite the select query - quite correctly - reporting that is was available just a second ago. Or, if there's not a unique constraint on the username field in the database, suddenly you have the same username associated with two distinct records.

 

What you're describing is a simple insert query. There's no select to check the availability of any index, so no race condition can occur. As everyone has said in this thread, if user A and user B insert a new record at the same time, the worst that's going to happen is that both records will be inserted, which is honestly kind of the point of the thing to begin with. There's no unique constraint (either defined or implied by the business logic) to make this an issue.

Link to comment
Share on other sites

That's not really how a race condition happens. A race condition is going to be a concern mostly when considering something using controlled, unique indices - consider user signup where you've got a system that runs a select query with the requested username supplied by a user, then inserts that username if the select query doesn't return a result. Basically, in this case user A and user B request the same username at almost the same time, both select queries will report that the username is available (ie, not already in the database table). Then user A's insert query is run and suddenly that username isn't available to user B despite the select query - quite correctly - reporting that is was available just a second ago. Or, if there's not a unique constraint on the username field in the database, suddenly you have the same username associated with two distinct records.

 

What you're describing is a simple insert query. There's no select to check the availability of any index, so no race condition can occur. As everyone has said in this thread, if user A and user B insert a new record at the same time, the worst that's going to happen is that both records will be inserted, which is honestly kind of the point of the thing to begin with. There's no unique constraint (either defined or implied by the business logic) to make this an issue.

 

 

That is a good description and even in that circumstance, there is a solution for it with MySQL ... using a transaction along with a LOCK TABLE ... WRITE

 

Of course for most systems the chance that 2 people will try and choose the exact same username at the exact same moment in time is close to nil, so it's much better simply to have the unique index/constraint on username, and handle the constraint violation error that will be thrown.

Link to comment
Share on other sites

That's not really how a race condition happens. A race condition is going to be a concern mostly when considering something using controlled, unique indices - consider user signup where you've got a system that runs a select query with the requested username supplied by a user, then inserts that username if the select query doesn't return a result. Basically, in this case user A and user B request the same username at almost the same time, both select queries will report that the username is available (ie, not already in the database table). Then user A's insert query is run and suddenly that username isn't available to user B despite the select query - quite correctly - reporting that is was available just a second ago. Or, if there's not a unique constraint on the username field in the database, suddenly you have the same username associated with two distinct records.

 

What you're describing is a simple insert query. There's no select to check the availability of any index, so no race condition can occur. As everyone has said in this thread, if user A and user B insert a new record at the same time, the worst that's going to happen is that both records will be inserted, which is honestly kind of the point of the thing to begin with. There's no unique constraint (either defined or implied by the business logic) to make this an issue.

 

Thanks for the explanation.

 

That is a good description and even in that circumstance, there is a solution for it with MySQL ... using a transaction along with a LOCK TABLE ... WRITE

 

Of course for most systems the chance that 2 people will try and choose the exact same username at the exact same moment in time is close to nil, so it's much better simply to have the unique index/constraint on username, and handle the constraint violation error that will be thrown.

 

Regarding preventing the same username being used. What if I were to "pretend" a username someone typed in was not able say while it's in the username text input field and if the person decided against that name (backspaced) that name is removed from the "being considered" table. Sounds nuts. If this is again, "GO READ THE F------ MANUAL" then please don't respond I'll find out eventually. I keep track of these responses and look stuff up/put into a future research log when I start to implement what I'm asking about.

Edited by greenace92
Link to comment
Share on other sites

“Reserving” the username while the field is being filled out is indeed a bad idea. If you want live validation of the username field, do it in two steps:

  • Pre-check the current input with a simple SELECT. This isn't perfectly reliable (somebody else may still take the name), but it lets the user avoid names that are already registered at that point.
  • When the user submits the form, try to insert the data and catch violations of the UNIQUE constraint imposed on the username field. If such a violation occurs, somebody else has grabbed the name, and the current user needs to start over.

A more relevant race condition in your calendar application would be multiple users editing the same entry at the same time. Only the last update will take effect while all others get lost. A common solution to this problem is optimistic locking: You keep a version number for every entry. When the user starts editing, you load the current entry as well as the version number. When the user tries to update the data, you simultaneously check if the version numer is still the same. Only then do you change the data and increment the version number. Otherwise you reject the changes and ask the user to check the new data and try again.

Link to comment
Share on other sites

“Reserving” the username while the field is being filled out is indeed a bad idea. If you want live validation of the username field, do it in two steps:

  • Pre-check the current input with a simple SELECT. This isn't perfectly reliable (somebody else may still take the name), but it lets the user avoid names that are already registered at that point.
  • When the user submits the form, try to insert the data and catch violations of the UNIQUE constraint imposed on the username field. If such a violation occurs, somebody else has grabbed the name, and the current user needs to start over.

A more relevant race condition in your calendar application would be multiple users editing the same entry at the same time. Only the last update will take effect while all others get lost. A common solution to this problem is optimistic locking: You keep a version number for every entry. When the user starts editing, you load the current entry as well as the version number. When the user tries to update the data, you simultaneously check if the version numer is still the same. Only then do you change the data and increment the version number. Otherwise you reject the changes and ask the user to check the new data and try again.

 

Thanks for this tip and the bit about checking username existence. I'll write these down and use it for future implementations.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.