Jump to content

Limit 1


benanamen

Recommended Posts

A few times I have seen in the case of SELECT username WHERE user_id=? there is LIMIT 1.

 

My immediate thought was, why add the LIMIT 1 since there should only be one result anyways until I saw an explanation somewhere else which said it will stop MySQL from continuing searching the DB for more matches. A light bulb went off because this made sense to me.

 

My question is, is that really the case? And if so, are we talking enough of a performance gain for it to even matter? (Lets consider using something as big as Facebook).

 

If that is the best practice, I am surprised I almost never see it.

Link to comment
Share on other sites

I'll agree with you. If have many records should have an index and would be using that.

 

Since you mentioned Facebook I'll tell you what I discovered.

Is many big sites like Facebook using big tables and NoSQL like cassandra.

Facebook created that and open sourced it, I read they stopped using it in 2014 and not sure what are using now.

They use a combination of methods to get data.

Structured important data such as users info, wall posts,  timeline and so on are with mysql and replicated across various data centers.

Memcached is used to cache a lot of data and objects into ram.

Haystack to handle a http based photo server. Info from facebook.

Scribe to handle logging.

Varnish http accelerator that acts like a load balancer and cache.

Hiphop php transpiler but I believe using virtual machines...HHVM.

Link to comment
Share on other sites

A few times I have seen in the case of SELECT username WHERE user_id=? there is LIMIT 1.

 

 

If user_id is indexed I suspect it makes very little difference. OTOH, if it isn't indexed and the whole table would require scanning, then there may be an advantage in the LIMIT 1 depending on the target record's position in the table.

 

The query in question is selecting "username" based on a condition for the "user_id". "user_id" should, in most cases. be indexed. But, a more relevant use case would be a logon script that is going to search based on the "username" which, other than logon, would probably not normally be used for any conditional matching. So, that would be a valid case to use LIMIT 1.

 

Unless I am missing something, I would look at it this way: Using it where not needed (i.e. criteria field is indexed) would have no negative affects, but using it where it is needed would have positive performance effects. So, it could be a good practice to use it whenever it is applicable. But, be sure to account for scenarios where you only expect one record, but where there are edge case scenarios where multiple records could be created.

Link to comment
Share on other sites

My bad, I meant WHERE username and the use case being a login and only one possible username match. user_id would have been the indexed auto increment which is not the column the WHERE would be checking.

 

Then a LIMIT 1 would make sense. The username field would likely not be indexed if there is a user_id field used for associations.

Link to comment
Share on other sites

The username should be indexed. Otherwise the database system has to perform a full table scan whenever somebody logs in.

 

In fact, it sounds rather silly to omit the index and then use LIMIT 1 to “improve performance”. If you care about performance, use an index. This will be much, much more effective than any LIMIT hack.

Link to comment
Share on other sites

The username should be indexed. Otherwise the database system has to perform a full table scan whenever somebody logs in.

 

In fact, it sounds rather silly to omit the index and then use LIMIT 1 to “improve performance”. If you care about performance, use an index. This will be much, much more effective than any LIMIT hack.

 

I don't necessarily disagree with you. But, indexing has it's own overhead (more memory consumption, slower write operations, etc.). Depending on the nature of the application, a user may log in once and then do many, many other operations. The login call may be so infrequent and so low in performance impact that adding an index for that one use case may do more harm than good. In any case, there should be proper logging to capture the frequency and duration of particular processes to determine where performance improvements should be made. I agree using "LIMIT 1" for the purpose of performance is a "hackish" solution. I just don't see any downside to using it.

Link to comment
Share on other sites

Ok, I did some testing on 300,000 employee records. Based on my test results, the fastest option is set an index and no limit 1

 

 

No Index  Average .97
.0554
1.142
1.155
1.094
.700
1.153
1.143
 
No Index Limit 1  Average.38
.423
.354
.388
.348
.412
.405
.333
 
 
Indexed  Average.026
.046
.026
.022
.020
.019
 
Indexed Limit 1 . Average 031
.020
.029
.042
.038
.042
.021
.029
Edited by benanamen
Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.