Jump to content

SELECT * and other query design questions...


ajlisowski

Recommended Posts

So...dumb question.

 

Lets say I have a table that contains an id, username and full_name. And I actually do want all 3 of those columns returned to me. Is there a reason not to use SELECT * there? Is SELECT `id`,`username`,`full_name` better then SELECT * in that situation?

 

Ive been working with Zend_Framework a lot lately and Ive been basing my code on the way I have seen some MVC driven CRM systems work- where if they have a user object, whenever they retrieve that object they grab all the related data from the db. Obviously if I am running a search where I only need to return the id it would be silly to do a SELECT *. But when I am actually loading in the object, and I need every field in the table, is there a downside to SELECT *?

 

And to take it a step further, if my process is to first search for the users. Then once I have a result of the IDs for those users, to create user objects and retrieve the data into them, does it make sense to grab all the fields with the search query and initiate the user objects via that data rather then retrieving them via id?

 

Basically

 

$Q="SELECT `id` FROM `user`";

$results=$db->query($Q);

foreach($results AS $result)

{

  $newuser=new user($result->id);

}

 

 

VS

 

$Q="SELECT * FROM `user`";

$results=$db->query($Q);

foreach($results AS $result)

{

  $newuser=new user();

  $newuser->loadByObject($result);

}

 

 

In the top example you have a SELECT `id` that is ran across all results. Then X amount of SELECT * ran against a primary key.

 

In the bottom example you have a SELECT * ran against all results and no followup queries.

 

Which way is the better way to handle it?

Link to comment
Share on other sites

The SELECT * question is an ongoing debate where by some side with "If You want all the fields anyway it's fine" and other's (like myself) say "you should never use SELECT * for production applications.  Having been in the situation on several occasions where pre existing table design has had to be altered to accomidate more/less fields I don't agree with SELECT * for anything other than a quick debuging session.  That said, I REALLY hate it when people use INSERT INTO without explicitly naming every field being used. 

 

The only benifit you will ever get from using SELECT * is typing less. 

 

The better question is "Is there any reason to use SELECT * ?"

The only real answer to that is "Because I'm lazy". The same answer that is valid for those people who need to ask themselves "Why did I alias all my table names to meeningless single letters?"

 

In relation to your second question:

Why do something twice if doing it once is enough? :shrug:

Link to comment
Share on other sites

So performance wise there is no harm, but the harm is in maintaining the code. Makes sense to me.

 

As for my second question...

 

I suppose I guess my concern is...

 

does selecting more fields on a complex query matter?

 

 

Lest say it takes X seconds to run a query returning just the primary key. And you get Y results. Then it takes Z seconds to return the query on the primary key to get all the columns.

 

Your total time is X+(Y*Z)

 

Now it takes X*C seconds to run the query on all the fields where C is a factor derived from the number of fields you have. function(number_of_fields)=C;

 

In theory depending on how big C is, the second method could be signifigantly longer if X is very large. Im assuming C is almost negilable, where the amount of fields returned doesnt have that big of an impact. But I am not sure of that, hence my concern.

 

Im not sure if this question makes any sense...or is a remotely valid concern. But ive tried to present it in a number of ways to get across what I am pondering. If it doesnt make sense or flat out isnt something to worry about then feel free to call me a dummy and send me on my way :)

 

Link to comment
Share on other sites

Since most tables would be designed correctly, a majority of the queries would generally need to select all the columns, so a majority of the queries could be written using *. When you need to select only a subset of the columns from your correctly designed table, you would list only those columns. Do what is needed, depending on what data you are trying to retrieve.

 

For those of you with incorrectly designed tables, with images/files stored in the row or tables laid out like spreadsheets or tables with duplicated data in them, don't automatically use * because you will be retrieving unnecessary data that will add overhead to your script.

Link to comment
Share on other sites

@OP

 

SELECT * is just fine. SELECT * is equivalent (almost) to the PROJECTION operator in the original relational algebra.

 

Advice that warns against its use, is like advice not to use the "+" operator. I don't see what is the basis (logical) of that advice.

 

If in production SELECT * gives problems, it is not the fault of SELECT * per se, but the fault of non-updated existing code.

 

Using it is not being lazy; it is about using features that were placed there because of its practicality.

Link to comment
Share on other sites

@OP

 

SELECT * is just fine. SELECT * is equivalent (almost) to the PROJECTION operator in the original relational algebra.

 

Advice that warns against its use, is like advice not to use the "+" operator. I don't see what is the basis (logical) of that advice.

 

If in production SELECT * gives problems, it is not the fault of SELECT * per se, but the fault of non-updated existing code.

 

Using it is not being lazy; it is about using features that were placed there because of its practicality.

 

Really?  well my question still stands - what practical benefit does it actualy give you over explicit field lists?

 

Oh, and to liken it's use to the + operator is just stupid and petty - I have to assume you take offence at being grouped as lazy, but you really need a better argument than that.

Link to comment
Share on other sites

Really?  well my question still stands - what practical benefit does it actualy give you over explicit field lists?

 

Practical Argument

Case 1: Using SELECT *

a.) If you added a new field, you need NOT modify your query AND you will modify code that references such fields = 1 Edit

 

b.) If you removed/changed a field, you will modify your query AND you will modify code that references such fields = 2 Edits

 

Thus: 3 "edits." all in all.

 

Case 2: Not Using SELECT * (explicitly typing)

a.) If you added a new field, you need to modify your query AND you will modify code that references such fields = 2 Edits

 

b.) If you removed/changed a field, you will modify your query AND you will modify code that references such fields = 2 Edits

 

Thus: 4 "edits." all in all.

 

So Case 1 is more efficient by a unit (1 edit) than Case 2.

 

Now if you need to modify/add fields of 100 tables, Case 1 is more efficient by 100 units than Case 2 (100 X 1).

IMO, your question is adequately answered by now.

 

 

Oh, and to liken it's use to the + operator is just stupid and petty

"SELECT *" is SQL's implementation of the relational PROJECT operator/project (again, SQL was almost successful in implementing it).

Meaning "SELECT *" is an operator/function for type RELATION.

 

"+" is most Programming Languages' implementation of the arithmetic ADDITION.

 

Meaning, "+" is an operator/function for type RATIONAL (or REAL).

 

Thus they are the same in the sense that they are operators for a certain data type.

 

Thus, if they are used (PROJECT and +) in a computer program, in essence, they are the same: in the sense that they are operator/function invocation. They take an input and spits an output.

 

In the case of "SELECT *"/PROJECT, it takes an input of type RELATION/TABLE, and spits an ouput of type RELATION/TABLE.

 

In the case of "+", it takes an input of type RATIONAL(REAL), and spits an ouput of type RATIONAL(REAL).

 

 

That is the sense in w/c I "likened" them, FYI. All this is because of relational algebra, not me.

 

If you have a problem w/ that, you can always air your complaints to the authors of relational algebra.

 

I have to assume you take offence at being grouped as lazy, but you really need a better argument than that.

No I don't.

I'm a technical guy, so I like being lazy. If there is an efficient way to do things, I will do it, so that my employer will be happy,

and he will pay me more. If laziness == more profit then I like being lazy.

 

I hope that is a better argument.

 

Thank you and hope it helps.

Link to comment
Share on other sites

Ah, there was a fundamental error in my answer:

 

It says:

Case 1: Using SELECT *

a.) If you added a new field, you need NOT modify your query AND you will modify code that references such fields = 1 Edit

 

b.) If you removed/changed a field, you will modify your query AND you will modify code that references such fields = 2 Edits

But really:

 

Case 1: Using SELECT *

a.) If you added a new field, you need NOT modify your query AND you will modify code that references such fields = 1 Edit

 

b.) If you removed/changed a field, you need NOT modify your query AND you will modify code that references such fields = 1 Edit

 

Thus: 2 "edits." all in all.

 

Case 2: Not Using SELECT * (explicitly typing)

a.) If you added a new field, you need to modify your query AND you will modify code that references such fields = 2 Edits

 

b.) If you removed/changed a field, you will modify your query AND you will modify code that references such fields = 2 Edits

 

Thus: 4 "edits." all in all.

 

So Case 1 is more efficient by two units (2 edits) than Case 2.

Now if you need to remove/modify/add fields of 100 tables, Case 1 is more efficient by 200 query-edits than Case 2 (100 queries X 2 edits).

IMO, your question is adequately answered by now.

Link to comment
Share on other sites

For those of you with incorrectly designed tables, with images/files stored in the row or tables laid out like spreadsheets or tables with duplicated data in them, don't automatically use * because you will be retrieving unnecessary data that will add overhead to your script.

 

W/c means that it is not the fault "SELECT *" but of "incorrectly designed tables."

 

Of course, avoiding "SELECT *" in those cases will make "sense". But that is merely a "workaround" w/c ought to be temporary.

 

It's most practical if we do not address the effect of a problem but the cause of it (the problem itself).

Because by then further problems are avoided because the main problem was addressed.

 

In short, DB design also plays a role here.

Link to comment
Share on other sites

so your practical benefit is...saving typing.  shock horror. 

 

Efficiency is measured by more than how many words you need to change during design.  And it's invalid to assume that by adding a field to a table you are going to want to refference that information in all pre existing code.  As for having to update 100 tables at once...Really? You had to add 1 single field to 100 tables at the same time and then refference that field with every single pre existing query?  dont make ludicrous scenarious just to keep from seeming quite such an opiniated fool.  Likely you would update 100 tables and then waste your dear efficiency by using * to pull that field irrelivently into your pre existing queries that were never designed to refference them into variables anyway.

 

It's not the fault of algebra: + performs an arithmetic action, * just cuts corners.

Link to comment
Share on other sites

so your practical benefit is...saving typing.  shock horror. 

Why exactly is it a "shock horror"?

 

You see, the designers of SQL introduced "*" as a convenient shorthand to reference all the attributes of a relation (or table reference).

As such, it is practically meant to reduce typing.

 

No other benefit should be expected to be derived from it, as intended.

 

If the optimizer of a DB is reliant upon such construct to improve performance, well that is it's "limitation", w/c ought to be improved

in future releases.

 

Any "dogma" w/c says to avoid the "*" construct is contingent upon the current state of affairs of optimisers. If in the future

optimisers are improved such dogma will no longer be true.

 

I understand of course that we are limited with what we have. Thus avoiding the "*" is justified for performance reasons; but only that,

in essence they are not "evil" or some plague that ought to be avoided. Logically speaking.

 

 

 

Efficiency is measured by more than how many words you need to change during design.

What exactly are you referring? The efficiency I was hinting is quantifiable. Thus I presented the "typing that can be avoided."

 

And it's invalid to assume that by adding a field to a table you are going to want to refference that information in all pre existing code.

As for having to update 100 tables at once...Really? You had to add 1 single field to 100 tables at the same time and then refference that field with every single pre existing query?

Well, I was sorry for not stating my assumptions. IMO, such assumption is obvious. Why would you want to add a field

if it will not be referenced (one way or another)?

 

  dont make ludicrous scenarious just to keep from seeming quite such an opiniated fool. 

Why is it ludicrous? Did you experience all scenarios that every developer in this universe did experience to be able to say

that such scenario is ludicrous? I doubt that.

 

 

Likely you would update 100 tables and then waste your dear efficiency by using * to pull that field irrelivently into your pre existing queries that were never designed to refference them into variables anyway.

Again, Why would you want to add a field if it will not be referenced (one way or another)?

 

It's not the fault of algebra: + performs an arithmetic action, * just cuts corners.

You have not grasped what I meant, and thus it would be a waste to explain it again. Understand it first before making any

opinions about it. That would be wise.

 

Thank you and hope it helps.

Link to comment
Share on other sites

Just to clarify.

 

I am not suggesting to use "SELECT *" in all of your queries.

 

If you need only 2 fields of a table w/ 150 fields, why would you do so, right?

 

But, if you need all those fields (w/c is possible in many applications), it is not logically incorrect to use "SELECT *"

rather than typing those field names one-by-one (w/c is a cumbersome undertaking).

 

If there is a performance penalty in using "SELECT *", that is not the fault of "SELECT *" as such, but of the current optimizer

of the database. "SELECT *" is a mere syntactic construct w/ is in the end processed by the DBMS's optimizer.

Some DBMS might handle it alright, and some might not.

 

So, the dogma that "SELECT * is evil (performance-wise)" is not absolute (i.e., it is false in many other instances).

 

Caveat lector. Caveat emptor.

Link to comment
Share on other sites

Let's do this your way

 

Why exactly is it a "shock horror"?

...

I understand of course that we are limited with what we have. Thus avoiding the "*" is justified for performance reasons; but only that,

in essence they are not "evil" or some plague that ought to be avoided. Logically speaking.

 

Shock horror is that the best argument that you can give is simply making my point.  The only reason to use * is to be lazy with your coding.  It's not the fault of optimisers, it's primarily the fault of those who are new to SQL, learn using * from the beggining and then cause problems because there is no controlable ferrerence to the existing fields in the table that they are using for the query.  So when that code then get's passed onto someone else to "make it better" there is no reffernce point to work with.  It's grossly harder to manage code that uses a * than the explicit field names.  It's the same problem with the single letter aliasing of table names - it starts off simple, grows, then someone else gets landed with it and has to keep scrolling back up to see what bloody table "d" was.

 

What exactly are you referring? The efficiency I was hinting is quantifiable. Thus I presented the "typing that can be avoided."

I was reffering to the performance degridation of selecting fields that are not required, as well as the time and wasted while other developers work trying to exact the relevent code from a clearly non modular style.

 

Well, I was sorry for not stating my assumptions. IMO, such assumption is obvious. Why would you want to add a field

if it will not be referenced (one way or another)?

Generaly if a system is operating and then has to be modefied, it will be because there are additional functions being added, not because the ones that have been in use are being altered.  Thus adding fields (even to 100 tables) is generaly to be expected as a process that will enable the construction of additioanl queries, not re-invent the ones that are still working.

 

Why is it ludicrous? Did you experience all scenarios that every developer in this universe did experience to be able to say

that such scenario is ludicrous? I doubt that.

Universe? Seriously? I don't think we need quite so much drama thanks.  Obviously no I have not come anywhere close to dealing with even a fraction of every possible permutation of "What could happen if".  But the scenario was ludicrous in the effect that if it was an accurate representation of what was expected, it was clearly done wrong to start with.  In which case a ground up review would be needed and the use of * would fall somewhere into meeninglessness (is that even a word?) in comparison to the rest of the work that would need to be done.

 

Again, Why would you want to add a field if it will not be referenced (one way or another)?

Again, predominantly to be reffrenced by other, new queries.

 

You have not grasped what I meant, and thus it would be a waste to explain it again. Understand it first before making any

opinions about it. That would be wise.

 

You'll find that happens quite a lot with me  ;D I had assumed that you were reffercing algebra for something more that "what goes is comes back out".

 

My issue with the use of SELECT * isn't as eclectic as I have probably made it seem.  It's more like how I veiw high grade explosives - should only be used by people who know exactly what they are doing with it.  Too many people pick it up early and run with it forever more regardless of whether it is the right thing to do.  I meen come on, how many times have we all seen SELECT * used in a user login script on this forum alone?  Then how many times when people want to see a list from table1 that reffrences a join on table2, looking for a single pair of matched fields but wind up using select * to pull everything out of both?

 

If someone is in the position where they know exactly what they are doing and how it is being done, then sure use SELECT * - otherwise you are probably doing it wrong.

Link to comment
Share on other sites

The only reason to use * is to be lazy with your coding.

Again, if being lazy translates to efficiency (saved time, money, effort) then I will go w/ it.

 

It's not the fault of optimisers, it's primarily the fault of those who are new to SQL,

learn using * from the beggining and then cause problems because there is no controlable ferrerence to the existing fields in the table that they are using for the query

There you have it. Remember that "SELECT *" is merely an innocent syntactic construct.

It is the user who is responsible. Your comments are giving the impression that "SELECT *" is inherently

"evil", where in fact it is the user's fault (not knowing when to use it).

 

That is my issue: using SELECT * is not "bad/evil"; if you know what you're doing, of course, you'll be fine.

 

But then again, IF there are optimisers who just can't optimise code that contains "SELECT *", well they are to be

blame too.

 

So when that code then get's passed onto someone else to "make it better" there is no reffernce point to work with.  It's grossly harder to manage code that uses a * than the explicit field names.  It's the same problem with the single letter aliasing of table names - it starts off simple, grows, then someone else gets landed with it and has to keep scrolling back up to see what bloody table "d" was.

Again, NOT the problem of "SELECT *", as such, but of lack of code documentation and all such matters.

 

I was reffering to the performance degridation of selecting fields that are not required, as well as the time and wasted while other developers work trying to exact the relevent code from a clearly non modular style.

This was addressed in my follow-up post, immediately before this. Please read it.

 

 

Universe? Seriously? I don't think we need quite so much drama thanks.  Obviously no I have not come anywhere close to dealing with even a fraction of every possible permutation of "What could happen if".  But the scenario was ludicrous in the effect that if it was an accurate representation of what was expected, it was clearly done wrong to start with.  In which case a ground up review would be needed and the use of * would fall somewhere into meeninglessness (is that even a word?) in comparison to the rest of the work that would need to be done.

I used the term "universe" because a pun was intended. However, it seemed not to work.

Kidding aside -- but still, such scenario IS still possible, right? And besides, there are many scenarios (real)

w/c are close to what was being conjured (say, 10 tables that needed to be modified, and 3 fields to be added to each).

 

You'll find that happens quite a lot with me  I had assumed that you were reffercing algebra for something more that "what goes is comes back out".

Read the article on wikipedia (relational algebra). You can also purchase a book by Chris Date (Introduction to Databases).

 

 

My issue with the use of SELECT * isn't as eclectic as I have probably made it seem. 

It's more like how I veiw high grade explosives - should only be used by people who know exactly

what they are doing with it. 

Too many people pick it up early and run with it forever more regardless of whether it is the right thing to do. 

I meen come on, how many times have we all seen SELECT * used in a user login script on this forum alone? 

Then how many times when people want to see a list from table1 that reffrences a join on table2,

looking for a single pair of matched fields but wind up using select * to pull everything out of both?

 

If someone is in the position where they know exactly what they are doing and how it is being done, then sure use SELECT * - otherwise you are probably doing it wrong

 

Agreed.

 

 

But my issue, again, is that it is often viewed that "SELECT *" in itself is "bad/evil/dangerous".

But in reality it is the user who is to be blamed for misusing it. I hope that that point is made clear: not the syntax

(SELECT *) but the user (and/or limited DB optimiser) is ultimately responsible for the undesirable side-effects of using

SELECT *.

 

The solution, therefore, is not to ban the use of "SELECT *" but to educate the user of its (SELECT *) proper use.

 

It can still be used but only when it's applicable, rather than type the whole set of fields (w/c can be 20 fields w/ average of 6 characters each: daunting and inefficient!)

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.