Jump to content

Looking for easier mysql code..It takes way too long to conclude.


Recommended Posts

This works great when there are 100 dealer records but crashes at 6000 dealer records.  Does anyone know why or how I can redevelop this query so that it will search faster.  As you can tell I am no expert with mysql.  Thanks in advance for your help.

 

 

$Dealer_Results->DataSource->CountSQL =

"SELECT Count(Distinct Dealer_ID) as Count

FROM  ((dealer_category_matrix RIGHT JOIN dealers ON dealers.Dealer_ID =

dealer_category_matrix.DCM_Dealer_ID) LEFT JOIN category ON dealer_category_matrix.DCM_Cat_ID = category.Cat_ID) INNER JOIN zipcode ON zipcode.ZIPCode = dealers.Dealer_Zip WHERE  (category.Cat_ID='" . $Category . "' OR ''='" . $Category . "') AND (dealers.Dealer_Publish='Y') AND (dealers.Dealer_Approved='Y') AND (CURDATE() < dealers.Dealer_Expires) AND (( 3963 * acos( sin( '$rrr' / 57.2958 ) * sin( zipcode.latitude / 57.2958 ) + cos( '$rrr' / 57.2958 ) * cos( zipcode.latitude / 57.2958 ) * cos( zipcode.longitude / 57.2958 - '$sss' / 57.2958 ) ) ) < " . $Radius . ") AND (dealers.Dealer_BusinessName LIKE '%" . $BusinessName . "%') AND (dealers.Dealer_State = '" . $State . "' OR '0'='" . $State . "0');";

The best solution to something like this is to get one of the queries and use EXPLAIN EXTENDED query

 

Then take a look at what it is showing you is the problem area of your query, and look to either rewrite that, add an index or modify an existing one (often people will try and create a "covering" index that insures the most important columns have an index on the columns required,  and in general stop large table scans or the creation of temporary tables, or in some cases a hint to the query if it's joining in a way that's non-optimal.

 

There are lots of examples out there of how to read the EXPLAIN EXTENDED output.  If you manage to get one for one of these queries and you want to include it here I might be able to provide you some tips.  One thing I can tell you is that with all the math you are doing on columns, no indexes will be used for any of the columns that are part of those calculations, so that could be one of the obvious areas of slowness, but again the query plan should show you what is eating up your time in terms of number of rows being looked at.

 

 

My apologies for being uneducated in mysql it was hard enough learning/understanding php.  I was under the impression that mysql worked like if statements.  IF > Category ON TABLE CATEGORY is True THEN go to INDEX ON TABLE DEALERS ( of course Dealer_ID would be the same ) and do the calculation. RETURN result.  Isnt that how it works?  If not then maybe I should if possible do a query for the categories first then do a query withe the TRUE results for the Category search then use those results to do the Math on the DEALERS Table.  Is that even possible?

Thanks again for the reply.

Don,

  No, SQL is non-procedural.  It actually works with set theory (intersection, union, cartesian product, etc) with the most fundamental concept being that it will always return a set.  There are some small procedural additions, and most of the rdbms's have support for stored procs and triggers, but those are advanced features that aren't pertinent to your question.  When you start out, all you're really suppossed to be concentrating on is how to get a final result set that provides the data you're looking for, perhaps sorted and grouped.

 

In a way however, thinking about how the final result set might be arrived at in the way you described isn't that off track, because the query engine has to take your SQL, parse it, and determine a query plan.  This is what the EXPLAIN EXTENDED feature gives you insight into. 

 

There are some concepts you might want to look into that help a lot in understanding how mysql works:

 

-Joins between related tables should be on keyed columns.

-If a column appears in a where clause and it is significant to limiting the result set, then that column needs to be indexed/keyed.

 

The main index type of all relational databases is a balanced b-tree index.  There are some databases that will substitute a hash index.  In either case, the property of these indexes is that they locate a particular value quickly, even when the total universe of values is very large.  In order for these to work effectively, the values need to be "high cardinality". 

 

So for example if you have 10,000 rows in a table, and you have a column called tbltype that is one of 3 possible values (a, b, c) chances are an index on that column will not be useful because it's going to be low cardinality.  The engine will decide that it's better to just scan through the entire table sequentially looking for the appropriate value.  Usually if there are queries like that, there is some other column as part of the criteria that is high cardinality that will already provide a minimal subset of rows.

 

The other thing to look out for is doing computation on a column using the built in functions, as you're doing in your query above with the math computations.  Since those are derived columns, there is no way for mysql to use indexes if it might otherwise be important to do so for subselecting the table, because the computation is not static.  So when you have a WHERE somefunc(columnA * columnB)

 

It is difficult for me to just look at a query, without having the specific information about the schema and indexes, and offer much advice on a slow performing query.  However, again if you can bag one of these slow running queries (just echo the query out in an html comment, grab it, and then run the EXPLAIN EXTENDED query in phpMyAdmin or mysql command line client) and copy and paste the result back here, I might be able to offer you some specific insight and suggestions on how to improve it.

Putting functions in your where clause will also take a very long time.  You might try, if you can, calculating the range outside of the query.

 

Also, be sure you're using your indexes well - the functions won't, and your "LIKE '%..." won't either.  If you can avoid putting the wildcard at the beginning of your "LIKE" clause, and index that field, you'll see an improvement as well.

Thanks for the reply.  Havent ran the EXPLAIN yet but how would I do calculations outside the query if the calculations return the result i need.  ( It calculates distance based on lat and long. )  I blew out my video card and my system is running in safemode so i cant run wamp yet to access mysql.  Anyways if you have an idea I definetly will attempt and listen.

Hi

 

While using complex calculations is always going to have an impact (as it is going to have to do that calculation on every row), that calculation isn't that unusual.

 

I would be concerned that you are maybe joining on fields that are not key or index fields.

 

All the best

 

Keith

Hi

 

While using complex calculations is always going to have an impact (as it is going to have to do that calculation on every row), that calculation isn't that unusual.

 

I would be concerned that you are maybe joining on fields that are not key or index fields.

 

All the best

 

Keith

 

That's probably true... this, though:

 

dealers.Dealer_BusinessName LIKE '%" . $BusinessName . "%'

 

is going to do a full table scan I think.  If you can change it to this:

 

dealers.Dealer_BusinessName LIKE '" . $BusinessName . "%'

 

and have dealers.Dealer_Business indexed, it should speed up the query.

 

A lot of what to do though depends on the output of the EXPLAIN query.

Hi

 

Good spot, hadn't really noticed the like.

 

One other point is that the OP appears to be checking fields against passed values or if the passed values are blank. Might be better to dynamically build up the query and drop the check of category.Cat_ID if $Category is blank.

 

All the best

 

Keith

  • 2 weeks later...

I should note that I used values for the variables so this is the query below

 

EXPLAIN extended SELECT Count(Distinct Dealer_ID) as Count FROM ((dealer_category_matrix RIGHT JOIN dealers ON dealers.Dealer_ID = dealer_category_matrix.DCM_Dealer_ID) LEFT JOIN category ON dealer_category_matrix.DCM_Cat_ID = category.Cat_ID) INNER JOIN zipcode ON zipcode.ZIPCode = dealers.Dealer_Zip WHERE (category.Cat_ID='" 34 "' OR ''='" 34 "') AND (dealers.Dealer_Publish='Y') AND (dealers.Dealer_Approved='Y') AND (CURDATE() < dealers.Dealer_Expires) AND (( 3963 * acos( sin( 35.2978499 / 57.2958 ) * sin( zipcode.latitude / 57.2958 ) + cos( 35.2978499 / 57.2958 ) * cos( zipcode.latitude / 57.2958 ) * cos( zipcode.longitude / 57.2958 - -119.1377090 / 57.2958 ) ) ) < 250) AND (dealers.Dealer_BusinessName LIKE '%" . $BusinessName . "%') AND (dealers.Dealer_State = '" . $State . "' OR '0'='" . $State . "0');

 

Thanks again for your help.

Hi

 

I would be tempted to take some of the calculation out of the SQL. For example the bits sin( '$rrr' / 57.2958 ) and cos( '$rrr' / 57.2958 ) will be constant for ever rows calculation so no need to recalculate them.

 

Further you are checking fields or checking that the passed parameter is 0, and it would be better to just not do the compare if the parameter is 0.

 

Ie, something like this

 

$QuerySin = sin( $rrr / 57.2958 );
$QueryCos = cos( $rrr / 57.2958 );
$sql = "SELECT Count(Distinct Dealer_ID) as Count 
FROM  ((dealer_category_matrix 
RIGHT JOIN dealers ON dealers.Dealer_ID =
dealer_category_matrix.DCM_Dealer_ID) 
LEFT JOIN category ON dealer_category_matrix.DCM_Cat_ID = category.Cat_ID) 
INNER JOIN zipcode ON zipcode.ZIPCode = dealers.Dealer_Zip 
WHERE  ".(($Category == '') ? '' : "(category.Cat_ID='" . $Category . "' OR ''='" . $Category . "') AND ").
"(dealers.Dealer_Publish='Y') AND (dealers.Dealer_Approved='Y') 
AND (CURDATE() < dealers.Dealer_Expires) AND (( 3963 * acos( $QuerySin * sin( zipcode.latitude / 57.2958 ) + $QueryCos * cos( zipcode.latitude / 57.2958 ) * cos( zipcode.longitude / 57.2958 - '$sss' / 57.2958 ) ) ) < " . $Radius . ") 
AND (dealers.Dealer_BusinessName LIKE '%" . $BusinessName . "%') 
".(($State == '0') ? '' : " AND (dealers.Dealer_State = '" . $State . "' OR '0'='" . $State . "0')").";";

$Dealer_Results->DataSource->CountSQL = $sql;

 

All the best

 

Keith

There is an error in the code you  suggested, that is very well appreciated by the way,  but  after I put it in my PHP code it has an error. I think the error is here AND ")."(dealersand i think there is a AND missing between "%') ".(($State == '0') can you please confirm or if you see another possibility please let me know.

 

Thanks Don

Hi

 

I can't see the error, and also I can't match up what you have suggested as the error with the code I posted.

 

I have seen I have left some unrequired code in place, and also I have edited the code to remove the mix of left and right joins

 

<?php
<?php

$QuerySin = sin( $rrr / 57.2958 );
$QueryCos = cos( $rrr / 57.2958 );
$sql = "SELECT Count(Distinct Dealer_ID) as Count 
FROM  dealers 
INNER JOIN zipcode ON dealers.Dealer_Zip = zipcode.ZIPCode 
LEFT JOIN dealer_category_matrix ON dealers.Dealer_ID = dealer_category_matrix.DCM_Dealer_ID 
LEFT JOIN category ON dealer_category_matrix.DCM_Cat_ID = category.Cat_ID 
WHERE  ".(($Category == '') ? '' : "category.Cat_ID='" . $Category . "' AND ").
"dealers.Dealer_Publish='Y' AND dealers.Dealer_Approved='Y' 
AND CURDATE() < dealers.Dealer_Expires AND ( 3963 * acos( $QuerySin * sin( zipcode.latitude / 57.2958 ) + $QueryCos * cos( zipcode.latitude / 57.2958 ) * cos( zipcode.longitude / 57.2958 - '$sss' / 57.2958 ) ) ) < " . $Radius . " 
AND dealers.Dealer_BusinessName LIKE '%" . $BusinessName . "%' 
".(($State == '0') ? '' : " AND dealers.Dealer_State = '" . $State . "'").";";

$Dealer_Results->DataSource->CountSQL = $sql;

?>

 

Basically a couple of the bits of the query are only added if required (ie, no point in checking Dealer_State when $State is 0 and 0 just means any state).

 

The code .(($State == '0') ? '' : " AND dealers.Dealer_State = '" . $State . "'"). is essentially an in line if, and if $State is 0 then it concatenates '' into the string (ie, nothing), but otherwise if (say) State was 5 it would concatenate in " AND dealers.Dealer_State = '5'"

 

All the best

 

Keith

Well your code works well so I am very thankful.  After further review I noticed that I am running to query's which one uses the results from the first ( the one that you helped me with ).  The code is below and I think when my associate was helping me with the code he made it more complicated and I think this is why it is hanging up.  It appears that it is using the results from the first query and comparing it again to every record.  Is that the case is it overkill and is it the main problem in this hangup.  If so any recomendations?  Thanks again for your help.

 

$Dealer_Results->DataSource->SQL =

"SELECT distinct dealers. * , ( 3963 * acos( sin( '$rrr'/ 57.2958 ) * sin( zipcode.latitude / 57.2958 ) + cos( '$rrr' / 57.2958 ) * cos( zipcode.latitude / 57.2958 ) * cos( zipcode.longitude / 57.2958 - '$sss' / 57.2958 )) )

AS Dealer_Distance

FROM ((dealer_category_matrix RIGHT JOIN dealers

ON dealers.Dealer_ID = dealer_category_matrix.DCM_Dealer_ID)

LEFT JOIN category

ON dealer_category_matrix.DCM_Cat_ID = category.Cat_ID)

INNER JOIN zipcode

ON zipcode.ZIPCode = dealers.Dealer_Zip, zipcode zipcodesearch

WHERE (dealers.Dealer_Zip = zipcode.ZIPCode)

AND (dealers.Dealer_Publish='Y')

AND (dealers.Dealer_Approved='Y')

AND (CURDATE() < dealers.Dealer_Expires)

AND (( 3963 * acos( sin( '$rrr' / 57.2958 ) * sin( zipcode.latitude / 57.2958 ) + cos( '$rrr' / 57.2958 ) * cos( zipcode.latitude / 57.2958 ) * cos( zipcode.longitude / 57.2958 - '$sss' / 57.2958 ) ) ) <" . $Radius . ")

AND (dealers.Dealer_BusinessName LIKE '%" . $BusinessName . "%')

AND (dealers.Dealer_State = '" . $State . "' OR '0'='" . $State . "0')

AND (category.Cat_ID='" . $Category . "' OR ''='" . $Category . "')";

Hi

 

The first query seems to get you a count of all the appropriate dealers within a certain area.

 

The 2nd is getting you all the dealers info.

 

I would guess that the first is for a menu page, while the 2nd is for a separate page. If so not sure you can merge them (although some of the changes in the first can also be done to the 2nd).

 

If they are all on the same page then I suspect you can merge them fairly easily and just use mysql_num_rows to get the count.

 

All the best

 

Keith

Thanks, in my UNexpert opinion I should be able to use the second query to perform both of the queries and just use the mysql_num_rows() to get the count- in lieu of the first query.  I should be able assign the mysql_num_rows result to

 

$Dealer_Results->DataSource->CountSQL

 

Now since this or both queries are based on distance < $Radius, then why is the Distinct dealers portion doing the math in the beginning and then doing the math again as a comparison at the end. 

 

Is it possible to modify the Select portion of the first query with

"SELECT distinct dealers. * , ( 3963 * acos( sin( '$rrr'/ 57.2958 ) * sin( zipcode.latitude / 57.2958 ) + cos( '$rrr' / 57.2958 ) * cos( zipcode.latitude / 57.2958 ) * cos( zipcode.longitude / 57.2958 - '$sss' / 57.2958 )) )

AS Dealer_Distance

because they are returning the same result?

 

I apologize I have been working on this project for over 2 years, and the least of my concerns at the time were the sql portion and now I have become discouraged with this whole project because it seems that checking the validity on each record in a large database is impossible.  The original sql code was for dealer distance based on zip codes only and I modified the sql code to use full addresses, which was working fine, with 500 records of course.

Any ideas would be appreciated.

 

 

 

Forgot to tell you these two original codes were not searching for the category match.  as an example "search all records that category  = 3".  And they work extremely fast 1 second or so for the results so i think i screwed them up adding the category query portion which are the queries in question.  I was using a conditional statement to check if Category is >0 to run the queries that are lagging.  Is there a way to incorporate the non working queries to check the Category if is > 0 and only display those that match in the query below or do I have to keep the conditional for the Category.  The original codes below did not search the category.  Maybe I didnt add them to the non working queries correctly.

 

These work super fast but query not checking Category match.

 

$Dealer_Results->DataSource->CountSQL =

"SELECT Count(*) as Count FROM dealers, zipcode, zipcode AS zipcodesearch

WHERE (zipcodesearch.ZIPCode = 00611)

AND  (dealers.Dealer_Zip = zipcode.ZIPCode)

AND  (dealers.Dealer_Publish='Y')

AND (dealers.Dealer_Approved='Y')

AND (CURDATE() < dealers.Dealer_Expires) 

AND (( 3963 * acos( sin( '$rrr' / 57.2958 ) * sin(zipcode.latitude / 57.2958 ) + cos( '$rrr'/ 57.2958 ) * cos( zipcode.latitude / 57.2958 ) * cos( zipcode.longitude / 57.2958 - '$sss'/ 57.2958 ) ) ) <" . $Radius . ")

AND (dealers.Dealer_BusinessName LIKE '%" . $BusinessName . "%')

AND (dealers.Dealer_State = '" . $State . "' OR '0'='" . $State . "0');";

 

 

$Dealer_Results->DataSource->SQL = "SELECT distinct dealers. * , ( 3963 * acos( sin( '$rrr'/ 57.2958 ) * sin( zipcode.latitude / 57.2958 ) + cos( '$rrr' / 57.2958 ) * cos( zipcode.latitude / 57.2958 ) * cos( zipcode.longitude / 57.2958 - '$sss' / 57.2958 )) ) AS Dealer_Distance FROM dealers, zipcode as zipcode

WHERE (dealers.Dealer_Zip = zipcode.ZIPCode)

AND (dealers.Dealer_Publish='Y')

AND (dealers.Dealer_Approved='Y')

AND (CURDATE() < dealers.Dealer_Expires)

AND (( 3963 * acos( sin( '$rrr' / 57.2958 ) * sin( zipcode.latitude / 57.2958 ) + cos( '$rrr' / 57.2958 ) * cos( zipcode.latitude / 57.2958 ) * cos( zipcode.longitude / 57.2958 - '$sss' / 57.2958 ) ) ) <" . $Radius . ")

AND (dealers.Dealer_BusinessName LIKE '%" . $BusinessName . "%') AND (dealers.Dealer_State = '" . $State . "' OR '0'='" . $State . "0')";

 

 

Thanks, in my UNexpert opinion I should be able to use the second query to perform both of the queries and just use the mysql_num_rows() to get the count- in lieu of the first query.  I should be able assign the mysql_num_rows result to

 

$Dealer_Results->DataSource->CountSQL

 

Yes, I would think you can do that

 

Now since this or both queries are based on distance < $Radius, then why is the Distinct dealers portion doing the math in the beginning and then doing the math again as a comparison at the end. 

 

Unless you use a subselect you will need to calculate the distance both in the WHERE clause and in the SELECT clause. You could try using a HAVING clause though.

 

Forgot to tell you these two original codes were not searching for the category match.  as an example "search all records that category  = 3".  And they work extremely fast 1 second or so for the results so i think i screwed them up adding the category query portion which are the queries in question.  I was using a conditional statement to check if Category is >0 to run the queries that are lagging.  Is there a way to incorporate the non working queries to check the Category if is > 0 and only display those that match in the query below or do I have to keep the conditional for the Category.  The original codes below did not search the category.  Maybe I didnt add them to the non working queries correctly.

 

Adding a check on category should be easy enough, and not sure why it should slow things down drastically (although worth playing with the indexes). However not sure I can help more without knowing more of the system.

 

However the clause AND (dealers.Dealer_State = '" . $State . "' OR '0'='" . $State . "0') appears to check that there is a match on state or that no state has been specified. If no state has been specified then there is no point in having that clause as it will just be slowing things down.

 

All the best

 

Keith

Thanks for your reply.  I looked up the HAVING statement but it looks like I have to use GROUP BY.  So it sounds like if I have a table

 

called "dealer_category_matrix"

 

DCM_ID holds auto increment for record

DCM_Dealer_ID holds Dealer_ID that corresponds to dealers table Dealer_ID

DCM_Cat_ID holds Category which each Dealer can have more than 1

 

and another table Category

Cat_ID id for table

Cat_Text holds name ; which name is the chosen field by user

 

then theoriticall then is this the right joining method or will it matter?

LEFT JOIN dealer_category_matrix ON dealers.Dealer_ID = dealer_category_matrix.DCM_Dealer_ID LEFT JOIN category ON dealer_category_matrix.DCM_Cat_ID = category.Cat_ID

 

Hi

 

HAVING can be used without a GROUP BY clause, just that its normal use is for checking against an aggregate column (such as would normally require a GROUP BY clause). In you case I am not sure on the effects on efficiency as I would expect mysql to optimise the query anyway so it only did the distance calculation once even though it is in the query twice.

 

You seem to have the right idea with the joining table, but you will have issues if there are multiple catagories. You will get the row returned multiple times if there are multiple matches on category.

 

All the best

 

Keith

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.