Jump to content

long slow sql needs a better approach - newbie here :)


Recommended Posts

SELECT grandtotal,maincategory,oof,underp,novice FROM 

( 
SELECT COUNT(this_database.atennis2) as grandtotal, 'tennis' as maincategory, ( SELECT COUNT(this_database.graduated) FROM this_database WHERE this_database.atennis2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND graduated='0') as underp, ( SELECT COUNT(this_database.NONLIC_tennis) FROM this_database WHERE this_database.atennis2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND NONLIC_tennis='1' AND graduated='1') as oof,( SELECT COUNT(this_database.NONLIC_tennis) FROM this_database WHERE this_database.atennis2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND NONLIC_tennis='0' AND graduated='1' AND newbie='0') as 'novice' FROM this_database WHERE this_database.atennis2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy'  

UNION  

SELECT COUNT(this_database.atennis2) as grandtotal, 'soccerplayer' as maincategory, ( SELECT COUNT(this_database.graduated) FROM this_database WHERE this_database.asoccer2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND graduated='0') as underp, ( SELECT COUNT(this_database.NONLIC_soccer) FROM this_database WHERE this_database.asoccer2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND NONLIC_soccer='1' AND graduated='1') as oof,( SELECT COUNT(this_database.NONLIC_soccer) FROM this_database WHERE this_database.asoccer2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND NONLIC_soccer='0' AND graduated='1' AND newbie='0') as 'novice' FROM this_database WHERE this_database.asoccer2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy'  

UNION 

SELECT COUNT(this_database.aboater2) as grandtotal, 'boater' as maincategory, ( SELECT COUNT(this_database.graduated) FROM this_database WHERE this_database.aboater2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND graduated='0') as underp, ( SELECT COUNT(this_database.NONLIC_boater) FROM this_database WHERE this_database.aboater2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND NONLIC_boater='1' AND graduated='1') as oof,( SELECT COUNT(this_database.NONLIC_boater) FROM this_database WHERE this_database.aboater2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND NONLIC_boater='0' AND graduated='1' AND newbie='0') as 'novice' FROM this_database WHERE this_database.aboater2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy'  

UNION 

SELECT COUNT(this_database.afootball2) as grandtotal, 'football' as maincategory, ( SELECT COUNT(this_database.graduated) FROM this_database WHERE this_database.afootball2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND graduated='0') as underp, ( SELECT COUNT(this_database.NONLIC_football) FROM this_database WHERE this_database.afootball2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND NONLIC_football='1' AND graduated='1') as oof,( SELECT COUNT(this_database.NONLIC_football) FROM this_database WHERE this_database.afootball2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND NONLIC_football='0' AND graduated='1' AND newbie='0') as 'novice' FROM this_database WHERE this_database.afootball2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy'  

UNION 

SELECT COUNT(this_database.asurfer2) as grandtotal, 'surfer' as maincategory, ( SELECT COUNT(this_database.graduated) FROM this_database WHERE this_database.asurfer2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND graduated='0') as underp, ( SELECT COUNT(this_database.NONLIC_surfer) FROM this_database WHERE this_database.asurfer2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND NONLIC_surfer='1' AND graduated='1') as oof,( SELECT COUNT(this_database.NONLIC_surfer) FROM this_database WHERE this_database.asurfer2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND NONLIC_surfer='0' AND graduated='1' AND newbie='0') as 'novice' FROM this_database WHERE this_database.asurfer2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy'  

) outertable

1. I kindly ask you to read 'How to ask questions the smart way' article, a link to which you can find in my signature.

2. Avoid subqueries in column list (part of query before WHERE). Learn how to use JOINs.

3. Your 'this_database' table seems to be poorly designed... care to show us it's structure?

1. I kindly ask you to read 'How to ask questions the smart way' article, a link to which you can find in my signature.

2. Avoid subqueries in column list (part of query before WHERE). Learn how to use JOINs.

3. Your 'this_database' table seems to be poorly designed... care to show us it's structure?

 

1. wow that is a long article!

2. I feel like I have a decent grasp of joins, but could not figure out how to do it

3. yes it is...it is the beast I was given, and not really open to changing

 

I will try to describe what is happening:

 

this is a database of instructors

 

there is a field atennis2,asoccer2,afootball2 etc which contain a 1 or 0 which indicates whether or not they teach that subject

 

there is a field NONLIC_tennis,NONLIC_soccer etc that is an 0 or 1. 1 means that they do not have the credentials to teach the class.

 

there is a field 'graduated' which tells simply if they have any teaching credential

 

so SELECT COUNT(this_database.atennis2) as grandtotal, [all people who teach tennis]

 

'tennis' as maincategory, [defining maincategory as 'tennis']

 

( SELECT COUNT(this_database.graduated) FROM this_database WHERE this_database.atennis2=1 AND aca_type='Sport Academy' AND graduated='0'  ) as underp, [select number of people who teach tennis,but have not finished their credentials]

 

( SELECT COUNT(this_database.NONLIC_tennis) FROM this_database WHERE this_database.atennis2=1  AND aca_type='Sport Academy' AND NONLIC_tennis='1' AND graduated='1' ) as oof, [select number of people who teach tennis and have a credential,but not in tennis]

 

( SELECT COUNT(this_database.NONLIC_tennis) FROM this_database WHERE this_database.atennis2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' AND NONLIC_tennis='0' AND graduated='1' AND newbie='0' ) as 'novice' [select number of people who teach tennis and have a tennis credential,but have only been teaching for a little while]

FROM this_database WHERE this_database.atennis2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' [this is simply the selector for grandtotal]

 

 

Here's another long, but certainly informative article for you:

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

 

A database is not a spreadsheet. If you try to use it as such, it will hate you and you will hate it.

Here's another long, but certainly informative article for you:

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

 

A database is not a spreadsheet. If you try to use it as such, it will hate you and you will hate it.

 

I totally get it, my database is very poorly formed, but as I said, it is what it is.

 

I was hoping someone would have a creative solution around the problem.

Creative solution? Why get creative, when there's perfectly valid standard solution?

 

I read the whole normalization article you linked to, but found nothing that would help.

 

as I said, this is not my area of expertise.

 

Would you be so kind as to help me out a bit?

I will not recommend anything else here, than doing data normalisation here.

 

For example you say:

there is a field atennis2,asoccer2,afootball2 etc which contain a 1 or 0 which indicates whether or not they teach that subject

 

there is a field NONLIC_tennis,NONLIC_soccer etc that is an 0 or 1. 1 means that they do not have the credentials to teach the class.

 

How about creating one table for 'instructors' one table for 'sports' and one table that keeps information on which instructor can teach which sports?

While the best solution to this problem would be to normalize the database; since that cannot be done, we can try to improve the query.  You have multiple levels of sub-queries.  The inner-most level (3 sub-queries) are all scanning the same table with the same BASE criteria.  We can pull this out to the middle level and remove three table scans for each of the UNION sections.  The first set would look like this:

 

SELECT grandtotal,maincategory,oof,underp,novice FROM
(
SELECT COUNT(this_database.atennis2) as grandtotal, 'tennis' as maincategory, 
	SUM(IF(graduated='0',1,0)) as underp,
	SUM(IF(NONLIC_tennis='1' AND graduated='1',1,0)) as oof,
	SUM(IF(NONLIC_tennis='0' AND graduated='1' AND newbie='0',1,0)) as 'novice'
FROM this_database 
WHERE this_database.atennis2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' 

UNION 

 

Using SUM() and IF() we can get a count for each of those conditions (it is a count because we are SUMming the value of 1 (one)).  And do it in a single sub-query.  Each of the UNION sections can be revised to a similar arrangement by substituting the appropriate columns (atennis, asoccer, afootball, etc). 

 

Give this a shot and see if the performance improves significantly.  I am looking at the UNIONS to see if there is a similar solution so we can eliminate them, but the query itself may be more complex, and we may have to change the structure of the returned data, which may or may not be a problem for you.

 

Thank you so much! I wasn't aware of using conditions with the SQL functions like, though that is exactly what I assumed should be possible. I will rewrite these later, but I know this is exactly what I needed.

 

While the best solution to this problem would be to normalize the database; since that cannot be done, we can try to improve the query.  You have multiple levels of sub-queries.  The inner-most level (3 sub-queries) are all scanning the same table with the same BASE criteria.  We can pull this out to the middle level and remove three table scans for each of the UNION sections.  The first set would look like this:

 

SELECT grandtotal,maincategory,oof,underp,novice FROM
(
SELECT COUNT(this_database.atennis2) as grandtotal, 'tennis' as maincategory, 
	SUM(IF(graduated='0',1,0)) as underp,
	SUM(IF(NONLIC_tennis='1' AND graduated='1',1,0)) as oof,
	SUM(IF(NONLIC_tennis='0' AND graduated='1' AND newbie='0',1,0)) as 'novice'
FROM this_database 
WHERE this_database.atennis2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' 

UNION 

 

Using SUM() and IF() we can get a count for each of those conditions (it is a count because we are SUMming the value of 1 (one)).  And do it in a single sub-query.  Each of the UNION sections can be revised to a similar arrangement by substituting the appropriate columns (atennis, asoccer, afootball, etc). 

 

Give this a shot and see if the performance improves significantly.  I am looking at the UNIONS to see if there is a similar solution so we can eliminate them, but the query itself may be more complex, and we may have to change the structure of the returned data, which may or may not be a problem for you.

I inherited 20 databases...each just as flawed. the smallest ones have 90,000+ records the largest have 350,000+

 

the fictional example I gave is just a fraction of the garbage I have to sift through. Their budget is small, so I have to work with what I have. Thanks for your help though.

 

I will not recommend anything else here, than doing data normalisation here.

 

For example you say:

there is a field atennis2,asoccer2,afootball2 etc which contain a 1 or 0 which indicates whether or not they teach that subject

 

there is a field NONLIC_tennis,NONLIC_soccer etc that is an 0 or 1. 1 means that they do not have the credentials to teach the class.

 

How about creating one table for 'instructors' one table for 'sports' and one table that keeps information on which instructor can teach which sports?

I inherited 20 databases...each just as flawed. the smallest ones have 90,000+ records the largest have 350,000+

 

the fictional example I gave is just a fraction of the garbage I have to sift through. Their budget is small, so I have to work with what I have. Thanks for your help though.

 

OR you could (a) ask for more money or (b) not do the work for no money.

I did as you suggested, and performance was greatly improved (obviously). My worst went from 12.46 seconds to 3.6 seconds.

I also removed

 

SELECT grandtotal,maincategory,oof,underp,novice FROM
(

 

Is there any reason for keeping that in?

 

While the best solution to this problem would be to normalize the database; since that cannot be done, we can try to improve the query.  You have multiple levels of sub-queries.  The inner-most level (3 sub-queries) are all scanning the same table with the same BASE criteria.  We can pull this out to the middle level and remove three table scans for each of the UNION sections.  The first set would look like this:

 

SELECT grandtotal,maincategory,oof,underp,novice FROM
(
SELECT COUNT(this_database.atennis2) as grandtotal, 'tennis' as maincategory, 
	SUM(IF(graduated='0',1,0)) as underp,
	SUM(IF(NONLIC_tennis='1' AND graduated='1',1,0)) as oof,
	SUM(IF(NONLIC_tennis='0' AND graduated='1' AND newbie='0',1,0)) as 'novice'
FROM this_database 
WHERE this_database.atennis2=1 AND (HAIR='R' OR HAIR='B' OR HAIR='CC' ) AND aca_type='Sport Academy' 

UNION 

 

Using SUM() and IF() we can get a count for each of those conditions (it is a count because we are SUMming the value of 1 (one)).  And do it in a single sub-query.  Each of the UNION sections can be revised to a similar arrangement by substituting the appropriate columns (atennis, asoccer, afootball, etc). 

 

Give this a shot and see if the performance improves significantly.  I am looking at the UNIONS to see if there is a similar solution so we can eliminate them, but the query itself may be more complex, and we may have to change the structure of the returned data, which may or may not be a problem for you.

Actually, I thought about that long after I posted my message.  Yeah, I think that outer SELECT should come out and just do it as a series of UNIONS.  I really think that is the best you will get with the database you have.

 

You might consider adding some indexes on columns that are used in the WHERE clause (not the one with the OR's but the other ones). Either a single column index where the column value vastly defines the set of records you want, or a multi-column index, like aca_type and atennis2.  Of course if you use atennis2, you'll also have to do an separate index for each of the other sports.  The index might get pretty big (depending on the amount of data, and the number of differnet values) but it could reduce the execution time.  Look at the query plan to see how it helps.  Note that creating two separate indexes say one on aca_type and another on atennis2, will probably not help.  I don't know the internals of mySql, but in my experience, a query will not generally use two separate indexes on one table, it will pick the one that helps the most and use it.  Someone who has more experience with mySql can correct me if I'm wrong (I wouldn't mind learning something new today).

 

I don't know which storage engine you are using or which ones are available to you.  And I really don't know the performance difference between them, but I suspect that there are differences in performance.  So if changing the storage engine is an option (and I'm sure it will be a lot of work to do), you might do some research on the performance of the different engines.

 

Good luck!

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.