Jump to content

Recommended Posts

Hi everyone, Ive been tasked with building a spreadsheet type application that allows users to modify columns and change filters on the fly. Its a pretty intense database with lots of different tables so in order to access any of the datas columns, there has to be joins.

 

It worked great when there were maybe 20 or so rows per table, but now theres 10k-30k in most of them, and needless to say its running much slower. However I figured I could index any column that the user can filter by, to speed things up. This does not seem to have helped....

 

(note, I know there are un-needed joins in the query right now. I am testing for a worst case scenerio where the user is requesting data from all available tables)

 

SELECT p.`sku` AS `id` , p.`luna_sku` , hp.`edges` , f.`fiber` , p.`style_name`

FROM `lunapr_c1price` AS `p`

LEFT JOIN `lunapr_c1hard` AS `hp` ON p.`luna_sku` = hp.`luna_sku`

LEFT JOIN `lunapr_c1soft` AS `sp` ON p.`luna_sku` = sp.`luna_sku`

LEFT JOIN `lunapr_c1vendor` AS `v` ON ( p.`manufacturer` = v.`manufacturer`

AND p.`division_dist` = v.`division_dist` )

LEFT JOIN `lunapr_c1geoprice` AS `pr` ON ( p.`luna_sku` = pr.`luna_sku`

AND pr.`current` = '1' )

LEFT JOIN `lunapr_c1geocost` AS `c` ON ( p.`luna_sku` = c.`luna_sku`

AND c.`current` = '1' )

LEFT JOIN `lunapr_c1fiber` AS `f` ON sp.`fiber` = f.`id`

WHERE (

p.`style_name` != 'Alaska Sg Maple2.5 '

)

ORDER BY f.`fiber` DESC

LIMIT 0 , 10

 

This query takes like 24 seconds to complete...thats no good.

 

SELECT p.`sku` AS `id` , p.`luna_sku` , hp.`edges` , f.`fiber` , p.`style_name`

FROM `lunapr_c1price` AS `p`

LEFT JOIN `lunapr_c1hard` AS `hp` ON p.`luna_sku` = hp.`luna_sku`

LEFT JOIN `lunapr_c1soft` AS `sp` ON p.`luna_sku` = sp.`luna_sku`

LEFT JOIN `lunapr_c1vendor` AS `v` ON ( p.`manufacturer` = v.`manufacturer`

AND p.`division_dist` = v.`division_dist` )

LEFT JOIN `lunapr_c1geoprice` AS `pr` ON ( p.`luna_sku` = pr.`luna_sku`

AND pr.`current` = '1' )

LEFT JOIN `lunapr_c1geocost` AS `c` ON ( p.`luna_sku` = c.`luna_sku`

AND c.`current` = '1' )

LEFT JOIN `lunapr_c1fiber` AS `f` ON sp.`fiber` = f.`id`

WHERE (

p.`style_name` != ' '

)

ORDER BY f.`fiber` DESC

LIMIT 0 , 10

 

This query takes like 1.4 seconds. Still not great but bareable. Any idea why they would be so drastically different?

 

SELECT p.`sku` AS `id` , p.`luna_sku` , hp.`edges` , f.`fiber` , p.`style_name`

FROM `lunapr_c1price` AS `p`

LEFT JOIN `lunapr_c1hard` AS `hp` ON p.`luna_sku` = hp.`luna_sku`

LEFT JOIN `lunapr_c1soft` AS `sp` ON p.`luna_sku` = sp.`luna_sku`

LEFT JOIN `lunapr_c1fiber` AS `f` ON sp.`fiber` = f.`id`

WHERE (

p.`style_name` != 'Alaska Sg Maple2.5 '

)

ORDER BY f.`fiber` DESC

LIMIT 0 , 10

 

This one only takes .8 seconds. So obviously removing the other joins helps.

 

(while writing this post I unset and reset some indexes and made things, worse, the big query now takes 140+seconds...)

 

This is the explain on the nightmare query

 

id select_type table type possible_keys key key_len ref rows Extra

1 SIMPLE p range style_name style_name 27 NULL 13607 Using where; Using temporary; Using filesort

1 SIMPLE hp ref luna_sku luna_sku 9 luna_filepro.p.luna_sku 1

1 SIMPLE sp ref luna_sku luna_sku 9 luna_filepro.p.luna_sku 1 Using index

1 SIMPLE v ref division_dist,manufacturer division_dist 27 luna_filepro.p.division_dist 2

1 SIMPLE pr ref luna_sku luna_sku 9 luna_filepro.p.luna_sku 8

1 SIMPLE c ref luna_sku luna_sku 9 luna_filepro.p.luna_sku 2

1 SIMPLE f eq_ref PRIMARY PRIMARY 4 luna_filepro.sp.fiber 1

 

 

I assume it is because its only using division_dist as a ref for `v` instead of the combination of division_dist and manufacturer. Any ideas?

 

If  remove that one join (the vendor join) it is decent, once I include that its a nightmare.

Link to comment
https://forums.phpfreaks.com/topic/194529-mysql-query-optimization/
Share on other sites

The only way you limit down the size of result set is to have where clauses that use indexes.  If you want to join a bunch of tables that have thousands of rows and produce result sets of thousands of rows, things are going to be slow.

 

In your explain plan, you can see that:

 

1    SIMPLE    p    range    style_name    style_name    27    NULL    13607    Using where; Using temporary; Using filesort

 

Which is making a temporary table with 13607 rows, which it then sorts.  Does the style_name column have an index on it?  Even with an index it may not use it if this is a low cardinality column (only a few style_names).

I put an index on style_name, and it is a pretty high cardinality (6804, so about a 1:2)

 

However it still does not seem to be using it.

 

Putting an index that is both manufactuer and dist worked. In fact I went back and made that combination the primary key. I also attemepted to make a foreign key to those fields, but I am not sure if that helped.

 

The query time is drastically lower with those two fields. I still can not figure out why style_name is not using its index though.

I get an error..."Can't group on 'countof'"

 

In case it helps, there are 7,935 distinct values of "style_name"

 

and there are a total of 13,608 distinct rows in the table.

 

if I run

 

select style_name, count(*) as countof FROM lunapr_c1price GROUP BY style_name

 

I get 7,935 results, most of which have a countof value of 1, some have more then that. One had 10...but a vast majority had 1.

Yeah I air balled on the group by, but you figured it out.  Ok, so what I would say is that there are so many values that are NOT EQUAL to the one you want, that mysql has decided to dump out the entire table.  So at any rate we're back to the simple fact that you have a base result that is returning a lot of rows (select count(*) from lunapr_c1price where sytle_name != 'Alaska Sg Maple2.5'.  This is being used to join to all the other tables, so there's a lot of rows being examined, unless I'm missing something.  When you do an explain on just that simple select statement, what do you get?

So --- as you can see, you have a result set of 13607 rows being joined N times.  This isn't going to be fast.  You may be able to get it to stop using the temporary table and filesort by hinting the use of the appropriate index and see if that significantly improves the performance:  http://dev.mysql.com/doc/refman/4.1/en/index-hints.html

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.