Jump to content

Archived

This topic is now archived and is closed to further replies.

pootlecat

Searching and Indexes

Recommended Posts

Hi,

I have a fair sized table now (1,955,041 rows) and it currently has two indexes:
PRIMARY is the ID number
and
Keywords is a FULLTEXT index of the Keywords column (Text).

In my first query type I also check two other fields: Disabled and Toplevel

[CODE]SELECT * FROM files WHERE Disabled='0' AND TopLevel='clipart' AND MATCH (Keywords) AGAINST ('+tree' IN BOOLEAN MODE) ORDER by ID;[/CODE]

In my first query type I do not have keywords but am rather checking to see if an image belongs to a particular category like so:

[CODE]SELECT * FROM files WHERE Disabled='0' AND TopLevel='clipart' AND (Category='Food' OR Category2='Food' OR Category3='Food' OR Category4='Food') ORDER by ID[/CODE]

I am very much a beginner with MySQL. My question is, are there any other indexes (or anything else) that I can do that will speed up the results, some of which are appallingly slow!?

+----------------------+------------------+------+-----+---------+----------------+
| Field                | Type            | Null | Key | Default | Extra          |
+----------------------+------------------+------+-----+---------+----------------+
| ID                  | int(20) unsigned |      | PRI | NULL    | auto_increment |
| Disabled            | char(2)          |      |    | 0      |                |
| TopLevel            | varchar(50)      |      |    |        |                |
| Keywords            | text            |      |    |        |                |
| Category            | varchar(100)    |      |    |        |                |
| Category2            | varchar(20)      |      |    |        |                |
| Category3            | varchar(20)      |      |    |        |                |
| Category4            | varchar(20)      |      |    |        |                |

Share this post


Link to post
Share on other sites
Can you post the corresponding EXPLAIN output?

Share this post


Link to post
Share on other sites
Thanks for your reply :)
I am just splitting the main table into 3 separate ones so that there will not be quite so much data in them. Once I have done that I will update my post.

Share this post


Link to post
Share on other sites
I changed things around a little. Here is the one query and its explain statement:

Current Indexes
===============
Keyname Type Cardinality Action Field
PRIMARY  PRIMARY 1788750      ID 
disab_cat  INDEX 11      Disabled  Category 10 
disab_cats  INDEX 15      Disabled  Category 10  Category2 10  Category3 10  Category4 10 
Keywords  FULLTEXT 1      Keywords 

Explain Results
===============
EXPLAIN SELECT * FROM files_webgraphics WHERE Disabled='0' AND Category='Arrows' ORDER by ID

id  select_type  table            type  possible_keys        key      key_len  ref        rows  Extra 
1  SIMPLE      files_webgraphics ref  disab_cat,disab_cats disab_cat 14      const,const 121438 Using where; Using filesort

Other Info
==========
Disabled is either 1 (disabled) or 0 (not disabled).
Category is one of the following:
  Animations, Buttons, Rules
  Arrows, Bullets, Textures
  Backgrounds, Objects, Words

Share this post


Link to post
Share on other sites
Well, if you drop the index on disabled (poor cardinality), you can have a covering index on (category, ID), which will save the filesort.

Share this post


Link to post
Share on other sites
Actually, looking at it again, there are a few things that you can do.  First, you can use a self-JOIN to have one index used for FULLTEXT and the other using a covering index for both other fields.  Second, you can simply UNION a bunch of results, instead of using ORs, so that you get proper index utilization.

Share this post


Link to post
Share on other sites
That sounds more interesting :)
It is rather over my heard though - could you possibly give me an example?
Thanks!

Share this post


Link to post
Share on other sites
Those were examples of examples... I assume you mean sample code? See how far you get, and post what you've come up with.  I don't have the time right now to re-write the queries from scratch, but it shouldn't take you that long.

Share this post


Link to post
Share on other sites
Is this what you mean?

(SELECT * FROM files WHERE Disabled='0' AND Category='Food')
UNION
(SELECT * FROM files WHERE Disabled='0' AND Category2='Food')
UNION
(SELECT * FROM files WHERE Disabled='0' AND Category3='Food')
UNION
(SELECT * FROM files WHERE Disabled='0' AND Category4='Food')
ORDER by ID

and I should have 4 corresponding indexes?

Share this post


Link to post
Share on other sites
Correct... you'd need an index on each of the category fields; don't bother indexing on disabled, since it's only ever 0 or 1, so the index would be useless and a waste of space/time.

Share this post


Link to post
Share on other sites
Thanks - that seems to work really well.

Now I am just left with one more problem. The second query just got a lot more complicated and I have tried going the UNION route with it too but I am not sure that is the best way. This is what I have now:

[code](SELECT * FROM files WHERE Disabled='0' AND Colour='1' AND MATCH (Keywords)
AGAINST ('cat' IN BOOLEAN MODE) AND Category='Animals')
UNION
(SELECT * FROM files WHERE Disabled='0' AND Colour='1' AND MATCH (Keywords)
AGAINST ('cat' IN BOOLEAN MODE) AND Category2='Animals')
UNION
(SELECT * FROM files WHERE Disabled='0' AND Colour='1' AND MATCH (Keywords)
AGAINST ('cat' IN BOOLEAN MODE) AND Category3='Animals')
UNION
(SELECT * FROM files WHERE Disabled='0' AND Colour='1' AND MATCH (Keywords)
AGAINST ('cat' IN BOOLEAN MODE) AND Category4='Animals') ORDER by rand(2)[/code]

Colour can only be 1 or 0 so I assume I should just ignore this too?

Share this post


Link to post
Share on other sites
In general, you can "pull out" parts of the WHERE clause that won't be retrieved from an index "outside" the individual UNIONs (i.e. for the previous query, pulling disabled=0 "outside"); however, in the case, you'd be left with the FULLTEXT and categoryN indexes, which can't be combined, so you have to get more fancy (UNTESTED, but you should get the idea):

[code]
SELECT * FROM files
INNER JOIN
(
(SELECT * FROM files WHERE Category='Animals')
UNION
(SELECT * FROM files WHERE Category2='Animals')
UNION
(SELECT * FROM files WHERE Category3='Animals')
UNION
(SELECT * FROM files WHERE Category4='Animals')
) AS filesByCategory
ON ( files.id = filesByCategory.id )
WHERE MATCH (files.Keywords) AGAINST ('cat' IN BOOLEAN MODE)
AND files.Disabled='0' AND files.Colour='1'
[/code]

Share this post


Link to post
Share on other sites
Thanks again for all your help! :)
Everything seems to work ok at the moment. I am just going to do a little testing with EXPLAIN and timing the results to see how it is all working.

One other quick question - I've taken to sorting by "rand(2)" to mix things up a little and I was wondering if this is known to cause a performance hit with larger tables/results? Is there a better way to "randomise" results?

Share this post


Link to post
Share on other sites
Yes, ORDER BY RAND() can be quite slow -- because it needs to calculate a random number for every record, and then sort them -- but it's not very straightforward to do it differently.  I suggest you check out [url=http://jan.kneschke.de/projects/mysql/order-by-rand/]this link[/url].

Share this post


Link to post
Share on other sites

×

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.