Jump to content


Photo

Searching and Indexes


  • Please log in to reply
14 replies to this topic

#1 pootlecat

pootlecat
  • Members
  • PipPip
  • Member
  • 13 posts

Posted 08 October 2006 - 04:13 AM

Hi,

I have a fair sized table now (1,955,041 rows) and it currently has two indexes:
PRIMARY is the ID number
and
Keywords is a FULLTEXT index of the Keywords column (Text).

In my first query type I also check two other fields: Disabled and Toplevel

SELECT * FROM files WHERE Disabled='0' AND TopLevel='clipart' AND MATCH (Keywords) AGAINST ('+tree' IN BOOLEAN MODE) ORDER by ID;

In my first query type I do not have keywords but am rather checking to see if an image belongs to a particular category like so:

SELECT * FROM files WHERE Disabled='0' AND TopLevel='clipart' AND (Category='Food' OR Category2='Food' OR Category3='Food' OR Category4='Food') ORDER by ID

I am very much a beginner with MySQL. My question is, are there any other indexes (or anything else) that I can do that will speed up the results, some of which are appallingly slow!?

+----------------------+------------------+------+-----+---------+----------------+
| Field                | Type            | Null | Key | Default | Extra          |
+----------------------+------------------+------+-----+---------+----------------+
| ID                  | int(20) unsigned |      | PRI | NULL    | auto_increment |
| Disabled            | char(2)          |      |    | 0      |                |
| TopLevel            | varchar(50)      |      |    |        |                |
| Keywords            | text            |      |    |        |                |
| Category            | varchar(100)    |      |    |        |                |
| Category2            | varchar(20)      |      |    |        |                |
| Category3            | varchar(20)      |      |    |        |                |
| Category4            | varchar(20)      |      |    |        |                |

#2 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 09 October 2006 - 05:48 AM

Can you post the corresponding EXPLAIN output?
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.

#3 pootlecat

pootlecat
  • Members
  • PipPip
  • Member
  • 13 posts

Posted 09 October 2006 - 04:28 PM

Thanks for your reply :)
I am just splitting the main table into 3 separate ones so that there will not be quite so much data in them. Once I have done that I will update my post.

#4 pootlecat

pootlecat
  • Members
  • PipPip
  • Member
  • 13 posts

Posted 13 October 2006 - 04:53 PM

I changed things around a little. Here is the one query and its explain statement:

Current Indexes
===============
Keyname Type Cardinality Action Field
PRIMARY  PRIMARY 1788750      ID 
disab_cat  INDEX 11      Disabled  Category 10 
disab_cats  INDEX 15      Disabled  Category 10  Category2 10  Category3 10  Category4 10 
Keywords  FULLTEXT 1      Keywords 

Explain Results
===============
EXPLAIN SELECT * FROM files_webgraphics WHERE Disabled='0' AND Category='Arrows' ORDER by ID

id  select_type  table            type  possible_keys        key      key_len  ref        rows  Extra 
1  SIMPLE      files_webgraphics ref  disab_cat,disab_cats disab_cat 14      const,const 121438 Using where; Using filesort

Other Info
==========
Disabled is either 1 (disabled) or 0 (not disabled).
Category is one of the following:
  Animations, Buttons, Rules
  Arrows, Bullets, Textures
  Backgrounds, Objects, Words

#5 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 13 October 2006 - 07:29 PM

Well, if you drop the index on disabled (poor cardinality), you can have a covering index on (category, ID), which will save the filesort.
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.

#6 pootlecat

pootlecat
  • Members
  • PipPip
  • Member
  • 13 posts

Posted 19 October 2006 - 07:22 PM

Thanks! I'll try that :)

#7 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 19 October 2006 - 10:08 PM

Actually, looking at it again, there are a few things that you can do.  First, you can use a self-JOIN to have one index used for FULLTEXT and the other using a covering index for both other fields.  Second, you can simply UNION a bunch of results, instead of using ORs, so that you get proper index utilization.
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.

#8 pootlecat

pootlecat
  • Members
  • PipPip
  • Member
  • 13 posts

Posted 22 October 2006 - 02:51 PM

That sounds more interesting :)
It is rather over my heard though - could you possibly give me an example?
Thanks!

#9 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 22 October 2006 - 08:42 PM

Those were examples of examples... I assume you mean sample code? See how far you get, and post what you've come up with.  I don't have the time right now to re-write the queries from scratch, but it shouldn't take you that long.
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.

#10 pootlecat

pootlecat
  • Members
  • PipPip
  • Member
  • 13 posts

Posted 24 October 2006 - 03:08 PM

Is this what you mean?

(SELECT * FROM files WHERE Disabled='0' AND Category='Food')
UNION
(SELECT * FROM files WHERE Disabled='0' AND Category2='Food')
UNION
(SELECT * FROM files WHERE Disabled='0' AND Category3='Food')
UNION
(SELECT * FROM files WHERE Disabled='0' AND Category4='Food')
ORDER by ID

and I should have 4 corresponding indexes?

#11 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 24 October 2006 - 03:45 PM

Correct... you'd need an index on each of the category fields; don't bother indexing on disabled, since it's only ever 0 or 1, so the index would be useless and a waste of space/time.
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.

#12 pootlecat

pootlecat
  • Members
  • PipPip
  • Member
  • 13 posts

Posted 24 October 2006 - 05:38 PM

Thanks - that seems to work really well.

Now I am just left with one more problem. The second query just got a lot more complicated and I have tried going the UNION route with it too but I am not sure that is the best way. This is what I have now:

(SELECT * FROM files WHERE Disabled='0' AND Colour='1' AND MATCH (Keywords) 
AGAINST ('cat' IN BOOLEAN MODE) AND Category='Animals') 
UNION 
(SELECT * FROM files WHERE Disabled='0' AND Colour='1' AND MATCH (Keywords) 
AGAINST ('cat' IN BOOLEAN MODE) AND Category2='Animals') 
UNION 
(SELECT * FROM files WHERE Disabled='0' AND Colour='1' AND MATCH (Keywords) 
AGAINST ('cat' IN BOOLEAN MODE) AND Category3='Animals') 
UNION 
(SELECT * FROM files WHERE Disabled='0' AND Colour='1' AND MATCH (Keywords) 
AGAINST ('cat' IN BOOLEAN MODE) AND Category4='Animals') ORDER by rand(2)

Colour can only be 1 or 0 so I assume I should just ignore this too?

#13 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 24 October 2006 - 08:49 PM

In general, you can "pull out" parts of the WHERE clause that won't be retrieved from an index "outside" the individual UNIONs (i.e. for the previous query, pulling disabled=0 "outside"); however, in the case, you'd be left with the FULLTEXT and categoryN indexes, which can't be combined, so you have to get more fancy (UNTESTED, but you should get the idea):

SELECT * FROM files 
INNER JOIN
(
(SELECT * FROM files WHERE Category='Animals')
UNION
(SELECT * FROM files WHERE Category2='Animals')
UNION
(SELECT * FROM files WHERE Category3='Animals')
UNION
(SELECT * FROM files WHERE Category4='Animals')
) AS filesByCategory
ON ( files.id = filesByCategory.id )
WHERE MATCH (files.Keywords) AGAINST ('cat' IN BOOLEAN MODE)
AND files.Disabled='0' AND files.Colour='1'

Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.

#14 pootlecat

pootlecat
  • Members
  • PipPip
  • Member
  • 13 posts

Posted 25 October 2006 - 03:29 AM

Thanks again for all your help! :)
Everything seems to work ok at the moment. I am just going to do a little testing with EXPLAIN and timing the results to see how it is all working.

One other quick question - I've taken to sorting by "rand(2)" to mix things up a little and I was wondering if this is known to cause a performance hit with larger tables/results? Is there a better way to "randomise" results?

#15 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 25 October 2006 - 06:02 PM

Yes, ORDER BY RAND() can be quite slow -- because it needs to calculate a random number for every record, and then sort them -- but it's not very straightforward to do it differently.  I suggest you check out this link.
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.




0 user(s) are reading this topic

0 members, 0 guests, 0 anonymous users