Jump to content


Photo

mysql index for faster search


  • Please log in to reply
10 replies to this topic

#1 warnockm

warnockm
  • Members
  • PipPip
  • Member
  • 26 posts

Posted 18 May 2006 - 03:25 AM

Hi, i'm currently maintaining a web application i built that stores a lot of text data on a slow computer. The tables are heavily cross referenced, and because of this, my text searches take about a minute to perform. I'm looking at creating an index that is updated when an entry is updated, or nightly. Does anyone have any example code on doing this? I think the idea is to gather all of the words in the mysql tables, and then go through each table, count the number or words and assign the value of each page and word combination. I'd like to see examples of people doing this.

Thanks!

#2 semtex

semtex
  • Members
  • PipPip
  • Member
  • 11 posts

Posted 18 May 2006 - 10:15 AM

Hey there. Just to be sure, do you know the mysql fulltext (boolean) search possibility?
It can return a score that more or less determines the relevance of the search.
It's also quite fast, even on large databases.

Here's an excellent tutorial: [a href=\"http://www.phpfreaks.com/tutorials/129/0.php\" target=\"_blank\"]http://www.phpfreaks.com/tutorials/129/0.php[/a]

#3 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 18 May 2006 - 05:10 PM

FULLTEXT searching, IMHO, has never been properly implemented... if you're concerned about speed, then yes, you should build your own index.
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.

#4 warnockm

warnockm
  • Members
  • PipPip
  • Member
  • 26 posts

Posted 18 May 2006 - 11:35 PM

[!--quoteo(post=374899:date=May 18 2006, 06:15 AM:name=semtex)--][div class=\'quotetop\']QUOTE(semtex @ May 18 2006, 06:15 AM) View Post[/div][div class=\'quotemain\'][!--quotec--]
Hey there. Just to be sure, do you know the mysql fulltext (boolean) search possibility?
It can return a score that more or less determines the relevance of the search.
It's also quite fast, even on large databases.

Here's an excellent tutorial: [a href=\"http://www.phpfreaks.com/tutorials/129/0.php\" target=\"_blank\"]http://www.phpfreaks.com/tutorials/129/0.php[/a]
[/quote]

I'm doing full text searches on all of the fields. I like how it gives me a score, but i don't like how slow it is. Any examples on the indexing?

#5 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 19 May 2006 - 03:43 PM

I don't know what you mean by "examples of indexing".
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.

#6 warnockm

warnockm
  • Members
  • PipPip
  • Member
  • 26 posts

Posted 19 May 2006 - 05:33 PM

[!--quoteo(post=375257:date=May 19 2006, 11:43 AM:name=fenway)--][div class=\'quotetop\']QUOTE(fenway @ May 19 2006, 11:43 AM) View Post[/div][div class=\'quotemain\'][!--quotec--]
I don't know what you mean by "examples of indexing".
[/quote]

Yeah, i was afraid we may be talkign abotu different things. By indexing, i do not mean using index values to represent other rows on other tables, i mean creating a "snapshot" of the data to search, instead of searching the real data. Kind of how google does it. For instance, you would go through every value of every relavent column on every relavent row, such as the ones containing important text, and pull out each word and put them in a table, next go through each word and count how many times it's in an entry. You build a table that references each word to each result, in my case, a company, and stores it. Then when you search for a word, you return the companies that have that word, but you never go through all of the table.

Where i'm having trouble and slowdowns is all of the cross referencing. I'm returning one value each time, a name of a company, but i'm searchign through all of their records, locations, people attached to them, work, etc. When you link about 10 tables, it slows down quite a bit. I dont' have a lot of records or data in them, but its the "JOIN" state ments that are killing me. I was hoping to do an index each night, and use that as a search. It would not be up to teh second, but would greatly improve search speed.

this is my query string that is taking so long:

SELECT table_01.table_01_id, table_01.name, last, first, city, state , MATCH ( table_01.description ) AGAINST ('search_term') + MATCH ( table_01.capabilities )  AGAINST ('search_term' ) + MATCH ( table_01.comments ) AGAINST ('search_term' ) + MATCH ( table_07.workdesc ) AGAINST ('search_term' ) + MATCH ( table_07.client ) AGAINST ( 'search_term' ) + MATCH ( table_07.table_07 ) AGAINST ( 'search_term' ) + MATCH ( table_09.description ) AGAINST ( 'search_term' ) + MATCH ( table_09.datatext ) AGAINST ( 'search_term' ) + MATCH ( city ) AGAINST ( 'search_term' ) + MATCH ( table_05.last ) AGAINST ( 'search_term' ) + MATCH ( table_05.first ) AGAINST ( 'search_term' ) + MATCH ( table_10.client ) AGAINST ( 'search_term' ) + MATCH ( table_10.department ) AGAINST ( 'search_term' ) + MATCH ( table_12.nature ) AGAINST ( 'search_term' ) + MATCH ( table_11.title ) AGAINST ( 'search_term' ) + MATCH ( table_11.desc ) AGAINST ('search_term' ) + MATCH ( table_08.note ) AGAINST ( 'search_term' ) + MATCH ( table_01.name ) AGAINST ( 'search_term' ) AS score 

FROM table_01 
LEFT JOIN table_02 ON table_01.table_01_id = table_02.table_01_id 
LEFT JOIN table_03 ON table_02.table_03_id = table_03.table_03_id 
LEFT JOIN table_04_01 ON table_01.table_01_id = table_04_01.table_01_id 
LEFT JOIN table_05 ON table_04_01.table_05_id = table_05.table_05_id 
LEFT JOIN table_06 ON table_01.table_01_id = table_06.table_01_id 
LEFT JOIN table_07 ON table_01.table_01_id = table_07.table_01_id 
LEFT JOIN table_08 ON table_06.table_06_id = table_08.table_06_id 
LEFT JOIN table_09 ON ( table_01.table_01_id = table_09.table_01_id OR table_07.table_07_id = table_09.table_07_id OR table_06.table_06_id = table_09.protable_09_id ) 
LEFT JOIN table_10 ON table_01.table_01_id = table_10.table_01_id 
LEFT JOIN table_11 ON table_07.table_07_id = table_11.table_07_id 
LEFT JOIN table_12 ON table_10.table_10_id = table_12.table_10_id 

WHERE ( 
MATCH ( table_01.description ) AGAINST ('search_term') 
OR MATCH ( table_01.capabilities ) AGAINST ('search_term' ) 
OR MATCH ( table_01.comments ) AGAINST ('search_term' ) 
OR MATCH ( table_07.workdesc ) AGAINST ('search_term' ) 
OR MATCH ( table_07.client ) AGAINST ( 'search_term' ) 
OR MATCH ( table_07.table_07 ) AGAINST ( 'search_term' ) 
OR MATCH ( table_09.description ) AGAINST ( 'search_term' ) 
OR MATCH ( table_09.datatext ) AGAINST ( 'search_term' ) 
OR MATCH ( city ) AGAINST ( 'search_term' ) 
OR MATCH ( table_05.last ) AGAINST ( 'search_term' ) 
OR MATCH ( table_05.first ) AGAINST ( 'search_term' ) 
OR MATCH ( table_10.client ) AGAINST ( 'search_term' ) 
OR MATCH ( table_10.department ) AGAINST ( 'search_term' ) 
OR MATCH ( table_12.nature ) AGAINST ( 'search_term' ) 
OR MATCH ( table_11.title ) AGAINST ( 'search_term' ) 
OR MATCH ( table_11.desc ) AGAINST ('search_term' ) 
OR MATCH ( table_08.note ) AGAINST ( 'search_term' ) 
OR MATCH ( table_01.name ) AGAINST ( 'search_term' ) 
) 

GROUP BY table_01.table_01_id 
ORDER BY score DESC

no real surprise it takes 64 seconds on a PII 450mhz w/ 190megs of ram. Its real fast when it doesn't do a text search because it doesn't have to join those other tables. Also, i turned on caching which speeds up duplicate searches, but the system isn't used enough for that. It doesn't have data added to it that often, it just needs to query this faster.

#7 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 20 May 2006 - 03:35 PM

That's because you're using so many joins and so many different fulltext indexes. You'd have to maintain your own index which you update as you insert/update records into these various tables & fields. That way, you'd only have to query a single table.
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.

#8 warnockm

warnockm
  • Members
  • PipPip
  • Member
  • 26 posts

Posted 21 May 2006 - 07:25 PM

[!--quoteo(post=375515:date=May 20 2006, 11:35 AM:name=fenway)--][div class=\'quotetop\']QUOTE(fenway @ May 20 2006, 11:35 AM) View Post[/div][div class=\'quotemain\'][!--quotec--]
That's because you're using so many joins and so many different fulltext indexes. You'd have to maintain your own index which you update as you insert/update records into these various tables & fields. That way, you'd only have to query a single table.
[/quote]

Sorry for my ignorance. would this be a table used only for the search? When i return search results, i only return one thing, the company. So would i create a row for each company, with the text that i would be searching for? So if there are a few columns from each joined table, i would put the data from each searchable column into this new table, so when i search, it only searches one place, w/ no joined tables?

thanks for your patience and explanations!

#9 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 22 May 2006 - 01:03 AM

That's the idea... since you probably search more often than you update/insert, maintaining this index table won't be too much overhead, and you'll see a huge performance increase.
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.

#10 warnockm

warnockm
  • Members
  • PipPip
  • Member
  • 26 posts

Posted 25 May 2006 - 04:11 AM

I wanted to thank you for your help. I sucessfully implemented a index running in parallel to my data. TO build the index takes about a minute; almost the entire time is in the mysql query. Then to update the search table when the data changes doesn't add a noticable difference and searches take fractions of a second. It's exactly what i needed.

#11 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 25 May 2006 - 05:21 AM

Glad you got it working -- using a table as a custom index is no small task -- give yourself a pat on the back. But once you get the hang of it, you'll find yourself using it quite often.
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.




0 user(s) are reading this topic

0 members, 0 guests, 0 anonymous users