Jump to content

Archived

This topic is now archived and is closed to further replies.

warnockm

mysql index for faster search

Recommended Posts

Hi, i'm currently maintaining a web application i built that stores a lot of text data on a slow computer. The tables are heavily cross referenced, and because of this, my text searches take about a minute to perform. I'm looking at creating an index that is updated when an entry is updated, or nightly. Does anyone have any example code on doing this? I think the idea is to gather all of the words in the mysql tables, and then go through each table, count the number or words and assign the value of each page and word combination. I'd like to see examples of people doing this.

Thanks!

Share this post


Link to post
Share on other sites
Hey there. Just to be sure, do you know the mysql fulltext (boolean) search possibility?
It can return a score that more or less determines the relevance of the search.
It's also quite fast, even on large databases.

Here's an excellent tutorial: [a href=\"http://www.phpfreaks.com/tutorials/129/0.php\" target=\"_blank\"]http://www.phpfreaks.com/tutorials/129/0.php[/a]

Share this post


Link to post
Share on other sites
FULLTEXT searching, IMHO, has never been properly implemented... if you're concerned about speed, then yes, you should build your own index.

Share this post


Link to post
Share on other sites
[!--quoteo(post=374899:date=May 18 2006, 06:15 AM:name=semtex)--][div class=\'quotetop\']QUOTE(semtex @ May 18 2006, 06:15 AM) [snapback]374899[/snapback][/div][div class=\'quotemain\'][!--quotec--]
Hey there. Just to be sure, do you know the mysql fulltext (boolean) search possibility?
It can return a score that more or less determines the relevance of the search.
It's also quite fast, even on large databases.

Here's an excellent tutorial: [a href=\"http://www.phpfreaks.com/tutorials/129/0.php\" target=\"_blank\"]http://www.phpfreaks.com/tutorials/129/0.php[/a]
[/quote]

I'm doing full text searches on all of the fields. I like how it gives me a score, but i don't like how slow it is. Any examples on the indexing?

Share this post


Link to post
Share on other sites
I don't know what you mean by "examples of indexing".

Share this post


Link to post
Share on other sites
[!--quoteo(post=375257:date=May 19 2006, 11:43 AM:name=fenway)--][div class=\'quotetop\']QUOTE(fenway @ May 19 2006, 11:43 AM) [snapback]375257[/snapback][/div][div class=\'quotemain\'][!--quotec--]
I don't know what you mean by "examples of indexing".
[/quote]

Yeah, i was afraid we may be talkign abotu different things. By indexing, i do not mean using index values to represent other rows on other tables, i mean creating a "snapshot" of the data to search, instead of searching the real data. Kind of how google does it. For instance, you would go through every value of every relavent column on every relavent row, such as the ones containing important text, and pull out each word and put them in a table, next go through each word and count how many times it's in an entry. You build a table that references each word to each result, in my case, a company, and stores it. Then when you search for a word, you return the companies that have that word, but you never go through all of the table.

Where i'm having trouble and slowdowns is all of the cross referencing. I'm returning one value each time, a name of a company, but i'm searchign through all of their records, locations, people attached to them, work, etc. When you link about 10 tables, it slows down quite a bit. I dont' have a lot of records or data in them, but its the "JOIN" state ments that are killing me. I was hoping to do an index each night, and use that as a search. It would not be up to teh second, but would greatly improve search speed.

this is my query string that is taking so long:

[code]
SELECT table_01.table_01_id, table_01.name, last, first, city, state , MATCH ( table_01.description ) AGAINST ('search_term') + MATCH ( table_01.capabilities )  AGAINST ('search_term' ) + MATCH ( table_01.comments ) AGAINST ('search_term' ) + MATCH ( table_07.workdesc ) AGAINST ('search_term' ) + MATCH ( table_07.client ) AGAINST ( 'search_term' ) + MATCH ( table_07.table_07 ) AGAINST ( 'search_term' ) + MATCH ( table_09.description ) AGAINST ( 'search_term' ) + MATCH ( table_09.datatext ) AGAINST ( 'search_term' ) + MATCH ( city ) AGAINST ( 'search_term' ) + MATCH ( table_05.last ) AGAINST ( 'search_term' ) + MATCH ( table_05.first ) AGAINST ( 'search_term' ) + MATCH ( table_10.client ) AGAINST ( 'search_term' ) + MATCH ( table_10.department ) AGAINST ( 'search_term' ) + MATCH ( table_12.nature ) AGAINST ( 'search_term' ) + MATCH ( table_11.title ) AGAINST ( 'search_term' ) + MATCH ( table_11.desc ) AGAINST ('search_term' ) + MATCH ( table_08.note ) AGAINST ( 'search_term' ) + MATCH ( table_01.name ) AGAINST ( 'search_term' ) AS score

FROM table_01
LEFT JOIN table_02 ON table_01.table_01_id = table_02.table_01_id
LEFT JOIN table_03 ON table_02.table_03_id = table_03.table_03_id
LEFT JOIN table_04_01 ON table_01.table_01_id = table_04_01.table_01_id
LEFT JOIN table_05 ON table_04_01.table_05_id = table_05.table_05_id
LEFT JOIN table_06 ON table_01.table_01_id = table_06.table_01_id
LEFT JOIN table_07 ON table_01.table_01_id = table_07.table_01_id
LEFT JOIN table_08 ON table_06.table_06_id = table_08.table_06_id
LEFT JOIN table_09 ON ( table_01.table_01_id = table_09.table_01_id OR table_07.table_07_id = table_09.table_07_id OR table_06.table_06_id = table_09.protable_09_id )
LEFT JOIN table_10 ON table_01.table_01_id = table_10.table_01_id
LEFT JOIN table_11 ON table_07.table_07_id = table_11.table_07_id
LEFT JOIN table_12 ON table_10.table_10_id = table_12.table_10_id

WHERE (
MATCH ( table_01.description ) AGAINST ('search_term')
OR MATCH ( table_01.capabilities ) AGAINST ('search_term' )
OR MATCH ( table_01.comments ) AGAINST ('search_term' )
OR MATCH ( table_07.workdesc ) AGAINST ('search_term' )
OR MATCH ( table_07.client ) AGAINST ( 'search_term' )
OR MATCH ( table_07.table_07 ) AGAINST ( 'search_term' )
OR MATCH ( table_09.description ) AGAINST ( 'search_term' )
OR MATCH ( table_09.datatext ) AGAINST ( 'search_term' )
OR MATCH ( city ) AGAINST ( 'search_term' )
OR MATCH ( table_05.last ) AGAINST ( 'search_term' )
OR MATCH ( table_05.first ) AGAINST ( 'search_term' )
OR MATCH ( table_10.client ) AGAINST ( 'search_term' )
OR MATCH ( table_10.department ) AGAINST ( 'search_term' )
OR MATCH ( table_12.nature ) AGAINST ( 'search_term' )
OR MATCH ( table_11.title ) AGAINST ( 'search_term' )
OR MATCH ( table_11.desc ) AGAINST ('search_term' )
OR MATCH ( table_08.note ) AGAINST ( 'search_term' )
OR MATCH ( table_01.name ) AGAINST ( 'search_term' )
)

GROUP BY table_01.table_01_id
ORDER BY score DESC[/code]

no real surprise it takes 64 seconds on a PII 450mhz w/ 190megs of ram. Its real fast when it doesn't do a text search because it doesn't have to join those other tables. Also, i turned on caching which speeds up duplicate searches, but the system isn't used enough for that. It doesn't have data added to it that often, it just needs to query this faster.

Share this post


Link to post
Share on other sites
That's because you're using so many joins and so many different fulltext indexes. You'd have to maintain your own index which you update as you insert/update records into these various tables & fields. That way, you'd only have to query a single table.

Share this post


Link to post
Share on other sites
[!--quoteo(post=375515:date=May 20 2006, 11:35 AM:name=fenway)--][div class=\'quotetop\']QUOTE(fenway @ May 20 2006, 11:35 AM) [snapback]375515[/snapback][/div][div class=\'quotemain\'][!--quotec--]
That's because you're using so many joins and so many different fulltext indexes. You'd have to maintain your own index which you update as you insert/update records into these various tables & fields. That way, you'd only have to query a single table.
[/quote]

Sorry for my ignorance. would this be a table used only for the search? When i return search results, i only return one thing, the company. So would i create a row for each company, with the text that i would be searching for? So if there are a few columns from each joined table, i would put the data from each searchable column into this new table, so when i search, it only searches one place, w/ no joined tables?

thanks for your patience and explanations!

Share this post


Link to post
Share on other sites
That's the idea... since you probably search more often than you update/insert, maintaining this index table won't be too much overhead, and you'll see a huge performance increase.

Share this post


Link to post
Share on other sites
I wanted to thank you for your help. I sucessfully implemented a index running in parallel to my data. TO build the index takes about a minute; almost the entire time is in the mysql query. Then to update the search table when the data changes doesn't add a noticable difference and searches take fractions of a second. It's exactly what i needed.

Share this post


Link to post
Share on other sites
Glad you got it working -- using a table as a custom index is no small task -- give yourself a pat on the back. But once you get the hang of it, you'll find yourself using it quite often.

Share this post


Link to post
Share on other sites

×

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.