Jump to content


Photo

indexing text in databases


  • Please log in to reply
3 replies to this topic

#1 stubarny

stubarny
  • Members
  • PipPipPip
  • Advanced Member
  • 123 posts

Posted 29 August 2006 - 10:27 PM

Hi,

I'm trying to code a search query for a database of about 500,000 job adverts (similar advert length to monster, careerbuilder etc).

I have found this search code (http://iamcal.com/pu...les/php/search/) which is suitable for 'small' tables. However at the bottom of the article it mentions that using a FULLTEXT index in mysql would be effective for medium sized databases. I'm assuming that 500,000 adverts is a 'medium' database so I'm thinking this could be a good solution, without having to get to grips with specialist indexing software (e.g. Zend Search Lucene - http://framework.zen...end.search.html)

Please can you give me your opinions on whether or not you think simply using a FULLTEXT index is a good solution? (I'm only a 'keen amateur' php/sql programmer so i don't have much experience with the practicalities of running largish databases)

Thanks a lot for your your time!

Stewart

#2 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 29 August 2006 - 11:31 PM

500K is a good size for FULLTEXT -- it really depends on how "english-like" your search keywords are. 
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.

#3 stubarny

stubarny
  • Members
  • PipPipPip
  • Advanced Member
  • 123 posts

Posted 30 August 2006 - 02:50 PM

Thanks, that's good news. I'm planning to include non-english text a year or two in the future so I'll keep an eye on the CPU usage, hopefully the processors will have sped up enough by then so there won't be a problem.

Cheers,

Stu

#4 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 30 August 2006 - 02:57 PM

It actually has nothing to do with memory/CPU usage -- just that FULLTEXT is designed for English prose, so if you're looking for anything else, it's pretty much useless.  Also, it has some funny quirks that only make sense of large datasets.
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.




0 user(s) are reading this topic

0 members, 0 guests, 0 anonymous users