Hi

 

I need to create a search engine for my website. It will search page content stored in a MySQL database. The content is stored pre-formatted as HTML, and when I run a fulltext search against it I want to ignore matches of search words that fall inside tags (i.e. HTML tag names and attributes). For example, if someone searches for 'style', I don't want it to match instances of 'style' inside an HTML tag such as <div style="abc">.

Row 1 contents might be '<div><p>this years style is ...</p></div>'

Row 2 contents might be '<div style="abc"><p>this years fashion is ...</p></div>'

 

I want the fulltext search to match row 1 but not row 2.

 

Is it possible to do this with fulltext searches? If not, can you give me advice on how to search HTML-formatted text stored in the DB without matching HTML tags (in case the user happens to search for a word that is also an HTML tag or attribute name)?

 

Many thanks in advance

Thanks for your reply. Unfortunately I already store the contents twice in the database and the database has the potential to grow big so storing the content a third time in the DB isn't really an option.

The second copy of the content I store in the DB isn't HTML formatted, but it is split up into different sections and the table uses InnoDB, so I can't do fulltext searching on it.

Well, I don't think it will be possible to successfully perform a search in MySQL that ignores HTML, and if it is, it will probably be resource-intensive. Once you cut out ALL of the HTML from the average webpage, what remains is usually not that big, so a content-only copy may not take up as much space as you think. Of course, if that definitely won't meet your needs, you could also make your own "common word" filter and store only the unusual words or phrases in a separate column for text searching, although that will break up the content and might foil quoted searches (a search for "this and that" won't work if you've removed all of the conjunctions from the text).
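To make that a bit more concrete, here is a rough sketch of the two ideas in Python, using only the standard library's `html.parser`. The names `strip_html`, `drop_common_words` and the `STOP_WORDS` list are made up for illustration, and the sample rows are the ones from the first post. It assumes you would run this once when saving a page and write the result to a separate fulltext-indexed column; treat it as a starting point, not a finished solution.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes only; tag names and attributes are discarded."""

    # Elements whose contents are code, not page text (assumption: you
    # don't want these searchable either).
    _SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join with spaces so words in adjacent elements don't run together,
        # then collapse the extra whitespace. Note this also splits a word
        # that has a tag in the middle of it.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html):
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical, deliberately tiny "common word" list.
STOP_WORDS = {"a", "an", "and", "is", "of", "or", "the", "this", "to"}


def drop_common_words(text):
    """Keep only the words that are not in STOP_WORDS."""
    return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)


row1 = '<div><p>this years style is ...</p></div>'
row2 = '<div style="abc"><p>this years fashion is ...</p></div>'

for row in (row1, row2):
    plain = strip_html(row)
    print(repr(plain), "->", "style" in plain)
```

With the tags stripped, a search for 'style' can only hit row 1, because the `style="abc"` attribute in row 2 never reaches the searchable copy. The downside of the common-word filter shows up as soon as you run `drop_common_words("this and that")`: only `that` is left, so an exact-phrase search for the original wording has nothing to match.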

 

That's how I might do it, anyway.  Good luck.
