mattgleeson Posted April 5, 2007
Hi, I need to create a search engine for my website. It will search page content stored in a MySQL database. The content is stored pre-formatted with HTML, and when I run a fulltext search against it I want to ignore matches where the search word appears inside a tag (i.e. HTML tag names and attributes). For example, if someone searches for 'style', I don't want it to match instances of 'style' inside < >, such as <div style="abc">.
Row 1 contents might be '<div><p>this years style is ...</p></div>'
Row 2 contents might be '<div style="abc"><p>this years fashion is ...</p></div>'
I want the fulltext search to match row 1 but not row 2. Is this possible with fulltext searches? If not, can you give me advice on how to search HTML-formatted text stored in the DB without matching HTML tags (in case the user searches for a word that is also an HTML tag)? Many thanks in advance
Wildbug Posted April 5, 2007
Store a second table or column containing only the text content of the HTML, and search that instead. It will take more disk space, but it will improve search performance, and it is a feasible approach.
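A minimal sketch of how the text-only copy suggested above could be produced before it is written to the database. It is in Python using the standard library's html.parser; the names TextExtractor and html_to_text are made up for illustration, and skipping the contents of script and style elements is an assumption about what should count as searchable text. The two sample rows are the ones from the first post.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text between tags; tag names and attributes are never stored."""

    # Assumption: text inside these elements is not page content worth searching.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only text nodes reach this method, so 'style' in <div style="abc"> is ignored.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


row1 = '<div><p>this years style is ...</p></div>'
row2 = '<div style="abc"><p>this years fashion is ...</p></div>'

print(html_to_text(row1))  # this years style is ...
print(html_to_text(row2))  # this years fashion is ...
```

The output of html_to_text would be what goes into the text-only column, and the fulltext index would be built on that column rather than on the HTML one, so a search for 'style' can only hit row 1.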
mattgleeson Posted April 5, 2007 Author
Thanks for your reply. Unfortunately I already store the content twice in the database, and the database has the potential to grow large, so storing the content a third time isn't really an option. The second copy isn't HTML-formatted, but it is split up into different sections and its table uses InnoDB, so I can't do fulltext searching on it.
Wildbug Posted April 5, 2007
Well, I don't think it will be possible to perform a search in MySQL that ignores HTML, and if it is, it will probably be resource-intensive. Once you cut out ALL of the HTML from the average webpage, what remains is usually not that big, so the content-only solution may not be as large as you think. Of course, if it definitely won't meet your needs, you could also make your own "common word" filter and store only unusual words or phrases in a separate column for text searching, although that will break up the content and might foil quoted searches ("this and that" won't work if you've removed all of the conjunctions from the text). That's how I might do it, anyway. Good luck.
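A rough sketch of the "common word" filter idea above, again in Python. The STOP_WORDS list and the function names keywords and phrase_matches are invented for illustration; a real filter would need a much longer word list and would run on text that has already had its HTML removed. It also shows the drawback mentioned above: once the common words are gone, a quoted phrase made of them can no longer be found.

```python
import re

# Assumption: a tiny hand-picked list; a real one would be far longer.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "or", "this", "that", "to", "in"}


def keywords(text):
    """Lower-case the text, split it into words, and drop the common ones."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def phrase_matches(phrase, stored_keywords):
    """Naive quoted-phrase check against the space-joined keyword column."""
    return phrase.lower() in " ".join(stored_keywords)


stored = keywords("choose this and that today")
print(stored)                                   # ['choose', 'today']
print(phrase_matches("choose", stored))         # True
print(phrase_matches("this and that", stored))  # False
```

The stored keyword list is smaller than the full text, which is the point of the filter, but the last line shows the trade-off: the phrase "this and that" was in the original content and still cannot be matched.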