jibster Posted July 20, 2008 Share Posted July 20, 2008 Hi I'm creating a site where people can make a post. Most HTML is allowed and saved to the database. I'm currently trying to create a search function for users to search the content of a post. I basically want the search to only look at what would be displayed when viewing a post, rather than the HTML formatting as well if that makes sense. E.g - The code of a post in the db: <a name='blah' href='http://www.google.com'>Visit google</a> The search for 'name' should not return the above post. but a search for 'Visit' or 'Google' should. I sort of need it to only search what a browser would render which I cant really imagine how to do. Hope that makes sense, any pointers appreciated. Thanks, Jibster Quote Link to comment https://forums.phpfreaks.com/topic/115705-solved-finding-needle-in-haystack/ Share on other sites More sharing options...
teynon Posted July 20, 2008 Share Posted July 20, 2008 There are three ways I can think of, and one I prefer for super fast searching. First, you could search for the keywords, then eliminate the HTML tags, and have PHP search the string again to see if the result remains true. Second, you could create and additional field in you MySQL that holds the content minus HTML codes. PHP would strip the HTML code before it is submitted to the database. Third, and my favorite. PHP strips the html code, then indexes each word individually to a seperate table, which contains an index to all post ID's that have this word in it. So for example you have the word "visit". So in the "QuickIndex" table it would look like this: ID, WORD, MATCHES 1, "visit", 2,8,10,24 Basically this means that when I search for the word visit, all I have to do is search "QuickIndex". That immediately pulls up 4 results, result id 2, result id 8, 10, 24, etc. This system is a little more complicated to implement right, but is extremely quick and efficient. I used this when I made a large program that indexed over a terabyte of generic text. Saved an amazing amount of time in searching. In fact regular searching was impossible. This method made it quick because there are only so many possible matches. Quote Link to comment https://forums.phpfreaks.com/topic/115705-solved-finding-needle-in-haystack/#findComment-594846 Share on other sites More sharing options...
jibster Posted July 20, 2008 Author Share Posted July 20, 2008 Cheers for that, some good ideas. So for the last example, when you create a new post you would also need to add to the "QuickIndex" table each time? Does this slow down the adding process much? Have you tried your second suggesition of having a different column with just the keywords in, sans the tags? That seems like an easy implementation and also would be pretty quick I'm thinking too. Thanks again! Quote Link to comment https://forums.phpfreaks.com/topic/115705-solved-finding-needle-in-haystack/#findComment-594944 Share on other sites More sharing options...
teynon Posted July 20, 2008 Share Posted July 20, 2008 Adding those words won't add much time at all considering that the insertion process happens so fast it's not noticable. The second suggestion will take forever to search if you get any large amounts of data because MySQL will have to search through every result. The third one limits the possible results and eliminates the need for doing multiple queries if you do the query right. If your not planning on having a lot of data, just do number two. Quote Link to comment https://forums.phpfreaks.com/topic/115705-solved-finding-needle-in-haystack/#findComment-595060 Share on other sites More sharing options...
unkwntech Posted July 21, 2008 Share Posted July 21, 2008 I just read this post and option number 3 has given me some interesting ideas.... Quote Link to comment https://forums.phpfreaks.com/topic/115705-solved-finding-needle-in-haystack/#findComment-595092 Share on other sites More sharing options...
jibster Posted July 24, 2008 Author Share Posted July 24, 2008 Thanks, marking as solved. Quote Link to comment https://forums.phpfreaks.com/topic/115705-solved-finding-needle-in-haystack/#findComment-598751 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.