jggretton Posted June 24, 2007 Share Posted June 24, 2007 Hi all, I'm creating a website that lists snippits of online articles. As tagging these articles will be very important, I've created a database structure as below. I'm now implementing searching of these tags with the hope to sort potentially 1000's of matches by relevance on tags and on date (the newer the better). Before launching head first into this, can anyone point me in the way of a good tutorial specifically on tag searches. (I can find plenty of fulltext sytle search information, but not so much on tagging). Many thanks, James ------------- Table "article" id|text|...|creationDateTime|updateDateTime 1|"...."|...| 2006-......... | ...... 2|"...."|...| 2006-......... | ...... ------------- Table: "tags" id| tag |count 1| "car" | 0 2| "boat"| 2 3| "tree" | 1 -------------- Table: "article_tags" id|article|tag 1| 2 | 2 2| 2 | 3 3| 1 | 2 ............... Quote Link to comment Share on other sites More sharing options...
jggretton Posted June 25, 2007 Author Share Posted June 25, 2007 Any ideas guys? Cheers, James Quote Link to comment Share on other sites More sharing options...
redarrow Posted June 25, 2007 Share Posted June 25, 2007 If you do sorting inside of MySQL and it comes together with selfjoins etc it might become quite ugly. True it can - I've come across this a fair few times myself! Usually it's pretty workable though. One thing with self-joins like this is that I've really wanted the ability to say to MySQL "order by any tag table I don't mind it doesn't matter" when I'm trying to find the 'top' N objects that match a set of tags. Usually I've had 1 object table and X tag tables in a query (as a result of self-joins), and I've wanted to have an ORDER BY on the tag table that mysql looks at first so that I can skip a big filesort just to find the latest N that match. As a workaround you can run a bunch of EXPLAINs first to see which of the tag tables in the self join seems to match the fewest rows and then either order by that one and hope mysql picks it as the first table, or just force a STRAIGHT_JOIN yourself. Either way it's lots of "fun" When you guess wrong you can really suffer if you have a large dataset though. A pretty horrible alternative with MyISAM at least is to leave the ORDER BY clause off altogether as you can often get things back in the index order you want anyway but this is really really hacky and I wouldn't recommend it at all Another alternative if you're looking for say the latest N objects where all the tags you're searching by are very common is just to 'scan' through the object table using a time index rather than have the database search by tag first. (eg: USE INDEX (time_added)) This works much much more quickly than a large filesort in many cases. (BTW Peter I know you know all this stuff I just thought it might be of interest to Zanzi Actually I've been comparing the self join method with the fulltext method just today, and found the self join to be about twice as quick in most cases if you build your indexes and queries carefully. Both are pretty fast though. (150K objects, 500K tags, always under 0.1 secs for self join (even moderately complex ones), usually around 0.15 secs for boolean mode fulltext) The two approaches are not really like-for-like though: fulltext will make writing queries much easier (see: IN BOOLEAN MODE) but as you have to use MyISAM you lose referential integrity etc etc. I'm sure there are other plusses and minuses to each approach too. My current thinking for a project I'm currently working on is to use a combination of both, though the full text approach will probably be used just for things that I'm too lazy to write a proper query for (just kidding - I'm not 100% sure what I'm going to do yet) Also Zanzi, I've never really tried the normalised 3 table approach to this. I started doing these things a long time before MySQL got the greedy join plan optimiser so didn't want to have to join 13 tables for a 6 tag search as it would have taken ages to come up with the query plan. Also I've never really seriously looked at alternatives to fulltext for this because the projects I've been working on have been able to be 'non-fuzzy' in their approach. That is, they've not been search engine style 'something that matches pretty well is fine' type systems, they've needed to return quite specific results. http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.