lb3000 Posted September 5, 2013 Share Posted September 5, 2013 (edited) I have an advanced search form that offers a bunch of ways to filter your search. Here's a simplified idea (doesn't include keyword text input or date range searches or other select menus): Topic: <select><option>any</option><option>all</option></select> [] Aging [] Environment [] Health [] Hunger [] Poverty Document type: <select><option>any</option><option>all</option></select> [] Case Study [] Policy Brief [] Whitepaper If someone selects "any" when they choose more than one topic or document type, the query needs to include, eg., topic = "Aging" OR topic = "Health". If someone selects "all" when they choose more than one topic or document type, the query needs to include, eg., topic = "Aging" AND topic = "Health". We default to "AND" between these different filters. So when you search for all documents categorized under Aging and all documents categorized as a whitepaper, the query is: topic = "Aging" AND doctype = "whitepaper". The Problem: We have a query that is working when the search is for "any". But when the search is for "all", according to MySQL's "EXPLAIN" command, we have an "impossible WHERE". Here is the query that works when someone selects "any" for both topic and document type: SELECT DISTINCT * FROM research JOIN link_resource_doctype ON link_resource_doctype.resource_id = research.research_id JOIN doctype ON doctype.id = link_resource_doctype.doctype_id JOIN link_resource_issue_area ON link_resource_issue_area.resource_id = research.research_id JOIN issue_area ON issue_area.id = link_resource_issue_area.issue_area_id WHERE approved = '1' AND (doctype.identifier = 'case_study' OR doctype.identifier = 'whitepaper') AND (issue_area.identifier = 'aging' OR issue_area.identifier = 'health') And here's the same query which does not work when someone selects "all" for both topic and document type (this also doesn't work if someone selects just topic or just document type): SELECT DISTINCT * FROM research JOIN link_resource_doctype ON link_resource_doctype.resource_id = research.research_id JOIN doctype ON doctype.id = link_resource_doctype.doctype_id JOIN link_resource_issue_area ON link_resource_issue_area.resource_id = research.research_id JOIN issue_area ON issue_area.id = link_resource_issue_area.issue_area_id WHERE approved = '1' AND (doctype.identifier = 'case_study' AND doctype.identifier = 'whitepaper') AND (issue_area.identifier = 'aging' AND issue_area.identifier = 'health') Possible solution but there's a problem: I came across this post on Stackoverflow -- Select row belonging to multiple categories -- which contains a query that I think might solve our problem when someone selects "all". Here it is: SELECT DISTINCT * FROM research JOIN link_issue_area ON link_issue_area.resource_id = research.research_id JOIN link_doctype ON link_doctype.resource_id = research.research_id WHERE issue_area.identifier IN ('aging', 'health') AND doctype_id.identifier IN ('case_study', 'whitepaper') GROUP BY research.research_id HAVING COUNT(DISTINCT issue_area.identifier) = 2 AND COUNT(DISTINCT doctype.identifier) = 2 The Problem> This query seems to work for either "any" or "all", except for one problem: Say that a document is categorized under Aging and Health and Poverty but the person searching only checked off Aging and Health. The document that is categorized under Aging and Health and Poverty will not appear in the search result list. I think this is because of the HAVING COUNT (DISTINCT issue_area.identifier) = 2 -- the 2 excludes any document that has a COUNT that is not exactly 2. Is there a work-around for this? Or a better query to use here? Any insight, ideas, assistance much appreciated! Thanks! Here's an SQLfiddle that gets at all of this too: http://sqlfiddle.com/#!2/847362/1 Edited September 5, 2013 by lb3000 Quote Link to comment Share on other sites More sharing options...
mac_gyver Posted September 5, 2013 Share Posted September 5, 2013 (edited) queries work by finding rows that result in a logically true WHERE clause. when you use OR - (doctype.identifier = 'case_study' OR doctype.identifier = 'whitepaper') ... this it a TRUE value for rows with case_study in them and for rows with whitepaper in them. the rows that have a case_study value result in a logical TRUE OR FALSE expression, which is TRUE. the rows that have a whitepaper value result in a logical FALSE OR TRUE expression, which is TRUE. when you use AND - (doctype.identifier = 'case_study' AND doctype.identifier = 'whitepaper') ... this can never be true for any row(s) since there cannot be any row(s) that are both case_study AND whitepaper at the same time. the rows that have a case_study value result in a logical TRUE AND FALSE expression, which is FALSE. the rows that have a whitepaper value result in a logical FALSE AND TRUE expression, which is FALSE. your 'all' queries should leave the entire term out of the WHERE clause (to produce the fastest query execution.) Edited September 5, 2013 by mac_gyver Quote Link to comment Share on other sites More sharing options...
lb3000 Posted September 5, 2013 Author Share Posted September 5, 2013 Thanks mac_gyver, and I apologize - I'm not sure what you mean when you write "your 'all' queries should leave the entire term out of the WHERE clause (to produce the fastest query execution.)". Are you suggesting that we continue to use the original query, which works for "any" queries, but modify it so that, when someone searches for "all" terms, there is no WHERE clause? Original query modified to not include terms in WHERE clause: SELECT DISTINCT * FROM research JOIN link_resource_doctype ON link_resource_doctype.resource_id = research.research_id JOIN doctype ON doctype.id = link_resource_doctype.doctype_id JOIN link_resource_issue_area ON link_resource_issue_area.resource_id = research.research_id JOIN issue_area ON issue_area.id = link_resource_issue_area.issue_area_id WHERE approved = '1' I don't understand how this will provide a result set based on what someone searched for? Maybe you are suggesting some combination of the original query and the last query I posted which does seem to work, except for the limitation where it provides records that are categorized under only 2 categories -- and so skips any records that include the 2 categories requested but also may be categorized under additional categories. Thanks for your help. Quote Link to comment Share on other sites More sharing options...
mac_gyver Posted September 5, 2013 Share Posted September 5, 2013 when you pick 'all' for the Topic (or for the Document Type) that means you want all records regardless of the Topic. That means that the Topic value is a "don't care" value, i.e. match everything in the Topic column. the way to do that is to just leave the AND (issue_area.identifier = 'aging' OR issue_area.identifier = 'health') out of the Where clause. Quote Link to comment Share on other sites More sharing options...
gizmola Posted September 5, 2013 Share Posted September 5, 2013 What he was saying if you have a meal table, with a column for fruit, and the possible fruits are (apple, orange, pear), and you want to get all rows, regardless of the specific fruit, then there is no reason to generate this sql: SELECT * FROM meal WHERE fruit = 'apple' OR fruit = 'orange' OR fruit = 'pear'; You'll get the same result using SELECT * FROM meal. Your "any" selection should simply omit that criteria from the query entirely, and the query will work. GROUP BY and its related functions is something different entirely -- it's for finding groups of rows for the purposes of using the GROUP operators like COUNT, SUM, AVG etc. on those groups. When you're just looking for rows that match a criteria GROUP BY is not going to make anything faster or better, and given the overhead, probably will make things slower. Since it does in fact create groups of rows based on the GROUP BY, it could be delivering the wrong answer as well, based on your desired outcome. Don't use it unless you really understand what it does and why -- it's not an alternative. Quote Link to comment Share on other sites More sharing options...
lb3000 Posted September 5, 2013 Author Share Posted September 5, 2013 (edited) Oh - got it. I'm sorry I wasn't clear about "any" and "all". If they pick "any" and then check 2 categories out of 10 available categories, then what they are saying is give me either cat1 or cat2 or both cat1 and cat2. So our query looks like: AND (issue_area.identifier = 'aging' OR issue_area.identifier ='health'). If they pick "all" and then check 2 categories out of 10 available categories, then what they are saying is give me "all" that I have checked -- not all of 10 categories. So, they want only records that include cat1 AND cat2. If they don't check anything in a category box, then we leave out any reference to that category -- which is what you are saying. If they don't check anything, then that category doesn't matter to them -- give them everything. Edited September 5, 2013 by lb3000 Quote Link to comment Share on other sites More sharing options...
mac_gyver Posted September 5, 2013 Share Posted September 5, 2013 i kind of though that might be what you were after, but your example (before the edit) didn't show that. you would still use OR in the where clause (which is the same as using the IN() comparison operator), but you would use GROUP BY ... to consolidate the matching rows together, so that you can use HAVING COUNT( ... ) = 2 to just match those that have exactly two of the checked options. i'm not sure it logically makes sense to have more than one 'all' choice at one time, but as long as the COUNT( ... ) terms produce the result you want, it may work. to test, add the COUNT( ... ) terms in the SELECT list, and temporarily remove the HAVING term. Quote Link to comment Share on other sites More sharing options...
lb3000 Posted September 5, 2013 Author Share Posted September 5, 2013 Ok - I think that modification is working pretty good. I have this set up in SQL Fiddle -- http://sqlfiddle.com/#!2/c5e65/1 -- and set up the query to look like this: SELECT research.research_id AS resource_id, research.title, COUNT(DISTINCT issue_area_id)=2, COUNT(DISTINCT doctype_id)=2 FROM research JOIN link_issue_area ON link_issue_area.resource_id = research.research_id JOIN link_doctype ON link_doctype.resource_id = research.research_id WHERE issue_area_id IN ('aging','health') AND doctype_id IN ('case_study','evaluation') GROUP BY resource_id I changed some things and ran it and got the results I was looking for. I will play with this more today and let you know what happens. Thanks mac_gyver for all of your help with this. Much appreciated! Quote Link to comment Share on other sites More sharing options...
vinny42 Posted September 5, 2013 Share Posted September 5, 2013 Why did you put SELECT DISTINCT * in your first queries? That will also remove records that you may otherwise want to see. There are no duplicates so why DISTINCT? Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.