Jump to content

Query that uses HAVING is limiting results incorrectly


lb3000

Recommended Posts

I have an advanced search form that offers a bunch of ways to filter your search. Here's a simplified idea (doesn't include keyword text input or date range searches or other select menus):

Topic: 
<select><option>any</option><option>all</option></select>
[] Aging
[] Environment
[] Health
[] Hunger
[] Poverty

Document type: 
<select><option>any</option><option>all</option></select>
[] Case Study
[] Policy Brief
[] Whitepaper
  • If someone selects "any" when they choose more than one topic or document type, the query needs to include, eg., topic = "Aging" OR topic = "Health".
  • If someone selects "all" when they choose more than one topic or document type, the query needs to include, eg., topic = "Aging" AND topic = "Health".
  • We default to "AND" between these different filters. So when you search for all documents categorized under Aging and all documents categorized as a whitepaper, the query is: topic = "Aging" AND doctype = "whitepaper".

The Problem: We have a query that is working when the search is for "any". But when the search is for "all", according to MySQL's "EXPLAIN" command, we have an "impossible WHERE". :(
 

Here is the query that works when someone selects "any" for both topic and document type:

SELECT 
DISTINCT * 
FROM research 
JOIN link_resource_doctype ON link_resource_doctype.resource_id = research.research_id 
JOIN doctype ON doctype.id = link_resource_doctype.doctype_id 
JOIN link_resource_issue_area ON link_resource_issue_area.resource_id = research.research_id 
JOIN issue_area ON issue_area.id = link_resource_issue_area.issue_area_id 
WHERE approved = '1' 
AND (doctype.identifier = 'case_study' OR doctype.identifier = 'whitepaper') 
AND (issue_area.identifier = 'aging' OR issue_area.identifier = 'health')

And here's the same query which does not work when someone selects "all" for both topic and document type (this also doesn't work if someone selects just topic or just document type):

SELECT 
DISTINCT * 
FROM research 
JOIN link_resource_doctype ON link_resource_doctype.resource_id = research.research_id 
JOIN doctype ON doctype.id = link_resource_doctype.doctype_id 
JOIN link_resource_issue_area ON link_resource_issue_area.resource_id = research.research_id 
JOIN issue_area ON issue_area.id = link_resource_issue_area.issue_area_id 
WHERE approved = '1' 
AND (doctype.identifier = 'case_study' AND doctype.identifier = 'whitepaper') 
AND (issue_area.identifier = 'aging' AND issue_area.identifier = 'health')

Possible solution but there's a problem: I came across this post on Stackoverflow -- Select row belonging to multiple categories -- which contains a query that I think might solve our problem when someone selects "all". Here it is:

SELECT 
DISTINCT * 
FROM research 
JOIN link_issue_area ON link_issue_area.resource_id = research.research_id 
JOIN link_doctype ON link_doctype.resource_id = research.research_id 
WHERE issue_area.identifier IN ('aging', 'health')
AND
doctype_id.identifier IN ('case_study', 'whitepaper')
GROUP BY research.research_id
HAVING COUNT(DISTINCT issue_area.identifier) = 2 
AND
COUNT(DISTINCT doctype.identifier) = 2

The Problem> This query seems to work for either "any" or "all", except for one problem: Say that a document is categorized under Aging and Health and Poverty but the person searching only checked off Aging and Health. The document that is categorized under Aging and Health and Poverty will not appear in the search result list. I think this is because of the HAVING COUNT (DISTINCT issue_area.identifier) = 2 -- the 2 excludes any document that has a COUNT that is not exactly 2. Is there a work-around for this? Or a better query to use here?

Any insight, ideas, assistance much appreciated! Thanks!

 

Here's an SQLfiddle that gets at all of this too: http://sqlfiddle.com/#!2/847362/1

Edited by lb3000
Link to comment
Share on other sites

queries work by finding rows that result in a logically true WHERE clause.

 

when you use OR - (doctype.identifier = 'case_study' OR doctype.identifier = 'whitepaper') ... this it a TRUE value for rows with case_study in them and for rows with whitepaper in them. the rows that have a case_study value result in a logical TRUE OR FALSE expression, which is TRUE. the rows that have a whitepaper value result in a logical FALSE OR TRUE expression, which is TRUE.

 

when you use AND - (doctype.identifier = 'case_study' AND doctype.identifier = 'whitepaper') ... this can never be true for any row(s) since there cannot be any row(s) that are both case_study AND whitepaper at the same time. the rows that have a case_study value result in a logical TRUE AND FALSE expression, which is FALSE. the rows that have a whitepaper value result in a logical FALSE AND TRUE expression, which is FALSE.

 

 

your 'all' queries should leave the entire term out of the WHERE clause (to produce the fastest query execution.)

Edited by mac_gyver
Link to comment
Share on other sites

Thanks mac_gyver, and I apologize - I'm not sure what you mean when you write "your 'all' queries should leave the entire term out of the WHERE clause (to produce the fastest query execution.)". Are you suggesting that we continue to use the original query, which works for "any" queries, but modify it so that, when someone searches for "all" terms, there is no WHERE clause? 

 

Original query modified to not include terms in WHERE clause:

SELECT
DISTINCT *
FROM research
JOIN link_resource_doctype ON link_resource_doctype.resource_id = research.research_id
JOIN doctype ON doctype.id = link_resource_doctype.doctype_id
JOIN link_resource_issue_area ON link_resource_issue_area.resource_id = research.research_id
JOIN issue_area ON issue_area.id = link_resource_issue_area.issue_area_id
WHERE approved = '1'

I don't understand how this will provide a result set based on what someone searched for? Maybe you are suggesting some combination of the original query and the last query I posted which does seem to work, except for the limitation where it provides records that are categorized under only 2 categories -- and so skips any records that include the 2 categories requested but also may be categorized under additional categories. 

 

Thanks for your help.

Link to comment
Share on other sites

when you pick 'all' for the Topic (or for the Document Type) that means you want all records regardless of the Topic. That means that the Topic value is a "don't care" value, i.e. match everything in the Topic column. the way to do that is to just leave the AND (issue_area.identifier = 'aging' OR issue_area.identifier = 'health') out of the Where clause.

Link to comment
Share on other sites

What he was saying if you have a meal table, with a column for fruit, and the possible fruits are (apple, orange, pear), and you want to get all rows, regardless of the specific fruit, then there is no reason to generate this sql:

 

SELECT * FROM meal WHERE fruit = 'apple' OR fruit = 'orange' OR fruit = 'pear';

 

You'll get the same result using SELECT * FROM meal.

 

Your "any" selection should simply omit that criteria from the query entirely, and the query will work.

 

GROUP BY and its related functions is something different entirely -- it's for finding groups of rows for the purposes of using the GROUP operators like COUNT, SUM, AVG etc. on those groups. When you're just looking for rows that match a criteria GROUP BY is not going to make anything faster or better, and given the overhead, probably will make things slower. Since it does in fact create groups of rows based on the GROUP BY, it could be delivering the wrong answer as well, based on your desired outcome. Don't use it unless you really understand what it does and why -- it's not an alternative.

Link to comment
Share on other sites

Oh - got it. I'm sorry I wasn't clear about "any" and "all". 

 

If they pick "any" and then check 2 categories out of 10 available categories, then what they are saying is give me either cat1 or cat2 or both cat1 and cat2. So our query looks like: AND (issue_area.identifier = 'aging' OR issue_area.identifier ='health').

 

If they pick "all" and then check 2 categories out of 10 available categories, then what they are saying is give me "all" that I have checked -- not all of 10 categories.  So, they want only records that include cat1 AND cat2. 

 

If they don't check anything in a category box, then we leave out any reference to that category -- which is what you are saying. If they don't check anything, then that category doesn't matter to them -- give them everything.

Edited by lb3000
Link to comment
Share on other sites

i kind of though that might be what you were after, but your example (before the edit) didn't show that.

 

you would still use OR in the where clause (which is the same as using the IN() comparison operator), but you would use GROUP BY ... to consolidate the matching rows together, so that you can use HAVING COUNT( ... ) = 2 to just match those that have exactly two of the checked options.

 

i'm not sure it logically makes sense to have more than one 'all' choice at one time, but as long as the COUNT( ... ) terms produce the result you want, it may work.  to test, add the COUNT( ... ) terms in the SELECT list, and temporarily remove the HAVING term.

Link to comment
Share on other sites

Ok - I think that modification is working pretty good. I have this set up in SQL Fiddle -- http://sqlfiddle.com/#!2/c5e65/1 -- and set up the query to look like this:

 

SELECT   research.research_id AS resource_id, 
research.title, 
COUNT(DISTINCT issue_area_id)=2,
COUNT(DISTINCT doctype_id)=2
FROM     research 
JOIN     link_issue_area ON link_issue_area.resource_id = research.research_id 
JOIN     link_doctype ON link_doctype.resource_id = research.research_id 
WHERE    issue_area_id IN ('aging','health')
AND      doctype_id IN ('case_study','evaluation')
GROUP BY resource_id

I changed some things and ran it and got the results I was looking for. I will play with this more today and let you know what happens. 

 

Thanks mac_gyver for all of your help with this. Much appreciated!

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.