Delixe Posted June 9, 2007 Share Posted June 9, 2007 UNION is MySQL is not supposed to provide duplicate rows however: SELECT *, (MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('XSS')) as score FROM fmc_archive_dev WHERE ((MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('XSS')) AND (archived_on > 1181277356) AND (cat_id = 3)) UNION SELECT *, (MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('WAF')) as score FROM fmc_archive_dev WHERE ((MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('WAF')) AND (archived_on > 1181320556) AND (cat_id = 1 OR cat_id = 3 OR cat_id = 4 OR cat_id = 11 OR cat_id = 12 OR cat_id = 13 OR cat_id = 14 OR cat_id = 15 OR cat_id = 16 OR cat_id = 23 OR cat_id = 43 OR cat_id = 44)) Produces duplicate columns, 3 rows in total, 2 with the same id. I want unique rows, what am I doing wrong? Quote Link to comment https://forums.phpfreaks.com/topic/54839-union-in-mysql-help/ Share on other sites More sharing options...
btherl Posted June 11, 2007 Share Posted June 11, 2007 Is every column identical for those rows? Keep in mind that null, empty string and 0 are all distinct values as far as union is concerned (though null is considered the same as null). And please reformat your query so it's readable Quote Link to comment https://forums.phpfreaks.com/topic/54839-union-in-mysql-help/#findComment-272083 Share on other sites More sharing options...
Delixe Posted June 17, 2007 Author Share Posted June 17, 2007 Is every column identical for those rows? Keep in mind that null, empty string and 0 are all distinct values as far as union is concerned (though null is considered the same as null). And please reformat your query so it's readable SELECT * , ( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'XSS' ) ) AS score FROM fmc_archive_dev WHERE ( ( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'XSS' ) ) AND ( archived_on >1181277356 ) AND ( cat_id =3 ) ) UNION SELECT * , ( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'WAF' ) ) AS score FROM fmc_archive_dev WHERE ( ( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'WAF' ) ) AND ( archived_on >1181320556 ) AND ( cat_id =1 OR cat_id =3 OR cat_id =4 OR cat_id =11 OR cat_id =12 OR cat_id =13 OR cat_id =14 OR cat_id =15 OR cat_id =16 OR cat_id =23 OR cat_id =43 OR cat_id =44 ) ) btherl: Thank you so much, if you take a look in my query you'll notice that different keywords have a score, the rows have different scores which make them unique. Is there anyway to apply a score to all of those on a global level so I do not get duplicate rows? Quote Link to comment https://forums.phpfreaks.com/topic/54839-union-in-mysql-help/#findComment-276390 Share on other sites More sharing options...
btherl Posted June 19, 2007 Share Posted June 19, 2007 That sounds like a job for "GROUP BY". Here's an example: SELECT name, sum(score) as sum_score FROM scores GROUP BY name That query will give you the sum of the scores for each name. For your situation, do you want the sum of the scores? You need to decide what "apply a score to all of those on a global level" means. You could also take the average score. The basic approach is that you "GROUP BY" the columns which are unique, and then you use an "aggregate" on the columns which are NOT unique, which is score in your case. Aggregates include sum(), avg(), count(), and many more. Quote Link to comment https://forums.phpfreaks.com/topic/54839-union-in-mysql-help/#findComment-277436 Share on other sites More sharing options...
Delixe Posted June 19, 2007 Author Share Posted June 19, 2007 That sounds like a job for "GROUP BY". Here's an example: SELECT name, sum(score) as sum_score FROM scores GROUP BY name That query will give you the sum of the scores for each name. For your situation, do you want the sum of the scores? You need to decide what "apply a score to all of those on a global level" means. You could also take the average score. The basic approach is that you "GROUP BY" the columns which are unique, and then you use an "aggregate" on the columns which are NOT unique, which is score in your case. Aggregates include sum(), avg(), count(), and many more. I need all the data and the score. The score should be relative to all the data gained from all of the unions. I am thinking that I may not be able to UNION and will have to customize it without UNIONs in order to get all the data, scored, and unique. Quote Link to comment https://forums.phpfreaks.com/topic/54839-union-in-mysql-help/#findComment-277723 Share on other sites More sharing options...
btherl Posted June 20, 2007 Share Posted June 20, 2007 I'm not sure I understand what you need. Does your query produce something like this: id score 1 5 2 10 1 15 That is, you have the same id with different scores? And what kind of output do you expect instead of the output you are getting? Quote Link to comment https://forums.phpfreaks.com/topic/54839-union-in-mysql-help/#findComment-278186 Share on other sites More sharing options...
Delixe Posted June 20, 2007 Author Share Posted June 20, 2007 I'm not sure I understand what you need. Does your query produce something like this: id score 1 5 2 10 1 15 That is, you have the same id with different scores? And what kind of output do you expect instead of the output you are getting? No it needs get all the columns from the fmc_archive_dev table and have a column for score. What I have is a database full of content but I want to search the content on numerous keywords, so what I end up doing is UNION-ing a bunch of SELECTs together and score the items. As you know MySQL will provide a score based on the keyword match and will make another column for it however if you're matching against different keywords obviously you will get a different score per keyword. What ends up happening is a new column with different scores so I virtually get duplicate items that have the same columns but with different scores making them _not_ unique. What I want is to get all the columns with unique items, if any duplicate item results, I want to filter out the lowest scored item of the duplicate items. It is possible I know for their to be more than 2 duplicate items. I hope this makes sense. Quote Link to comment https://forums.phpfreaks.com/topic/54839-union-in-mysql-help/#findComment-278620 Share on other sites More sharing options...
btherl Posted June 21, 2007 Share Posted June 21, 2007 "I want to filter out the lowest scored item of the duplicate items" <- That's what I was looking for What you need is this, wrapped around your entire query: SELECT col1, col2, col3, max(score) AS max_score FROM (your query herhe) GROUP BY col1, col2, col3 Replace col1, col2, col3 with all the columns you want to be unique. They aren't specified in your query so I don't know what they are. You must specify them in the outer query. The column you leave out of the "group by" is score, because you're taking the maximum score. After that you can order by max_score desc, if you want the highest score first. Quote Link to comment https://forums.phpfreaks.com/topic/54839-union-in-mysql-help/#findComment-278989 Share on other sites More sharing options...
Delixe Posted June 22, 2007 Author Share Posted June 22, 2007 "I want to filter out the lowest scored item of the duplicate items" <- That's what I was looking for What you need is this, wrapped around your entire query: SELECT col1, col2, col3, max(score) AS max_score FROM (your query herhe) GROUP BY col1, col2, col3 Replace col1, col2, col3 with all the columns you want to be unique. They aren't specified in your query so I don't know what they are. You must specify them in the outer query. The column you leave out of the "group by" is score, because you're taking the maximum score. After that you can order by max_score desc, if you want the highest score first. I tried: SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX(MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('XSS')) as score FROM fmc_archive_dev WHERE ((MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('XSS')) AND (archived_on > 1181277356) AND (cat_id = 3)) GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on UNION SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX(MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('WAF')) as score FROM fmc_archive_dev WHERE ((MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('WAF')) AND (archived_on > 1181320556) AND (cat_id = 1 OR cat_id = 3 OR cat_id = 4 OR cat_id = 11 OR cat_id = 12 OR cat_id = 13 OR cat_id = 14 OR cat_id = 15 OR cat_id = 16 OR cat_id = 23 OR cat_id = 43 OR cat_id = 44)) GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on ORDER BY score DESC SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'XSS' ) ) AS score FROM fmc_archive_dev WHERE ( ( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'XSS' ) ) AND ( archived_on >1181277356 ) AND ( cat_id =3 ) ) GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on UNION SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'WAF' ) ) AS score FROM fmc_archive_dev WHERE ( ( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'WAF' ) ) AND ( archived_on >1181320556 ) AND ( cat_id =1 OR cat_id =3 OR cat_id =4 OR cat_id =11 OR cat_id =12 OR cat_id =13 OR cat_id =14 OR cat_id =15 OR cat_id =16 OR cat_id =23 OR cat_id =43 OR cat_id =44 ) ) GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on ORDER BY score DESC LIMIT 0 , 300 It gave me duplicates still: ROW: 128388 <p>I’ve been thinking about this for a long ... http://ha.ckers.org/blog/20070608/the-virtues-of-w... The Virtues of WAF Egress a:2:{i:0;s:3:"XSS";i:1;s:9:"Webappsec";} RSnake 1181324069 57 3 1181325343 8.03860855102539 And DUPLICATE ROW: 128388 <p>I’ve been thinking about this for a long ... http://ha.ckers.org/blog/20070608/the-virtues-of-w... The Virtues of WAF Egress a:2:{i:0;s:3:"XSS";i:1;s:9:"Webappsec";} RSnake 1181324069 57 3 1181325343 4.19421482086182 Notice they both have different scores: 8.03860855102539 and 4.19421482086182 I'd like the result to pick out the row with 8.03860855102539 as the score and get rid of the item with the score of 4.19421482086182 Quote Link to comment https://forums.phpfreaks.com/topic/54839-union-in-mysql-help/#findComment-279932 Share on other sites More sharing options...
btherl Posted June 22, 2007 Share Posted June 22, 2007 SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'XSS' ) ) AS score FROM fmc_archive_dev WHERE ( ( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'XSS' ) ) AND ( archived_on >1181277356 ) AND ( cat_id =3 ) ) GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on UNION SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'WAF' ) ) AS score FROM fmc_archive_dev WHERE ( ( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'WAF' ) ) AND ( archived_on >1181320556 ) AND ( cat_id =1 OR cat_id =3 OR cat_id =4 OR cat_id =11 OR cat_id =12 OR cat_id =13 OR cat_id =14 OR cat_id =15 OR cat_id =16 OR cat_id =23 OR cat_id =43 OR cat_id =44 ) ) GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on ORDER BY score DESC LIMIT 0 , 300 I've indented your code to show the problem. You are doing the group by on the individual queries before you union them. This will still give you duplicates. The group by should be around the entire query, including the union, not on each component of the union. Quote Link to comment https://forums.phpfreaks.com/topic/54839-union-in-mysql-help/#findComment-279941 Share on other sites More sharing options...
Delixe Posted June 22, 2007 Author Share Posted June 22, 2007 SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'XSS' ) ) AS score FROM fmc_archive_dev WHERE ( ( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'XSS' ) ) AND ( archived_on >1181277356 ) AND ( cat_id =3 ) ) GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on UNION SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'WAF' ) ) AS score FROM fmc_archive_dev WHERE ( ( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'WAF' ) ) AND ( archived_on >1181320556 ) AND ( cat_id =1 OR cat_id =3 OR cat_id =4 OR cat_id =11 OR cat_id =12 OR cat_id =13 OR cat_id =14 OR cat_id =15 OR cat_id =16 OR cat_id =23 OR cat_id =43 OR cat_id =44 ) ) GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on ORDER BY score DESC LIMIT 0 , 300 I've indented your code to show the problem. You are doing the group by on the individual queries before you union them. This will still give you duplicates. The group by should be around the entire query, including the union, not on each component of the union. How do I apply the GROUP BY to both UNIONed SELECTS, would you mind doing it for me, I tried. I got rid of the first GROUP BY but I then get the error message: #1140 - Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause Quote Link to comment https://forums.phpfreaks.com/topic/54839-union-in-mysql-help/#findComment-280392 Share on other sites More sharing options...
btherl Posted June 25, 2007 Share Posted June 25, 2007 Try this.. SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX(score) AS score FROM ( SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'XSS' ) AS score FROM fmc_archive_dev WHERE ( ( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'XSS' ) ) AND ( archived_on >1181277356 ) AND ( cat_id =3 ) ) UNION SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'WAF' ) AS score FROM fmc_archive_dev WHERE ( ( MATCH ( itm_desc, itm_title, itm_link, itm_author, itm_categories ) AGAINST ( 'WAF' ) ) AND ( archived_on >1181320556 ) AND ( cat_id =1 OR cat_id =3 OR cat_id =4 OR cat_id =11 OR cat_id =12 OR cat_id =13 OR cat_id =14 OR cat_id =15 OR cat_id =16 OR cat_id =23 OR cat_id =43 OR cat_id =44 ) ) ) AS the_subquery GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on ORDER BY score DESC LIMIT 0 , 300 "the_subquery" is your original query, duplicates and all. All wrapped up in a subquery. Then, wrapped around that is the select which does the grouping. This is the only select that has MAX() in it, and it's matched by the GROUP BY at the very end. The inner selects have no group by, and therefore are not allowed to use MAX(). So the process is: - Generate result set A (XSS) - Generate result set B (WAF) - Union result sets A and B. This union is the result of the subquery - Make results unique on every column except score. For score we take the maximum. This is achieved by the group by and the MAX() combined. Note that the columns appearing in the group by never use MAX(). Those columns are the "key", and the score column is the "data", when you think of it in key and data terms. The grouped by columns have every combination listed, and non-grouped columns get merged together into a single value. - Order by score desc. - Limit to no more than 300 results. Quote Link to comment https://forums.phpfreaks.com/topic/54839-union-in-mysql-help/#findComment-281644 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.