Jump to content

UNION in MySQL Help


Delixe

Recommended Posts

UNION is MySQL is not supposed to provide duplicate rows however:

 

SELECT *, (MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('XSS')) as score FROM fmc_archive_dev WHERE ((MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('XSS')) AND (archived_on > 1181277356) AND (cat_id = 3)) UNION SELECT *, (MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('WAF')) as score FROM fmc_archive_dev WHERE ((MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('WAF')) AND (archived_on > 1181320556) AND (cat_id = 1 OR cat_id = 3 OR cat_id = 4 OR cat_id = 11 OR cat_id = 12 OR cat_id = 13 OR cat_id = 14 OR cat_id = 15 OR cat_id = 16 OR cat_id = 23 OR cat_id = 43 OR cat_id = 44))

 

Produces duplicate columns, 3 rows in total, 2 with the same id. I want unique rows, what am I doing wrong?

 

Link to comment
Share on other sites

Is every column identical for those rows?  Keep in mind that null, empty string and 0 are all distinct values as far as union is concerned (though null is considered the same as null).

 

And please reformat your query so it's readable :)

Link to comment
Share on other sites

Is every column identical for those rows?  Keep in mind that null, empty string and 0 are all distinct values as far as union is concerned (though null is considered the same as null).

 

And please reformat your query so it's readable :)

 

SELECT * , (

MATCH (
itm_desc, itm_title, itm_link, itm_author, itm_categories
)
AGAINST (
'XSS'
)
) AS score
FROM fmc_archive_dev
WHERE (
(

MATCH (
itm_desc, itm_title, itm_link, itm_author, itm_categories
)
AGAINST (
'XSS'
)
)
AND (
archived_on >1181277356
)
AND (
cat_id =3
)
)
UNION SELECT * , (

MATCH (
itm_desc, itm_title, itm_link, itm_author, itm_categories
)
AGAINST (
'WAF'
)
) AS score
FROM fmc_archive_dev
WHERE (
(

MATCH (
itm_desc, itm_title, itm_link, itm_author, itm_categories
)
AGAINST (
'WAF'
)
)
AND (
archived_on >1181320556
)
AND (
cat_id =1
OR cat_id =3
OR cat_id =4
OR cat_id =11
OR cat_id =12
OR cat_id =13
OR cat_id =14
OR cat_id =15
OR cat_id =16
OR cat_id =23
OR cat_id =43
OR cat_id =44
)
)

 

btherl: Thank you so much, if you take a look in my query you'll notice that different keywords have a score, the rows have different scores which make them unique. Is there anyway to apply a score to all of those on a global level so I do not get duplicate rows?

Link to comment
Share on other sites

That sounds like a job for "GROUP BY".  Here's an example:

 

SELECT name, sum(score) as sum_score FROM scores GROUP BY name

 

That query will give you the sum of the scores for each name.  For your situation, do you want the sum of the scores?  You need to decide what "apply a score to all of those on a global level" means.  You could also take the average score.

 

The basic approach is that you "GROUP BY" the columns which are unique, and then you use an "aggregate" on the columns which are NOT unique, which is score in your case.  Aggregates include sum(), avg(), count(), and many more.

 

 

Link to comment
Share on other sites

That sounds like a job for "GROUP BY".  Here's an example:

 

SELECT name, sum(score) as sum_score FROM scores GROUP BY name

 

That query will give you the sum of the scores for each name.  For your situation, do you want the sum of the scores?  You need to decide what "apply a score to all of those on a global level" means.  You could also take the average score.

 

The basic approach is that you "GROUP BY" the columns which are unique, and then you use an "aggregate" on the columns which are NOT unique, which is score in your case.  Aggregates include sum(), avg(), count(), and many more.

 

 

 

I need all the data and the score. The score should be relative to all the data gained from all of the unions. I am thinking that I may not be able to UNION and will have to customize it without UNIONs in order to get all the data, scored, and unique.

Link to comment
Share on other sites

I'm not sure I understand what you need.

 

Does your query produce something like this:

 

id  score

1      5

2      10

1      15

 

That is, you have the same id with different scores?  And what kind of output do you expect instead of the output you are getting?

Link to comment
Share on other sites

I'm not sure I understand what you need.

 

Does your query produce something like this:

 

id  score

1      5

2      10

1      15

 

That is, you have the same id with different scores?  And what kind of output do you expect instead of the output you are getting?

 

No it needs get all the columns from the fmc_archive_dev table and have a column for score.

 

What I have is a database full of content but I want to search the content on numerous keywords, so what I end up doing is UNION-ing a bunch of SELECTs together and score the items. As you know MySQL will provide a score based on the keyword match and will make another column for it however if you're matching against different keywords obviously you will get a different score per keyword. What ends up happening is a new column with different scores so I virtually get duplicate items that have the same columns but with different scores making them _not_ unique.

 

What I want is to get all the columns with unique items, if any duplicate item results, I want to filter out the lowest scored item of the duplicate items. It is possible I know for their to be more than 2 duplicate items.

 

I hope this makes sense.

Link to comment
Share on other sites

"I want to filter out the lowest scored item of the duplicate items"  <- That's what I was looking for :)

 

What you need is this, wrapped around your entire query:

 

SELECT col1, col2, col3, max(score) AS max_score
FROM (your query herhe)
GROUP BY col1, col2, col3

 

Replace col1, col2, col3 with all the columns you want to be unique.  They aren't specified in your query so I don't know what they are.  You must specify them in the outer query.

 

The column you leave out of the "group by" is score, because you're taking the maximum score.  After that you can order by max_score desc, if you want the highest score first.

Link to comment
Share on other sites

"I want to filter out the lowest scored item of the duplicate items"  <- That's what I was looking for :)

 

What you need is this, wrapped around your entire query:

 

SELECT col1, col2, col3, max(score) AS max_score
FROM (your query herhe)
GROUP BY col1, col2, col3

 

Replace col1, col2, col3 with all the columns you want to be unique.  They aren't specified in your query so I don't know what they are.  You must specify them in the outer query.

 

The column you leave out of the "group by" is score, because you're taking the maximum score.  After that you can order by max_score desc, if you want the highest score first.

 

I tried:

 

SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX(MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('XSS')) as score FROM fmc_archive_dev WHERE ((MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('XSS')) AND (archived_on > 1181277356) AND (cat_id = 3)) GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on UNION SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX(MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('WAF')) as score FROM fmc_archive_dev WHERE ((MATCH(itm_desc, itm_title, itm_link, itm_author, itm_categories) AGAINST('WAF')) AND (archived_on > 1181320556) AND (cat_id = 1 OR cat_id = 3 OR cat_id = 4 OR cat_id = 11 OR cat_id = 12 OR cat_id = 13 OR cat_id = 14 OR cat_id = 15 OR cat_id = 16 OR cat_id = 23 OR cat_id = 43 OR cat_id = 44)) GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on ORDER BY score DESC

 

SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX(
MATCH (
itm_desc, itm_title, itm_link, itm_author, itm_categories
)
AGAINST (
'XSS'
) ) AS score
FROM fmc_archive_dev
WHERE (
(

MATCH (
itm_desc, itm_title, itm_link, itm_author, itm_categories
)
AGAINST (
'XSS'
)
)
AND (
archived_on >1181277356
)
AND (
cat_id =3
)
)
GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on
UNION SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX(
MATCH (
itm_desc, itm_title, itm_link, itm_author, itm_categories
)
AGAINST (
'WAF'
) ) AS score
FROM fmc_archive_dev
WHERE (
(

MATCH (
itm_desc, itm_title, itm_link, itm_author, itm_categories
)
AGAINST (
'WAF'
)
)
AND (
archived_on >1181320556
)
AND (
cat_id =1
OR cat_id =3
OR cat_id =4
OR cat_id =11
OR cat_id =12
OR cat_id =13
OR cat_id =14
OR cat_id =15
OR cat_id =16
OR cat_id =23
OR cat_id =43
OR cat_id =44
)
)
GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on
ORDER BY score DESC
LIMIT 0 , 300

 

It gave me duplicates still:

ROW:

128388  <p>I’ve been thinking about this for a long ...  http://ha.ckers.org/blog/20070608/the-virtues-of-w...  The Virtues of WAF Egress  a:2:{i:0;s:3:"XSS";i:1;s:9:"Webappsec";}  RSnake  1181324069  57  3  1181325343  8.03860855102539

 

And DUPLICATE ROW:

128388  <p>I’ve been thinking about this for a long ...  http://ha.ckers.org/blog/20070608/the-virtues-of-w...  The Virtues of WAF Egress  a:2:{i:0;s:3:"XSS";i:1;s:9:"Webappsec";}  RSnake  1181324069  57  3  1181325343  4.19421482086182

 

Notice they both have different scores: 8.03860855102539 and 4.19421482086182

I'd like the result to pick out the row with 8.03860855102539 as the score and get rid of the item with the score of 4.19421482086182

Link to comment
Share on other sites

SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX(
  MATCH (
    itm_desc, itm_title, itm_link, itm_author, itm_categories
  )
  AGAINST (
  'XSS'
  )
) AS score
FROM fmc_archive_dev
WHERE (
  (
    MATCH (
      itm_desc, itm_title, itm_link, itm_author, itm_categories
    )
    AGAINST (
      'XSS'
    )
  )
  AND (
    archived_on >1181277356
  )
  AND (
  cat_id =3
  )
)
GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on
UNION SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX(
  MATCH (
    itm_desc, itm_title, itm_link, itm_author, itm_categories
  )
  AGAINST (
    'WAF'
  )
) AS score
FROM fmc_archive_dev
WHERE (
  (
    MATCH (
      itm_desc, itm_title, itm_link, itm_author, itm_categories
    )
    AGAINST (
      'WAF'
    )
  )
  AND (
    archived_on >1181320556
  )
  AND (
    cat_id =1
    OR cat_id =3
    OR cat_id =4
    OR cat_id =11
    OR cat_id =12
    OR cat_id =13
    OR cat_id =14
    OR cat_id =15
    OR cat_id =16
    OR cat_id =23
    OR cat_id =43
    OR cat_id =44
  )
)
GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on
ORDER BY score DESC
LIMIT 0 , 300

 

I've indented your code to show the problem.  You are doing the group by on the individual queries before you union them.  This will still give you duplicates.  The group by should be around the entire query, including the union, not on each component of the union.

Link to comment
Share on other sites

SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX(
  MATCH (
    itm_desc, itm_title, itm_link, itm_author, itm_categories
  )
  AGAINST (
  'XSS'
  )
) AS score
FROM fmc_archive_dev
WHERE (
  (
    MATCH (
      itm_desc, itm_title, itm_link, itm_author, itm_categories
    )
    AGAINST (
      'XSS'
    )
  )
  AND (
    archived_on >1181277356
  )
  AND (
  cat_id =3
  )
)
GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on
UNION SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX(
  MATCH (
    itm_desc, itm_title, itm_link, itm_author, itm_categories
  )
  AGAINST (
    'WAF'
  )
) AS score
FROM fmc_archive_dev
WHERE (
  (
    MATCH (
      itm_desc, itm_title, itm_link, itm_author, itm_categories
    )
    AGAINST (
      'WAF'
    )
  )
  AND (
    archived_on >1181320556
  )
  AND (
    cat_id =1
    OR cat_id =3
    OR cat_id =4
    OR cat_id =11
    OR cat_id =12
    OR cat_id =13
    OR cat_id =14
    OR cat_id =15
    OR cat_id =16
    OR cat_id =23
    OR cat_id =43
    OR cat_id =44
  )
)
GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on
ORDER BY score DESC
LIMIT 0 , 300

 

I've indented your code to show the problem.  You are doing the group by on the individual queries before you union them.  This will still give you duplicates.  The group by should be around the entire query, including the union, not on each component of the union.

 

How do I apply the GROUP BY to both UNIONed SELECTS, would you mind doing it for me, I tried. I got rid of the first GROUP BY but I then get the error message:

#1140 - Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause

Link to comment
Share on other sites

Try this..

 

SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on, MAX(score) AS score FROM
(
  SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on,
    MATCH (
      itm_desc, itm_title, itm_link, itm_author, itm_categories
    )
    AGAINST (
      'XSS'
    )
  AS score
  FROM fmc_archive_dev
  WHERE (
    (
      MATCH (
        itm_desc, itm_title, itm_link, itm_author, itm_categories
      )
      AGAINST (
        'XSS'
      )
    )
    AND (
      archived_on >1181277356
    )
    AND (
    cat_id =3
    )
  )
  UNION SELECT id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on,
    MATCH (
      itm_desc, itm_title, itm_link, itm_author, itm_categories
    )
    AGAINST (
      'WAF'
    )
  AS score
  FROM fmc_archive_dev
  WHERE (
    (
      MATCH (
        itm_desc, itm_title, itm_link, itm_author, itm_categories
      )
      AGAINST (
        'WAF'
      )
    )
    AND (
      archived_on >1181320556
    )
    AND (
      cat_id =1
      OR cat_id =3
      OR cat_id =4
      OR cat_id =11
      OR cat_id =12
      OR cat_id =13
      OR cat_id =14
      OR cat_id =15
      OR cat_id =16
      OR cat_id =23
      OR cat_id =43
      OR cat_id =44
    )
  )
) AS the_subquery
GROUP BY id, itm_desc, itm_link, itm_title, itm_categories, itm_author, itm_date, feed_id, cat_id, archived_on
ORDER BY score DESC
LIMIT 0 , 300

 

"the_subquery" is your original query, duplicates and all.  All wrapped up in a subquery.  Then, wrapped around that is the select which does the grouping.  This is the only select that has MAX() in it, and it's matched by the GROUP BY at the very end.  The inner selects have no group by, and therefore are not allowed to use MAX().

 

So the process is:

 

- Generate result set A (XSS)

- Generate result set B (WAF)

- Union result sets A and B.  This union is the result of the subquery

- Make results unique on every column except score.  For score we take the maximum.  This is achieved by the group by and the MAX() combined.  Note that the columns appearing in the group by never use MAX().  Those columns are the "key", and the score column is the "data", when you think of it in key and data terms.  The grouped by columns have every combination listed, and non-grouped columns get merged together into a single value.

- Order by score desc.

- Limit to no more than 300 results.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.