Jump to content


Photo

Slow Inner Join


  • Please log in to reply
10 replies to this topic

#1 neoform

neoform
  • Members
  • PipPipPip
  • Advanced Member
  • 241 posts
  • LocationMontreal

Posted 15 July 2006 - 05:19 PM

I wrote this (obviously badly designed) query that matches IP addresses to their associated country (located in my IP2Location table)

"SELECT ip, user_id, used_on, country_code2, country_code3, country_name FROM stats_ips INNER JOIN ip_2_country ON ip_to >= ip AND ip_from <= ip ORDER BY ".$_order_by." ".$_order_direction." LIMIT ".(($_page - 1) * PER_PAGE_IPS).", ".PER_PAGE_IPS

this is stats_ips:
ip (INT UNSIGNED - Primary)
user_id (MEDIUMINT - Indexed)
used_on (DATETIME - Indexed)

this is ip_2_country:
ip_from (INT UNSIGNED - Primary)
ip_to (INT UNSIGNED - Primary)
country_name (CHAR 50 - Indexed)
country_code2 (CHAR 2)
country_code3 (CHAR 3)

One of the reasons i bother with the joining then is becuse i want to be able to order by any of the fields: ip, user_id, used_on, country_name

this query worked ok when there was a few rows in stats_ips, but as soon as i added any normal number of rows to it (100,000) it became insanely slow (1000+ seconds).

my guess is it's an index problem, (i've always been terrible with index optmizing).
Newsique.com Social News Network

#2 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 15 July 2006 - 05:56 PM

Well, it doesn't help that your JOIN condition has inequalities... post the interpolated query, and the EXPLAIN output.
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.

#3 neoform

neoform
  • Members
  • PipPipPip
  • Advanced Member
  • 241 posts
  • LocationMontreal

Posted 15 July 2006 - 06:25 PM

Well, as much as i'd like to have something like "stats_ips.ip = ip_2_location.ip", that would mean i'd have a record for every possible IP address..  that's like what..  4.2 Billion.. ? :P I'm pretty much stuck using ranges.

here's an example of it in use..

SELECT ip, user_id, used_on, country_code2, country_code3, country_name FROM stats_ips INNER JOIN ip_2_country ON ip_from <= ip AND ip_to >= ip ORDER BY used_on DESC LIMIT 0, 20

The output is simply supposed to be a list of ip addresses from the stats_ips table along with related info, like who used it, what country the ip is from and when the ip was last used.

i'm looking to make this out be able to be sorted by all 4 fields, which it currently does, but mysql crumbles if i add any amount of ips to the stats_ips table.

(btw, my ip_2_location table has about 75,000 rows)..
Newsique.com Social News Network

#4 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 15 July 2006 - 08:54 PM

Still, post the EXPLAIN output.
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.

#5 neoform

neoform
  • Members
  • PipPipPip
  • Advanced Member
  • 241 posts
  • LocationMontreal

Posted 15 July 2006 - 09:16 PM

my bad, i wasn't even aware there was an explain command..  :(

+----+-------------+--------------+------+---------------+-----+---------+-----+-------+------------------------------------------------+
| id | select_type | table        | type | possible_keys | key | key_len | ref | rows  | Extra                                          |
+----+-------------+--------------+------+---------------+-----+---------+-----+-------+------------------------------------------------+
|  1 | SIMPLE      | stats_ips    | ALL  | PRIMARY,ip    | NULL| NULL    | NULL|    6 | Using temporary; Using filesort                |
|  1 | SIMPLE      | ip_2_country | ALL  | PRIMARY      | NULL| NULL    | NULL| 73952 | Range checked for each record (index map: 0x1) |
+----+-------------+--------------+------+---------------+-----+---------+-----+-------+------------------------------------------------+
Newsique.com Social News Network

#6 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 16 July 2006 - 06:16 AM

Well, type ALL is bad... no key used is also bad; as is the filesort.  BTW, depending on what your "range" is, you can probably get away with wildcards, especially for subnet ranges (which is what I assume that you're using this for).
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.

#7 neoform

neoform
  • Members
  • PipPipPip
  • Advanced Member
  • 241 posts
  • LocationMontreal

Posted 16 July 2006 - 02:27 PM

i don't suppose you could throw me an example.. ?  i've been stuck on this query for 3 days now.. :(
Newsique.com Social News Network

#8 shoz

shoz
  • Staff Alumni
  • Advanced Member
  • 600 posts

Posted 16 July 2006 - 04:16 PM

If you're using MYSQL 4.1 or higher the following should be faster.

This uses a SUBSELECT to do the initial ordering and limit. The ordered column has to be in the stats_ips table however. Meaning ordering on country_name is out. Although you can modify the query to still order by a stats_ips column and then a ip_2_country column.
SELECT
ip, used_on, user_id, country_code
FROM
    (
    SELECT ip, used_on, user_id
    FROM stats_ips
    ORDER BY
    used_on
    LIMIT 0, 20
    )
   AS
stats_ips
INNER JOIN
ip_2_country
ON
ip_from <= ip AND ip_to >= ip
ORDER BY
used_on
DESC;
You'll just have to limit the number of ips shown each time to keep the speed.

#9 neoform

neoform
  • Members
  • PipPipPip
  • Advanced Member
  • 241 posts
  • LocationMontreal

Posted 16 July 2006 - 04:57 PM

Thanks :) Works great.
Newsique.com Social News Network

#10 shoz

shoz
  • Staff Alumni
  • Advanced Member
  • 600 posts

Posted 16 July 2006 - 05:09 PM

You probably caught this, but the order by in the subselect is missing the DESC in case you missed it.

#11 fenway

fenway
  • Staff Alumni
  • MySQL Si-Fu / PHP Resident Alien
  • 16,199 posts
  • LocationToronto, ON

Posted 16 July 2006 - 05:35 PM

That's definitely an improvement... again, what type of IP range is being used here?
Seriously... if people don't start reading this before posting, I'm going to consider not answering at all.




0 user(s) are reading this topic

0 members, 0 guests, 0 anonymous users