Jump to content

[SOLVED] Excluding results based on another table


Cydewinder

Recommended Posts

I've run into a problem I've never encountered before, and it's got me totally stumped. I've never really asked for help on a forum before, but I'm really desperate this time. Let's hope I do this right.

 

MySQL server version: I see two numbers in PHPMyAdmin, hopefully one of these is the right one.

Server version: 5.0.24a-standard-log

MySQL client version: 5.0.16

 

Raw MySQL code block as it currently stands (I've played a lot with it)

select id from adoptables where not exists ( select * from rarecandylimiter where rarecandylimiter.ip = inet_aton('12.12.12.12') and rarecandylimiter.pokeid = adoptables.id ) order by credits desc, id asc limit 1

 

I also tried it in this form

 

select distinct a.id from adoptables a left join rarecandylimiter r on a.id = r.pokeid where r.ip != '121.94.235.68' limit 5
order by a.credits desc

 

Create tables:

CREATE TABLE `adoptables` (\n  `id` int( unsigned NOT NULL auto_increment,\n  `image` varchar(60) NOT NULL,\n  `name` varchar(60) NOT NULL,\n  `level` int( unsigned NOT NULL default '1',\n  `code` int( NOT NULL,\n  `medals` varchar(255) NOT NULL,\n  `hatched` tinyint(1) NOT NULL default '0',\n  `credits` int(10) NOT NULL default '10',\n  PRIMARY KEY  (`id`),\n  KEY `name` (`name`),\n  KEY `credits` (`credits`)\n) ENGINE=MyISAM AUTO_INCREMENT=98742 DEFAULT CHARSET=latin1

CREATE TABLE `rarecandylimiter` (\n  `id` int( NOT NULL auto_increment,\n  `pokeid` int(10) NOT NULL,\n  `ip` int(12) NOT NULL,\n  PRIMARY KEY  (`id`),\n  KEY `pokeid` (`pokeid`),\n  KEY `ip` (`ip`)\n) ENGINE=MyISAM AUTO_INCREMENT=1872 DEFAULT CHARSET=latin1

 

Explain output's for the two previously mentioned queries

id  select_type  table  type  possible_keys  key  key_len  ref  rows  Extra  
1 PRIMARY adoptables ALL NULL NULL NULL NULL 98741 Using where; Using filesort 
2 DEPENDENT SUBQUERY rarecandylimiter ref pokeid,ip pokeid 4 pokeplushies.adoptables.id 1 Using where

 

id  select_type  table  type  possible_keys  key  key_len  ref  rows  Extra  
1 SIMPLE r range pokeid,ip ip 4 NULL 1887 Using where; Using temporary 
1 SIMPLE a eq_ref PRIMARY PRIMARY 4 pokeplushies.r.pokeid 1 Using where; Using index 

 

And now, the problem.

 

Basically I have two tables. The first, adoptables, is a list of adoptables that people have adopted from my site. People can click on these adoptables, but to prevent clickspam only one click per ip per adoptable counts each day. That data is stored in the table rarecandylimiter. Rarecandylimiter is cleared every night.

 

What I want this query to do is show me the row from adoptables, with the highest value in the credits field, (and with a secondary sort for the lowest id field) that my IP address hasn't clicked on yet today.

 

The first query I'm using takes almost two seconds to execute, mainly because it's searching through all 100K rows in adoptables. I couldn't fix it, so I went to the second query which is faster, but is searching through every row in rarecandylimiter. When I posted this that's not too bad, only 1887, but over the course of a day that number grows to nearly 60,000. Which ofcourse isn't ideal. Or maybe it is, I'm not entirely sure I understand that explain output.

 

I want to know if there's anything I can do to speed up these queries. This is a problem I've never tried to solve before (excluding results based on another table), and I've been googling any terms I can think of to find a solution. That's actually where I found the first query, I'd never seen NOT EXISTS before. The second query I wrote myself.

 

My MySQL experience is pretty basic, I only really do simple things, selecting, updating, deleting. I understand joins somewhat, but indexing is relatively new to me.

 

I'm storing the IP address as an integer because another of my google searches suggested that it would help it index better. It seemed like a good idea, but didn't help. :(

 

Any help is greatly appreciated.

Link to comment
Share on other sites

Let's talk about the second query with the left join... that limit is in a very strange place.  Let's also drop the distinct/order by for a second.  That != is going to cause problems.  Also, you said you're storing ip as an int, yet you're querying it as a string?  What you need is a left join with the ip as an equality in the on clause, then IS NULL in the where clause.  But I need to clarify the earlier points first.

Link to comment
Share on other sites

Ah yes, sorry. I've been playing with this for the last four or five hours so things have moved around a lot. That limit should  be limit 1 and at the end of the query.

 

The ip change was the most recent change I did, hence the string remnant.

 

Could you possibly help me understand why != is harder to process than =. I was under the impression that limit 1 would make it stop as soon as it found a row that satisfied the !=, but I havn't been observing that behaviour.

 

I tried a simple query. (Select * from rarecandylimiter where ip = 4253325 limit 1) and that was very fast using the correct index, but when I changed it to != it reverted to searching every row. I don't understand this behaviour, if you could help me understand it I might understand the solution better.

 

Also could you clarify the solution you suggested? As I understand it you're suggesting something like this:

 

select a.id from adoptables a left join rarecandylimiter r on a.id = r.pokeid, r.ip= inet_aton('121.94.235.68') where [something to do with IS NULL] limit 1

 

Is that on the right track?

 

Thank you for your help.

Link to comment
Share on other sites

You can't use an index to find where a value *isn't*... just because you have an index of values present, you still need to check each one to see if it doesn't match your value .... making the index useless -- which is why it's doing a full table scan.

 

I meant something like this:

 

select a.id from adoptables a left join rarecandylimiter r on ( a.id = r.pokeid AND r.ip= inet_aton('121.94.235.68') ) where r.pokeid IS NULL

 

That is, you find all rows that match your ip -- then the ones that don't will be NULLed out by the left join -- making them easy to find with IS NULL.  Obviously, once this works, you can add whatever order/limit you desire.

Link to comment
Share on other sites

Ok, that works *a lot* better, thank you very much for that.

 

I'm happy with the speed now, but there's still one thing it'd be nice to optimize out if I could.

 

Using this query (no order)

SELECT a.id
FROM adoptables a
LEFT JOIN rarecandylimiter r ON ( a.id = r.pokeid
AND r.ip = inet_aton( '121.94.235.68' ) ) 
WHERE r.pokeid IS NULL 
LIMIT 1 

 

It gives me Query took 0.0005 sec and uses all the expected indexes.

 

Once I add the order by statement:

 

select a.id from adoptables a left join rarecandylimiter r on ( a.id = r.pokeid AND r.ip= inet_aton('121.94.235.68') ) where r.pokeid IS NULL order by a.credits desc, a.id asc limit 1

 

It takes 0.1292 sec. Here's the explain output:

 

id  select_type  table  type  possible_keys  key  key_len  ref  rows  Extra  
1 SIMPLE a ALL NULL NULL NULL NULL 99090 Using filesort 
1 SIMPLE r ref pokeid,ip pokeid 4 pokeplushies.a.id 1 Using where; Not exists 

 

So it's still filesorting on a. It's taking an acceptable amount of time, but may not scale very well as the database gets bigger. As I've previously said, I don't really understand indexing, it's pretty new to me, but I've got an index on credits and id, shouldnt it be using one of those for the ordering by? Or am I totally misunderstanding indexing?

 

Sorry if that's a stupid question, thanks for your help so far.

 

Edit: I just found out you can add an index that uses multiple columns. So I tried making an index for id *and* credits. Then the explain gives me this output (added "using index") with no noticeable speed increase:

 

id  select_type  table  type  possible_keys  key  key_len  ref  rows  Extra  
1 SIMPLE a index NULL credits 8 NULL 99096 Using index; Using filesort 
1 SIMPLE r ref pokeid,ip pokeid 4 pokeplushies.a.id 1 Using where; Not exists 

 

This leads me to think that indexing doesn't help sorting at all.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.