Jump to content

Slow Query


JonnoTheDev

Recommended Posts

OK, so optimizing other peoples queries is not my strong point so here goes.

Looking at a social bookmarking manager system that has a list of a few hundred sites. It can record which sites you have posted bookmarks to. Sites may run various scripts, so there may be 100 sites using the same script. The users for each site are also recorded, username, password, etc. The script records each bookmark that is posted on each site by each user. The problematic part is that there is a section that takes a bookmark url, a script number and gives you the sites from that script where the bookmark has not been posted to. This query is taking about 16 seconds to run, Too Slow. Here is the simplified db structure, query, followed by the results of an EXPLAIN.

 

site

======

siteId

scriptId

url

 

user

======

userId

siteId

name

 

submission

==========

subId

siteId

userId

bookmarkUrl

 

SELECT s.siteId 
FROM 
site s 

INNER JOIN user u ON(s.siteId=u.siteId) 
LEFT JOIN submission su ON(s.siteId=su.siteId AND u.userId=su.userId AND su.bookmarkUrl ='http://www.xyz.com') 

WHERE s.scriptId='4' AND ISNULL(su.subId) GROUP BY s.siteId

 

1  SIMPLE s  ref  PRIMARY,scriptId   scriptId  1  const  219  Using where; Using temporary; Using filesort

1 SIMPLE u ref siteId siteId 4 socialBookmarking.s.siteId 389

1 SIMPLE su ref userId,bookmarkUrl,siteId userId 4 socialBookmarking.u.userId 1 Using where; Not exists

 

 

Link to comment
Share on other sites

I have tried using a temporary table but still results in a slow returning resultset

 

CREATE TEMPORARY TABLE siteUsers 
SELECT s.siteId, userId
FROM 
site s 
INNER JOIN user u ON(s.siteId=u.siteId) 
WHERE s.scriptId='4';

DELETE FROM siteUsers WHERE userId IN(SELECT su.userId FROM submission su WHERE su.bookmarkUrl ='http://www.xyz.com' AND su.scriptId='4');

SELECT siteId FROM siteUsers GROUP BY siteId;

 

Link to comment
Share on other sites

There are plenty of indexes in use. Look again. scriptId, siteId, userId

 

1     SIMPLE   s     ref     PRIMARY,scriptId               scriptId     1     const              219     Using where; Using temporary; Using filesort
1    SIMPLE    u     ref     siteId                                 siteId       4    s.siteId    389    
1    SIMPLE    su    ref    userId,bookmarkUrl,siteId    userId       4    u.userId    1    Using where; Not exists

 

Rows do not get deleted from this system

Link to comment
Share on other sites

Is there an index on bookmarkUrl in the submission table? If there isn't it will have to do a full table scan.

I don't think the index with the three fields userId,bookmarkUrl,siteId is necessary. Just have one for bookmarkUrl

 

How much faster is the query if you eliminate the join to the user table? 

 

Have you considered denormalizing the table? There's so little data in the site table that you could store this in the submission table

and do away with the need for a join between the site and submission tables.

Link to comment
Share on other sites

See, there is still an issue here. Even though removing the GROUP BY claus results in a faster query process time (reported by phpMyAdmin), the actual results take the same amount of time to be returned to the client in both queries. So the query with the GROUP BY removed that supposedly took 0.1318 sec still takes around the same time as the first query to display any results. So that is fishy!

 

Lets try using the actual MySQL command prompt on the server.

 

The query without the GROUP BY claus:

44317 rows in set (15.48 sec)

The query with the GROUP by claus:

77 rows in set (16.34 sec)

 

Looks like phpMyAdmin gives incorrect results in this case. So fundamentally there is little difference between the 2.

 

Link to comment
Share on other sites

I can see that. site.siteId is the PK, user.siteId has an INDEX. If I run a query to join site to user on siteId where scriptId=x, the indexed used is siteId. In the problematic query the indexed is not used, so therefore it is the rest of the query.

Link to comment
Share on other sites

I mean do you need the users table in your query? Not enough to check whether a submission has a user?

 

If you need users datas and dont specify a user or users set in WHERE clause your query is ok i think.

Link to comment
Share on other sites

I need users as I need to know the users from a particular scriptId and also the users who have submitted the bookmark already to sites from that script.

 

Here is a modified query that uses all indexes. Still takes the same time to return results. Around 15 seconds.

SELECT s.siteId
FROM
site s

INNER JOIN user u ON(s.siteId=u.siteId AND u.scriptId='4')
LEFT JOIN submission su ON(s.siteId=su.siteId AND u.userId=su.userId AND su.bookmarkUrl ='http://www.xyz.co.uk') 

WHERE ISNULL(su.subId) GROUP BY s.siteId

 

So, from the 44,673 results I need all unique siteIds. That is where the GROUP BY comes in. This should result in about 77 records.

Even if I use DISTINCT as opposed to GROUP BY, the query is still slow.

 

[attachment deleted by admin]

Link to comment
Share on other sites

Using GROUP BY

77 total, Query took 16.4740 sec

 

Without GROUP BY

44,317 total, Query took 0.1318 sec

Forget about getting the rows back to php -- that's a different issue entirely.  We need to focus on getting back the distinct list.

 

And I have a hard time believing that it takes the same time to send the rows.

Link to comment
Share on other sites

I'm confused, reading over this thread -- don't you start with a bookmark?  Why the left join?

 

What it supposed to be happening is that the list of sites that a bookmark can be posted to are returned. For example if I have 5 sites of lets say scriptId 4, each site has 2 user accounts (10 total), and 1 bookmark to post. I want to know what users have posted this bookmark to what site and only return the sites that haven't had the bookmark posted by all user accounts.

 

So out of 10 users, if 5 of them (1 from each site) have posted the bookmark. The query should still return all 5 sites as there is still 1 user account remaining for each site that the bookmark has not been posted to. If 1 of the 5 remaining accounts posts the bookmark then the query should return 4 sites as 1 site has used up all its user accounts.

 

Using the following as opposed to 1 query still takes time to return the records. These queries are simple IMO.

 

# OBTAIN ALL USERS FROM EACH SITE WHERE SCRIPT ID = X, CREATE TEMPORARY TABLE
# -------------------------------
CREATE TEMPORARY TABLE siteUsers
SELECT s.siteId, u.userId
FROM
site s
INNER JOIN user u ON(s.siteId=u.siteId)
WHERE s.scriptId='4';

# OBTAIN ALL USER ACCOUNTS FROM EACH SITE WHERE SCRIPT ID = X THAT HAVE SUBMITTED BOOKMARK X. DELETE THE RESULTS FROM TEMPORARY TABLE
# -------------------------------
DELETE FROM siteUsers WHERE userId IN(SELECT su.userId FROM submission su WHERE su.bookmarkUrl ='http://www.xyz.com' AND su.scriptId='4');

# OBTAIN A DISTINCT LIST OF SITES FROM THE REMAINING RESULTS IN THE TEMPORARY TABLE. THE RESULTS ARE THE REMAINING SITES WHERE USER ACCOUNTS ARE 
# AVAILABLE TO POST BOOKMARK X
# -------------------------------
SELECT siteId FROM siteUsers GROUP BY siteId;

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.