JonnoTheDev Posted August 5, 2010 Share Posted August 5, 2010 OK, so optimizing other peoples queries is not my strong point so here goes. Looking at a social bookmarking manager system that has a list of a few hundred sites. It can record which sites you have posted bookmarks to. Sites may run various scripts, so there may be 100 sites using the same script. The users for each site are also recorded, username, password, etc. The script records each bookmark that is posted on each site by each user. The problematic part is that there is a section that takes a bookmark url, a script number and gives you the sites from that script where the bookmark has not been posted to. This query is taking about 16 seconds to run, Too Slow. Here is the simplified db structure, query, followed by the results of an EXPLAIN. site ====== siteId scriptId url user ====== userId siteId name submission ========== subId siteId userId bookmarkUrl SELECT s.siteId FROM site s INNER JOIN user u ON(s.siteId=u.siteId) LEFT JOIN submission su ON(s.siteId=su.siteId AND u.userId=su.userId AND su.bookmarkUrl ='http://www.xyz.com') WHERE s.scriptId='4' AND ISNULL(su.subId) GROUP BY s.siteId 1 SIMPLE s ref PRIMARY,scriptId scriptId 1 const 219 Using where; Using temporary; Using filesort 1 SIMPLE u ref siteId siteId 4 socialBookmarking.s.siteId 389 1 SIMPLE su ref userId,bookmarkUrl,siteId userId 4 socialBookmarking.u.userId 1 Using where; Not exists Quote Link to comment Share on other sites More sharing options...
JonnoTheDev Posted August 5, 2010 Author Share Posted August 5, 2010 I have tried using a temporary table but still results in a slow returning resultset CREATE TEMPORARY TABLE siteUsers SELECT s.siteId, userId FROM site s INNER JOIN user u ON(s.siteId=u.siteId) WHERE s.scriptId='4'; DELETE FROM siteUsers WHERE userId IN(SELECT su.userId FROM submission su WHERE su.bookmarkUrl ='http://www.xyz.com' AND su.scriptId='4'); SELECT siteId FROM siteUsers GROUP BY siteId; Quote Link to comment Share on other sites More sharing options...
bh Posted August 5, 2010 Share Posted August 5, 2010 Hi, I dont see "Using index" in your Explain datas. So, indexes may helps a bit. Another thing is optimization, try to optimize your table. It helps you if your table has had a lots of deleted rows. Quote Link to comment Share on other sites More sharing options...
JonnoTheDev Posted August 5, 2010 Author Share Posted August 5, 2010 There are plenty of indexes in use. Look again. scriptId, siteId, userId 1 SIMPLE s ref PRIMARY,scriptId scriptId 1 const 219 Using where; Using temporary; Using filesort 1 SIMPLE u ref siteId siteId 4 s.siteId 389 1 SIMPLE su ref userId,bookmarkUrl,siteId userId 4 u.userId 1 Using where; Not exists Rows do not get deleted from this system Quote Link to comment Share on other sites More sharing options...
JonnoTheDev Posted August 5, 2010 Author Share Posted August 5, 2010 I think it is the fact that the submission table uses the INNODB engine containing 250,000 rows that this runs slow. From what I remember it used to be a MYISAM table but kept crashing so it was converted. Quote Link to comment Share on other sites More sharing options...
brianlange Posted August 5, 2010 Share Posted August 5, 2010 Is there an index on bookmarkUrl in the submission table? If there isn't it will have to do a full table scan. I don't think the index with the three fields userId,bookmarkUrl,siteId is necessary. Just have one for bookmarkUrl How much faster is the query if you eliminate the join to the user table? Have you considered denormalizing the table? There's so little data in the site table that you could store this in the submission table and do away with the need for a join between the site and submission tables. Quote Link to comment Share on other sites More sharing options...
fenway Posted August 5, 2010 Share Posted August 5, 2010 Don't even dream of denormalizing the table -- that's a bad crutch. How fast it is without the GROUP BY? Quote Link to comment Share on other sites More sharing options...
JonnoTheDev Posted August 5, 2010 Author Share Posted August 5, 2010 Ill check tomorrow and let you know Quote Link to comment Share on other sites More sharing options...
bh Posted August 6, 2010 Share Posted August 6, 2010 Maybe if you Analyze the table? Quote Link to comment Share on other sites More sharing options...
JonnoTheDev Posted August 6, 2010 Author Share Posted August 6, 2010 Using GROUP BY 77 total, Query took 16.4740 sec Without GROUP BY 44,317 total, Query took 0.1318 sec Quote Link to comment Share on other sites More sharing options...
bh Posted August 6, 2010 Share Posted August 6, 2010 Yeah, Group by cause "Using temporary; Using filesort", and these are the bad ones. edit: Anyway, with "SELECT DISTINCT" and without "GROUP BY" ? Quote Link to comment Share on other sites More sharing options...
JonnoTheDev Posted August 6, 2010 Author Share Posted August 6, 2010 See, there is still an issue here. Even though removing the GROUP BY claus results in a faster query process time (reported by phpMyAdmin), the actual results take the same amount of time to be returned to the client in both queries. So the query with the GROUP BY removed that supposedly took 0.1318 sec still takes around the same time as the first query to display any results. So that is fishy! Lets try using the actual MySQL command prompt on the server. The query without the GROUP BY claus: 44317 rows in set (15.48 sec) The query with the GROUP by claus: 77 rows in set (16.34 sec) Looks like phpMyAdmin gives incorrect results in this case. So fundamentally there is little difference between the 2. Quote Link to comment Share on other sites More sharing options...
JonnoTheDev Posted August 6, 2010 Author Share Posted August 6, 2010 Explain results [attachment deleted by admin] Quote Link to comment Share on other sites More sharing options...
bh Posted August 6, 2010 Share Posted August 6, 2010 The problem is in the first row: "ALL". Is there an index on s.siteID? Quote Link to comment Share on other sites More sharing options...
JonnoTheDev Posted August 6, 2010 Author Share Posted August 6, 2010 The problem is in the first row: "ALL". Is there an index on s.siteID? That is the Primary KEY Quote Link to comment Share on other sites More sharing options...
bh Posted August 6, 2010 Share Posted August 6, 2010 You see its get all of the users (77369)... Quote Link to comment Share on other sites More sharing options...
JonnoTheDev Posted August 6, 2010 Author Share Posted August 6, 2010 I can see that. site.siteId is the PK, user.siteId has an INDEX. If I run a query to join site to user on siteId where scriptId=x, the indexed used is siteId. In the problematic query the indexed is not used, so therefore it is the rest of the query. Quote Link to comment Share on other sites More sharing options...
bh Posted August 6, 2010 Share Posted August 6, 2010 Dont enough if you join the site and submission table where submission.userid is not null? Quote Link to comment Share on other sites More sharing options...
JonnoTheDev Posted August 6, 2010 Author Share Posted August 6, 2010 what? Quote Link to comment Share on other sites More sharing options...
bh Posted August 6, 2010 Share Posted August 6, 2010 I mean do you need the users table in your query? Not enough to check whether a submission has a user? If you need users datas and dont specify a user or users set in WHERE clause your query is ok i think. Quote Link to comment Share on other sites More sharing options...
JonnoTheDev Posted August 6, 2010 Author Share Posted August 6, 2010 I need users as I need to know the users from a particular scriptId and also the users who have submitted the bookmark already to sites from that script. Here is a modified query that uses all indexes. Still takes the same time to return results. Around 15 seconds. SELECT s.siteId FROM site s INNER JOIN user u ON(s.siteId=u.siteId AND u.scriptId='4') LEFT JOIN submission su ON(s.siteId=su.siteId AND u.userId=su.userId AND su.bookmarkUrl ='http://www.xyz.co.uk') WHERE ISNULL(su.subId) GROUP BY s.siteId So, from the 44,673 results I need all unique siteIds. That is where the GROUP BY comes in. This should result in about 77 records. Even if I use DISTINCT as opposed to GROUP BY, the query is still slow. [attachment deleted by admin] Quote Link to comment Share on other sites More sharing options...
fenway Posted August 7, 2010 Share Posted August 7, 2010 Using GROUP BY 77 total, Query took 16.4740 sec Without GROUP BY 44,317 total, Query took 0.1318 sec Forget about getting the rows back to php -- that's a different issue entirely. We need to focus on getting back the distinct list. And I have a hard time believing that it takes the same time to send the rows. Quote Link to comment Share on other sites More sharing options...
fenway Posted August 7, 2010 Share Posted August 7, 2010 I'm confused, reading over this thread -- don't you start with a bookmark? Why the left join? Quote Link to comment Share on other sites More sharing options...
JonnoTheDev Posted August 9, 2010 Author Share Posted August 9, 2010 I'm confused, reading over this thread -- don't you start with a bookmark? Why the left join? What it supposed to be happening is that the list of sites that a bookmark can be posted to are returned. For example if I have 5 sites of lets say scriptId 4, each site has 2 user accounts (10 total), and 1 bookmark to post. I want to know what users have posted this bookmark to what site and only return the sites that haven't had the bookmark posted by all user accounts. So out of 10 users, if 5 of them (1 from each site) have posted the bookmark. The query should still return all 5 sites as there is still 1 user account remaining for each site that the bookmark has not been posted to. If 1 of the 5 remaining accounts posts the bookmark then the query should return 4 sites as 1 site has used up all its user accounts. Using the following as opposed to 1 query still takes time to return the records. These queries are simple IMO. # OBTAIN ALL USERS FROM EACH SITE WHERE SCRIPT ID = X, CREATE TEMPORARY TABLE # ------------------------------- CREATE TEMPORARY TABLE siteUsers SELECT s.siteId, u.userId FROM site s INNER JOIN user u ON(s.siteId=u.siteId) WHERE s.scriptId='4'; # OBTAIN ALL USER ACCOUNTS FROM EACH SITE WHERE SCRIPT ID = X THAT HAVE SUBMITTED BOOKMARK X. DELETE THE RESULTS FROM TEMPORARY TABLE # ------------------------------- DELETE FROM siteUsers WHERE userId IN(SELECT su.userId FROM submission su WHERE su.bookmarkUrl ='http://www.xyz.com' AND su.scriptId='4'); # OBTAIN A DISTINCT LIST OF SITES FROM THE REMAINING RESULTS IN THE TEMPORARY TABLE. THE RESULTS ARE THE REMAINING SITES WHERE USER ACCOUNTS ARE # AVAILABLE TO POST BOOKMARK X # ------------------------------- SELECT siteId FROM siteUsers GROUP BY siteId; Quote Link to comment Share on other sites More sharing options...
fenway Posted August 9, 2010 Share Posted August 9, 2010 Well, the issue is that there are still 40K rows to examine. Don't you necessarily have the siteID in the users table? If so, skip the join to the site table at the beginning. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.