mailjol Posted February 27, 2008 Share Posted February 27, 2008 I have MySQL 4.0.x and a 2 GB table called "people" with approx 1200000 rows in it having columns: id (PRIMARY AUTO INCREMENT) name email group_id VARCHAR( (INDEXED) address data reg_time (INDEXED) ip status phone There are approximately 14000 different group ids. I am facing a problem with a particular group which has approximately 25000 members. To search 300 members of that group starting from offset 15000 i do this query: SELECT `name` FROM `people` WHERE `group_id` = 'b67k219f' LIMIT 15000, 300 This takes approximately 35 seconds before MySQL returns the records even though the column group_id has a INDEX on it. To check further I created a separate table "people_idx" with just two columns "id" & "group_id" from the main table. Now I use following PHP to fetch names of 300 people starting offset 15000 from that group: $result1 = mysql_query("SELECT id FROM people_idx WHERE group_id = 'b67k219f' LIMIT 15000, 300"); while(list($id) = mysql_fetch_row($result1)) { $result2 = mysql_query("SELECT `name` FROM people WHERE id = $id LIMIT 1"); echo mysql_result($result2, 0)."\n"; } Even though this performs 300+1 different queries it returns back in less than a second. Now I tried this join: SELECT people.name FROM people, people_idx WHERE people_idx.group_id = 'b67k219f' AND people.id = people_idx.id LIMIT 15000, 300 This also takes more than 30 seconds. Now I am confused about what I am doing wrong and which approach should I use. Please anyone help me out. Thanks in advance. Regards, Akash Quote Link to comment Share on other sites More sharing options...
aschk Posted February 27, 2008 Share Posted February 27, 2008 Use EXPLAIN in front of those queries and post the output here so we can analyse the process that the pathing algorithm it choosing. Quote Link to comment Share on other sites More sharing options...
mailjol Posted February 27, 2008 Author Share Posted February 27, 2008 SORRY!! THERE WERE FEW ERRORS IN MY PREVIOUS REPLY. HERE IS THE CORRECT OUTPUT: For querying the main table: SELECT `name` FROM `people` WHERE `group_id` = 'b67k219f' LIMIT 15000, 300 the output of EXPLAIN is: table type possible_keys key key_len ref rows Extra people ref id id 10 const 21132 Using where For querying the test table: SELECT `id` FROM `people_idx` WHERE `group_id` = 'b67k219f' LIMIT 15000, 300 the output of EXPLAIN is: table type possible_keys key key_len ref rows Extra people_idx ref group_id group_id 10 const 19010 Using where And for the JOIN query: SELECT people.name FROM people, people_idx WHERE people_idx.group_id = 'b67k219f' AND people.id = people_idx.id LIMIT 15000, 300 the output of EXPLAIN is: table type possible_keys key key_len ref rows Extra people_idx ref group_id group_id 10 const 19010 Using where people eq_ref PRIMARY PRIMARY 3 people_idx.id 1 Using where Thanks.. Quote Link to comment Share on other sites More sharing options...
aschk Posted February 27, 2008 Share Posted February 27, 2008 Looking at the 1st and 2nd results from the above, it appears odd that the numbers of rows differs from 21132 to 19010 as they should be producing the same results. However, looking at your 1st query it appears that it's doing a select based on the id and NOT the group_id, implying there is no INDEX on group_id. Can you verify this? Quote Link to comment Share on other sites More sharing options...
mailjol Posted February 27, 2008 Author Share Posted February 27, 2008 Looking at the 1st and 2nd results from the above, it appears odd that the numbers of rows differs from 21132 to 19010 as they should be producing the same results. However, looking at your 1st query it appears that it's doing a select based on the id and NOT the group_id, implying there is no INDEX on group_id. Can you verify this? Don't go by the number of records as the main table "people" is constantly growing while "people_idx" was generated just to do these tests. Both tables have a index on group_id and the select is done based on group_id only. You can see all the WHERE clauses. "people_idx" is just a smaller version of "people" with just two columns. Quote Link to comment Share on other sites More sharing options...
mailjol Posted February 27, 2008 Author Share Posted February 27, 2008 Looking at the 1st and 2nd results from the above, it appears odd that the numbers of rows differs from 21132 to 19010 as they should be producing the same results. However, looking at your 1st query it appears that it's doing a select based on the id and NOT the group_id, implying there is no INDEX on group_id. Can you verify this? The explain output still has few errors as I had edited it a little bit to make it viewable and by mistake changed "group_id" to "id". Quote Link to comment Share on other sites More sharing options...
luca200 Posted February 28, 2008 Share Posted February 28, 2008 It's not the answer you need, but I don't believe it makes a great sense to use LIMIT without ORDER BY.... Quote Link to comment Share on other sites More sharing options...
aschk Posted February 28, 2008 Share Posted February 28, 2008 Good point well made there Luca. Indeed if you don't use ORDER BY there is no guarantee about which order the results will come back in. So you might do 15000, 300 , and then 15300, 300 and find it gives you some of the same result Quote Link to comment Share on other sites More sharing options...
mailjol Posted February 29, 2008 Author Share Posted February 29, 2008 Even with ORDER BY its so slow. I am now tearing my hairs apart. I don't know why Indexes work when I use them from a separate table having just two columns taken from the main table using JOIN and don't work when doing a straightforward query on the main table which has these two columns INDEXED. Quote Link to comment Share on other sites More sharing options...
fenway Posted February 29, 2008 Share Posted February 29, 2008 I just looked at the query.. and you're using a HUGE offset. Quote Link to comment Share on other sites More sharing options...
mailjol Posted February 29, 2008 Author Share Posted February 29, 2008 Yes thats right. 15000 is a big offset but it can be even larger that. But don't you think a 30+ second delay is still not justified if MySQL is using indexes. The same query returns instantly if a create a similar table with same number of rows but only these two columns (one which is queried upon and the other which is being fetched). Quote Link to comment Share on other sites More sharing options...
fenway Posted February 29, 2008 Share Posted February 29, 2008 Have you ANALYZE / OPTIMIZE tables recently? Quote Link to comment Share on other sites More sharing options...
mailjol Posted February 29, 2008 Author Share Posted February 29, 2008 Yes I did "OPTIMIZE" it quite a few times during these tests. Sometimes it responded "Table is already up to date". Quote Link to comment Share on other sites More sharing options...
fenway Posted February 29, 2008 Share Posted February 29, 2008 And ANALYZE too (since this deals with indexes directly)? Since you can recreate these data in another table with the same indexes (accordingly to your earlier post), it's hard to see what else is the issue here. Quote Link to comment Share on other sites More sharing options...
mailjol Posted February 29, 2008 Author Share Posted February 29, 2008 I have somehow reduced the delay to 1-3 seconds by using a covered index on columns which are frequently queried for. This returns the data from index itself without touching 1 GB table. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.