jaymc Posted August 12, 2008
I'm having some trouble with a big table. It has 4.7 million records and is heavily accessed. It has the following fields:

id = PRIMARY KEY
user = varchar(25), INDEX
friend = varchar(25), INDEX
timestamp = int(12)
gender = varchar(6)
x = tinyint
real = enum('0','1')

I can't prune redundant records because, well, basically there are none.

Running SELECT count(id) FROM table GROUP BY user takes 5+ minutes. I deleted an index on the x column, and that ALTER TABLE took 28 minutes to complete.

What options do I have in circumstances of naturally large tables? For the SELECT count(id) FROM table GROUP BY user example above, is there anything I can do to get something like that to complete in about 5 seconds?
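For reference, the schema described above can be reconstructed as DDL. The table name friends is an assumption (the original post never names the table); the engine is InnoDB per a later post in the thread. Note that display widths like int(12) don't change storage size.

```sql
-- Reconstruction of the poster's schema; table name is assumed.
CREATE TABLE friends (
  `id`        INT NOT NULL AUTO_INCREMENT,
  `user`      VARCHAR(25) NOT NULL,
  `friend`    VARCHAR(25) NOT NULL,
  `timestamp` INT NOT NULL,
  `gender`    VARCHAR(6),
  `x`         TINYINT,
  `real`      ENUM('0','1'),   -- backticked: REAL is a reserved word
  PRIMARY KEY (`id`),
  KEY (`user`),
  KEY (`friend`)
) ENGINE=InnoDB;
```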
Xurion Posted August 12, 2008
What type of engine is the table on? MyISAM? InnoDB?
jaymc Posted August 12, 2008 (Author)
Sorry, InnoDB.
fenway Posted August 12, 2008
Why not a covering index on (user, id)?
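fenway's suggestion can be sketched as follows; the table name my_table is an assumption. (Worth noting: in InnoDB, every secondary index implicitly carries the primary key, so an index on user alone should already cover COUNT(id) ... GROUP BY user.)

```sql
-- Covering index: the GROUP BY and COUNT can be answered from the index
-- alone, with no lookups into the table rows.
ALTER TABLE my_table ADD INDEX idx_user_id (`user`, `id`);

-- EXPLAIN should now show "Using index" for:
SELECT `user`, COUNT(`id`) FROM my_table GROUP BY `user`;
```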
jaymc Posted August 12, 2008 (Author)
Sorry, bad example. id is never, ever used, apart from in my sample query, so a covering index on it would prove useless.
fenway Posted August 12, 2008
"Sorry, bad example. id is never, ever used, apart from in my sample query, so a covering index on it would prove useless."
I don't understand.
jaymc Posted August 12, 2008 (Author)
On my website I never use the id field. However, in the example of a bad query I gave above, I used the id field, leading you to believe I need a covering index on it.
fenway Posted August 12, 2008
"On my website I never use the id field..."
Why are we talking about imaginary queries? I still don't understand.
jaymc Posted August 12, 2008 (Author)
Well, I want to know why this: SELECT count(id) FROM table GROUP BY user takes over 5 minutes to complete when id is the primary key.
fenway Posted August 13, 2008
Why count(id) and not count(*)?
awpti Posted August 14, 2008
Why are you using a GROUP BY? That makes no sense: SELECT COUNT(*) FROM my_table; will return the same total number. How are you defining which group a given count belongs to?

I'd suggest creating an index:

ALTER TABLE `my_table` ADD INDEX (`user`(25));

That reduced the count query time from 1m 32s to ~20 sec. I'm guessing your DB server isn't a terribly beefy machine. As a side note, I forced in double the amount of records you have, all with random data.
toplay Posted August 14, 2008
You could run OPTIMIZE periodically:
http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html
and:
http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html

COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. This optimization applies only to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly. For transactional storage engines such as InnoDB and BDB, storing an exact row count is more problematic, because multiple transactions may be occurring, each of which may affect the count.
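The two linked manual pages boil down to one-line statements; my_table is an assumed name:

```sql
-- OPTIMIZE rebuilds the table and its indexes (reclaims fragmented space);
-- ANALYZE refreshes the index statistics the query optimizer relies on.
OPTIMIZE TABLE my_table;
ANALYZE TABLE my_table;
```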
jaymc Posted August 14, 2008 (Author)
I don't think OPTIMIZE will help, as they are InnoDB tables and thus have zero/little overhead. I've only experienced fragmented tables with MyISAM. If I ran OPTIMIZE on the 4.7-million-row table, it would take about 10-20 minutes.

I already have an index on user.

SELECT count(*) FROM table GROUP BY user

This is an example of a query that is taking 400+ seconds to complete. Ignore its relevance, as it's only an example and not something I ever use, but I'd like to find out why a simple query like this takes so long.
Hooker Posted August 14, 2008
Doing a SELECT COUNT(*) FROM table on InnoDB requires a full table scan. (With MyISAM this operation doesn't cost anything, because MyISAM stores an internal record counter with each table.)
awpti Posted August 14, 2008
Your DB design here is full of fail. Do you handle tons of inserts/updates to this table? If not, switch the engine to MyISAM.
jaymc Posted August 14, 2008 (Author)
"Do you handle tons of inserts/updates to this table? If not, switch the engine to MyISAM."
What would you consider as tons of updates/inserts? How many a second to justify it being InnoDB?
fenway Posted August 14, 2008
I will echo the previous comment about COUNT(*) and InnoDB -- have a summary table.
Xurion Posted August 14, 2008
Change your tables to MyISAM. Count works faster on these.
corbin Posted August 14, 2008
"Change your tables to MyISAM. Count works faster on these."
But MyISAM does table locks instead of row locks. If he has lots of updates/deletes, MyISAM could be a bad idea.
awpti Posted August 15, 2008
His next best option is to switch to MySQL 5.1 and start making use of partitions. That's HUGE. I tried his table layout on my own server with partitions and ~20M rows: queries that took 2-3 minutes without partitions took ~5-10 seconds tops with them.

Lots of solutions to this issue; COUNT() on an InnoDB table is just bad news. A summary table wouldn't be a bad idea at all: just make a single-field table with a value you add/subtract from based upon the addition or subtraction of rows.
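One way to sketch the summary-table idea is with triggers (available since MySQL 5.0), so the counts stay current without application changes. All names here (my_table, user_counts) are assumptions; this keeps a per-user count rather than a single total:

```sql
-- Summary table holding one pre-computed count per user.
CREATE TABLE user_counts (
  `user` VARCHAR(25) NOT NULL PRIMARY KEY,
  `cnt`  INT UNSIGNED NOT NULL DEFAULT 0
) ENGINE=InnoDB;

-- Keep it in sync with the big table on every insert/delete.
CREATE TRIGGER trg_my_table_ins AFTER INSERT ON my_table
FOR EACH ROW
  INSERT INTO user_counts (`user`, `cnt`) VALUES (NEW.`user`, 1)
  ON DUPLICATE KEY UPDATE `cnt` = `cnt` + 1;

CREATE TRIGGER trg_my_table_del AFTER DELETE ON my_table
FOR EACH ROW
  UPDATE user_counts SET `cnt` = `cnt` - 1 WHERE `user` = OLD.`user`;
```

Reading a count then becomes a primary-key lookup instead of a scan: SELECT cnt FROM user_counts WHERE user = 'someone';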
jaymc Posted August 17, 2008 (Author)
This partition stuff in 5.1: does MySQL fully take care of it, as in manage the data, or is it something I need to take care of at the query level? Sounds nice.
Mchl Posted August 17, 2008
I'd really like to try it out as well. It seems that apart from defining partitions, the rest is taken care of by MySQL. awpti: into how many parts did you partition your 20M-row table?
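To illustrate what "defining partitions" looks like: in MySQL 5.1 partitioning is declared in the DDL and is transparent at the query level. A sketch against the schema from the first post; the table name friends and the partition count are assumptions. (One 5.1 restriction: every unique key, including the primary key, must include all columns used in the partitioning expression, which HASH on id satisfies.)

```sql
-- HASH partitioning on the primary key spreads rows evenly across
-- partitions; existing queries run unchanged. 16 is an arbitrary count.
ALTER TABLE friends
  PARTITION BY HASH (`id`)
  PARTITIONS 16;
```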