gnznroses Posted December 7, 2011 Share Posted December 7, 2011 I have a table with over 50 Million rows. Two of the columns were Text fields, storing URLs and UserAgents. I just completed a change wherein I moved the URLs to a separate table and changed the field to Int, to store ID numbers pointing to the associated URLs in the other table. In doing this I expected the filesize of the table to plummet, but it barely changed. The table was 16 GB, and I assumed that storing a URL for each row (with an average length of 40-50 characters, but sometimes 300+ characters) was about 40% of the total size. After changing over to IDs the table is still over 15GB. What kind of sense does that make? The table is MyISAM and I understand that Text data is stored outside of the row itself, but it's still obviously part of the table and contained in the MYD file, so it should factor into the total size just the same. But it doesn't? Quote Link to comment Share on other sites More sharing options...
fenway Posted December 7, 2011 Share Posted December 7, 2011 Don't store them as TEXT -- they're not. Quote Link to comment Share on other sites More sharing options...
gnznroses Posted December 7, 2011 Author Share Posted December 7, 2011 Huh? I'm not Quote Link to comment Share on other sites More sharing options...
fenway Posted December 8, 2011 Share Posted December 8, 2011 Two of the columns were Text fields, Quote Link to comment Share on other sites More sharing options...
gnznroses Posted December 8, 2011 Author Share Posted December 8, 2011 "were" that's why i was making the changes that i made so yeah that's not my question lol Quote Link to comment Share on other sites More sharing options...
fenway Posted December 8, 2011 Share Posted December 8, 2011 Then my apologies -- let's start again, because I'm clearly mis-understanding. Quote Link to comment Share on other sites More sharing options...
gnznroses Posted December 8, 2011 Author Share Posted December 8, 2011 k well, i had the URLs stored as Text, which was bad. so i replaced that with Integers that hold ID numbers. the URLs are now stored in another table (using VarChar and a Unique index). This was because of the amount of duplicated data. Instead of storing the same URL over and over in my main table, I store it just once in the new table and then have it's ID number in the rows of the main table. When I made the change, I did so by using a cronjob which went through every row of the main table, attempted to insert the URL into the new table, and then whether it inserted or failed due to the unique index, the URL gets replaced by it's ID. Because I was simply over-writing the URL itself with it's ID, the column type stayed as Text during this, until I finished with updating every single row. Once all the rows were updated to contain IDs, I changed the column type to Integer. But, even though I eliminated what was surely gigs and gigs of textual data, the total size of my table barely dropped. Hope this makes sense. Quote Link to comment Share on other sites More sharing options...
mikosiko Posted December 8, 2011 Share Posted December 8, 2011 Did you run OPTIMIZE in your table after the changes? http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns) Quote Link to comment Share on other sites More sharing options...
gnznroses Posted December 8, 2011 Author Share Posted December 8, 2011 yep. well i ran it when things were about 80% done. i was showing about 500MB of overhead at the time, and it eliminated that. when i ran it it performed a Repair By Sorting. when i changed the data type it also did a Repair By Sorting. i thought that would completely rebuild the table but maybe not. there has to be more than 500-700MB freed up though... I was expecting about a 40% reduction in total size. I have 12 columns in that table. Most are Ints, one is DateTime, one is VarChar with a limit of 60, and then one other Text column (useragents) which i was going to replace via the same method as the URLs. ... actually 40% would be optimistic, since the table is 11GB of data and 5GB of index, and i'd only be eliminating space from the Data portion, but still... fyi also, the URLs column wasn't, and still isn't indexed. I plan on adding an index now that they're IDs. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.