Jump to content

Makes no sense: Text columns & Table size


gnznroses

Recommended Posts

I have a table with over 50 Million rows. Two of the columns were Text fields, storing URLs and UserAgents. I just completed a change wherein I moved the URLs to a separate table and changed the field to Int, to store ID numbers pointing to the associated URLs in the other table.

 

In doing this I expected the filesize of the table to plummet, but it barely changed. The table was 16 GB, and I assumed that storing a URL for each row (with an average length of 40-50 characters, but sometimes 300+ characters) was about 40% of the total size. After changing over to IDs the table is still over 15GB. What kind of sense does that make?

 

The table is MyISAM and I understand that Text data is stored outside of the row itself, but it's still obviously part of the table and contained in the MYD file, so it should factor into the total size just the same. But it doesn't?

Link to comment
Share on other sites

k :)

 

well, i had the URLs stored as Text, which was bad. so i replaced that with Integers that hold ID numbers. the URLs are now stored in another table (using VarChar and a Unique index). This was because of the amount of duplicated data. Instead of storing the same URL over and over in my main table, I store it just once in the new table and then have it's ID number in the rows of the main table.

 

When I made the change, I did so by using a cronjob which went through every row of the main table, attempted to insert the URL into the new table, and then whether it inserted or failed due to the unique index, the URL gets replaced by it's ID. Because I was simply over-writing the URL itself with it's ID, the column type stayed as Text during this, until I finished with updating every single row. Once all the rows were updated to contain IDs, I changed the column type to Integer.

 

But, even though I eliminated what was surely gigs and gigs of textual data, the total size of my table barely dropped.

 

Hope this makes sense.

Link to comment
Share on other sites

yep. well i ran it when things were about 80% done. i was showing about 500MB of overhead at the time, and it eliminated that. when i ran it it performed a Repair By Sorting.

when i changed the data type it also did a Repair By Sorting.

 

i thought that would completely rebuild the table but maybe not. there has to be more than 500-700MB freed up though... I was expecting about a 40% reduction in total size. I have 12 columns in that table. Most are Ints, one is DateTime, one is VarChar with a limit of 60, and then one other Text column (useragents) which i was going to replace via the same method as the URLs.

 

... actually 40% would be optimistic, since the table is 11GB of data and 5GB of index, and i'd only be eliminating space from the Data portion, but still...

fyi also, the URLs column wasn't, and still isn't indexed. I plan on adding an index now that they're IDs.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.