
Working with Large Tables


davidz


I am looking for someone who has experience with large tables in MySQL, as in 10-20 gigs and up to, say, 200 gigs.

 

We are going to start archiving all of our email into a MySQL database.  Using Perl and PHP we've already got the system working, but doing the math we are looking at about 3-4 gigs of email each week.  None of us here have any experience working with MySQL at this scale.  We have tables now that are in the 1 gig range, but nothing nearly as big as this email archive is going to be.

 

So, can anyone give me some advice on how to handle tables this size, what MySQL's realistic size limits are, and so on?  Thanks!!!

 

David


I haven't worked with data sets that large myself, so the only thing I can offer is a Google search, which turned up the following:

 

http://www.mysqlperformanceblog.com/2007/05/24/predicting-how-long-data-load-would-take/

 

They loaded 1TB of data in that post.  See also the main blog for some good tips on large database performance.

 

I expect good indexing will help you immeasurably.


Thanks for the pointers.  I looked all through that link you provided and found lots of good stuff, even if not about large databases specifically; mostly about performance tuning.  I never realized that MySQL "out-of-the-box" has such horrible performance settings (obviously chosen for compatibility).  I wrongly assumed that it was automatic enough to use whatever RAM the system had available.

 

But now that I've been educated, I think this is a point that often gets missed by newbies (like myself): you have to manually go in and change several variables to get decent performance from your server.  I found out that MySQL was only ever using less than 32 megs of RAM.  What a complete waste of our dual Xeon 3.0 GHz servers with 3 gigs of ECC RAM!!!  So now it's set up to use over a gig, and I'll watch it very closely over the next few days to increase the buffers that need it.
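For anyone who wants to check what their own server is actually running with before touching my.cnf, the current values are all visible from the mysql client (these are the stock variable names in MySQL 4.x/5.x):

```sql
-- What the server is actually using right now
SHOW VARIABLES LIKE 'key_buffer_size';
SHOW VARIABLES LIKE 'query_cache_size';
SHOW VARIABLES LIKE 'table_cache';
SHOW VARIABLES LIKE 'tmp_table_size';
```

If you see the tiny defaults here, that's where your RAM is going to waste.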

 

I'd be happy to share the settings I changed for an instant performance increase if anyone is following this thread...

 

 

David


Here's a little about our setup:

Dual-processor Xeon 3.0 GHz

2.5 gigs DDR2 533 MHz ECC registered dual-channel

2 x 74 gig Ultra320 10k RPM SCSI drives in RAID 1

 

 

This server runs 5 or 6 databases.  Most are small in size (<250 megs); two are larger, one of which is about 4 gigs with around 11 million rows in its largest table.

 

Here is the my.cnf

[mysqld]
log-bin
server-id=1
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
set-variable=max_connections=250
set-variable=max_allowed_packet=32M
set-variable=query_cache_size=64M
set-variable=query_cache_limit=3M
set-variable=key_buffer_size=512M
set-variable=table_cache=500
set-variable=tmp_table_size=128M

# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

[mysql.server]
user=mysql
basedir=/var/lib

[mysqld_safe]
err-log=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

 

The key lines are the set-variable ones.  I'll go through them, briefly explaining what I know.

set-variable=max_connections=250

The obvious one: the maximum number of simultaneous client connections.
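One way to sanity-check the number: the server keeps a high-water mark of connections since it last started.

```sql
-- Most simultaneous connections seen since the last restart;
-- if this gets close to max_connections, raise the limit.
SHOW STATUS LIKE 'Max_used_connections';
```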

 

set-variable=max_allowed_packet=32M

The largest packet (i.e., single SQL statement) the server will accept.  We increased this so we could insert full email bodies, including attachments, into the database.  We don't get email bigger than 15 megs, so 32 seemed OK.  Don't forget to include some extra headroom for the overhead of the query itself.
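You can verify the running value (reported in bytes), and on recent versions it can even be raised without a restart; the new value applies to connections opened after the change.

```sql
-- Current value, in bytes (32M = 33554432)
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise it at runtime (takes effect for new connections only)
SET GLOBAL max_allowed_packet = 32 * 1024 * 1024;
```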

 

set-variable=query_cache_size=64M

The amount of memory available to the query cache.  If you don't know what the query cache is, reply and I can explain briefly, or look it up on Google.  It helps a ton!
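For anyone curious whether the cache is earning its keep, the stock status counters will tell you:

```sql
-- Query cache health check
SHOW STATUS LIKE 'Qcache%';
-- Qcache_hits vs. Qcache_inserts gives a rough hit rate; a steadily
-- climbing Qcache_lowmem_prunes means the cache is too small.
```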

 

set-variable=query_cache_limit=3M

The largest result set that will be stored in the query cache; results bigger than this are simply not cached.

 

set-variable=key_buffer_size=512M

Extremely important!  The amount of RAM given to MySQL to cache your indexes (the key buffer caches MyISAM indexes only).  The default, I think, is only 8 megs!  So if you look at all your databases and tables and find out how big all of your indexes are, that's ideally what you would put here.  That's not always realistic, so we added up the indexes for all the tables that get used a lot and came up with around 400 megs.  We are still playing with this; since the server has 2.5 gigs available, we might increase it if the buffer is filling up.
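To put a number on "how big are all of your indexes": on MySQL 5.0 or later, information_schema will add them up for you (on older versions you can get the same Index_length figure per table from SHOW TABLE STATUS).

```sql
-- Total MyISAM index size across all databases, in megs;
-- a starting point for sizing key_buffer_size.
SELECT SUM(INDEX_LENGTH) / 1024 / 1024 AS index_mb
FROM information_schema.TABLES
WHERE ENGINE = 'MyISAM';

-- Then watch the buffer itself: Key_reads divided by
-- Key_read_requests should stay very small (well under 1%).
SHOW STATUS LIKE 'Key_read%';
```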

 

set-variable=table_cache=500

How many tables to hold "open" in memory.  The default is only 64. :-(  I think I saw somewhere that the recommended number is max_connections * N, where N is the largest number of tables joined in any one query.  So if you have a query with 5 joins and max_connections is 50, you should set table_cache to 250.  We are still tweaking this; we started at 300 and it filled up fast, so we increased it to 500.  We'll see how it goes.
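The way to tell whether the cache is big enough is to compare currently-open tables with total opens since startup:

```sql
-- Open_tables   = tables held open right now
-- Opened_tables = total table opens since startup; if this keeps
--                 climbing under normal load, table_cache is too small.
SHOW STATUS LIKE 'Open%tables';
```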

 

set-variable=tmp_table_size=128M

The maximum size for in-memory temporary tables.  Once MySQL needs a temp table bigger than this setting, it creates it on disk instead.  We are still working on this one as well.
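To see how often that spill to disk is actually happening:

```sql
-- Created_tmp_disk_tables vs. Created_tmp_tables: a high disk ratio
-- means the in-memory limit is too low. Note the effective cap is the
-- smaller of tmp_table_size and max_heap_table_size.
SHOW STATUS LIKE 'Created_tmp%tables';
```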

 

So those are some of the major changes we've made recently with massive performance increases!!

 

--David

 

 

