Large Size Database


urieilam

Hi Everyone,

 

I was wondering if someone out there has some experience working with large databases. I'm working on a backend that sends newsletters and maintains a database of the contacts. It should hold around 40,000 contacts, with around six fields per contact. I'm using PHP5 to script it.

 

The backend is already programmed, and functions ok. The problem is that it is extremely slow.

 

The data comes in from .csv files. On top of just entering the contacts into the database, the script compares them against the existing information in order to avoid double entries for the same email address. I've scripted it to chop the data into batches of 1,000 and to reload the script at the end of each batch.

 

It currently takes more than an hour to upload a new CSV with about 10,000 contacts, compare the info, and insert it.

I've tried running it both with an SQL database and with an XML one (I know it's not the best idea to use XML for this, but I hoped it would be quicker), and both take around the same amount of time.

 

Any ideas on what the right design for something like this would be? Are there any quicker ways to go about it? I'm not much of a database expert, especially when it comes to something as huge as this.

 

Any help would be appreciated!

Thanks, uri


I'm somewhat of a newb with large databases, so take this with a grain of salt...

 

First, have you isolated what is taking so long?

Is it the SQL queries, or the reading of the file?

Since it's not a ton of data, I would try reading the file in and storing it as an array, and see if that speeds things up.
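
Something roughly like this, just to illustrate (a sketch only - the file name and column index are placeholders, adjust them to your CSV layout):

<?php
// Read the whole CSV into an array keyed by email address, so duplicate
// checks can happen in memory instead of one query per row.
// Assumes the email is in the first column -- adjust the index if not.
$contacts = array();
$handle = fopen('contacts.csv', 'r');   // placeholder file name
if ($handle !== false) {
    while (($row = fgetcsv($handle, 4096)) !== false) {
        $email = trim($row[0]);
        if ($email !== '') {
            $contacts[$email] = $row;   // later rows overwrite earlier duplicates
        }
    }
    fclose($handle);
}
echo count($contacts) . " unique addresses read\n";
?>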

 

Also, can you post what your search query is?


In the big scheme of things, 40k rows isn't that big. I would love to have a site with 40,000 contacts, but to MySQL it's not big.

 

 

 

Why are you using CSV files? Where do they come from? It seems the simplest solution would be to insert the email addresses directly and check whether each one already exists at that point.

 

 

And like c0rnf3d said, we need to know what exactly is the slow part.


Thanks for the replies. I'm using CSV files because that's what the client needs - it's what their internal database software exports. I was wondering if it would be quicker to read everything into an array first. Would it be possible to store the arrays - of existing and imported contacts - in some kind of global variables? Is there something that could do that, and would it be faster?

 

I was checking what was taking the script so long. There were a few other PHP functions that took some time, but after removing them I see that my queries are still taking a while. I basically have two: the first one finds out if the email already exists:

"SELECT * FROM addresses WHERE email='$email'"

 

and the other one is either an insert or an update query, depending on the answer, something like:

"UPDATE addresses SET lan='$lan' , email='$email' WHERE id = '$num'"

and

"INSERT INTO addresses VALUES ('','$i','','','$i','en')"

 

 


So basically what's taking forever is 10000 individual queries ;p.

 

 

I don't know how much faster it would be (it should be at least a little bit, although it could cause some table locking if something else has a chance of acting on the table at the same time), but you could use a temporary-ish table.

 

 

Something like making a table with the same schema, and then importing from it.

 

Something like:

 

(Pretend this is your schema.  Would need to be changed to fit your needs of course.)

 

CREATE TABLE clients (
    client_id INT AUTO_INCREMENT PRIMARY KEY,
    email varchar(255) NOT NULL,
    some_other_column varchar(32),
    UNIQUE (`email`)
);

 

CREATE TABLE clients_import (
    client_id INT AUTO_INCREMENT PRIMARY KEY,
    email varchar(255) NOT NULL,
    some_other_column varchar(32),
    UNIQUE (`email`)
);

 

 

 

Then, your importing would go something like this:

 

Insert all of your CSV data into clients_import, preferably with a few queries that each use multiple value sets, or with the mysqlimport command-line tool.
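
A sketch of the multiple-value-set idea, assuming an open mysql_connect() connection, a placeholder file name, and that clients_import only has the email plus one extra column (widen it for your real fields). INSERT IGNORE is there so duplicate emails inside the CSV itself don't abort the multi-row statement against the UNIQUE index:

<?php
// Bulk-load the CSV into clients_import in chunks: one multi-row INSERT
// per 500 contacts instead of one INSERT per contact.
$values = array();
$handle = fopen('contacts.csv', 'r');                    // placeholder file name
while (($row = fgetcsv($handle, 4096)) !== false) {
    $email = mysql_real_escape_string(trim($row[0]));    // assumed email column
    $other = mysql_real_escape_string($row[1]);          // assumed extra column
    $values[] = "('$email', '$other')";
    if (count($values) >= 500) {
        mysql_query("INSERT IGNORE INTO clients_import (email, some_other_column) VALUES " . implode(',', $values));
        $values = array();
    }
}
fclose($handle);
if (count($values) > 0) {
    mysql_query("INSERT IGNORE INTO clients_import (email, some_other_column) VALUES " . implode(',', $values));
}
?>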

 

 

Then:

 

UPDATE clients c JOIN clients_import ci ON ci.email = c.email SET c.some_other_column = ci.some_other_column;

 

INSERT INTO clients (email, some_other_column)
    (SELECT email, some_other_column FROM clients_import WHERE email NOT IN (SELECT email FROM clients));

 

DELETE FROM clients_import;

 

 

 

 

Oh, that insert statement could be super slow by the way....  If you actually do it this way, I'll try to come up with a faster insert (probably with an exists column or a join).
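
For reference, the join version I had in mind would look roughly like this (untested, against the same example schema as above):

INSERT INTO clients (email, some_other_column)
    SELECT ci.email, ci.some_other_column
    FROM clients_import ci
    LEFT JOIN clients c ON c.email = ci.email
    WHERE c.client_id IS NULL;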


I've managed to cut off a lot of time by making the email column unique, and using only one query, something like:

 

$query = "INSERT INTO addresses VALUES ('','$status','','','$email','$lan')

ON DUPLICATE KEY UPDATE lan='$lan'...";

 

I checked reading all the emails into an array first, but the CSV file functions aren't taking up much time. Just running a loop with the query and empty variables, instead of the information that should be coming from the CSV, takes almost the same amount of time.
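
For anyone who hits the same problem: the import now boils down to one loop, roughly like this (simplified - the file name, column names and order are approximate, and the UNIQUE index on email, added with something like ALTER TABLE addresses ADD UNIQUE (email), is what makes ON DUPLICATE KEY UPDATE kick in):

<?php
// One query per contact: INSERT the new address, or update the existing
// row when the UNIQUE email index reports a duplicate.
$handle = fopen('contacts.csv', 'r');                    // placeholder file name
while (($row = fgetcsv($handle, 4096)) !== false) {
    $email  = mysql_real_escape_string(trim($row[0]));
    $status = mysql_real_escape_string($row[1]);         // assumed status column
    $lan    = mysql_real_escape_string($row[2]);         // assumed language column
    mysql_query("INSERT INTO addresses VALUES ('', '$status', '', '', '$email', '$lan')
                 ON DUPLICATE KEY UPDATE lan='$lan', status='$status'");
}
fclose($handle);
?>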

