urieilam Posted November 12, 2008

Hi everyone,

I was wondering if someone out there has experience working with large databases. I'm working on a backend that sends newsletters and maintains a database of the contacts. It should hold around 40,000 contacts, with around 6 fields per contact. I'm scripting it in PHP5.

The backend is already programmed and works, but it is extremely slow. The contact data comes in as .csv files. Besides just inserting the contacts into the database, the script compares them against the existing records to avoid duplicate entries for the same email address. I've scripted it to chop the data into batches of 1,000 and to reload the script at the end of each batch. It currently takes more than an hour to upload a new CSV with about 10,000 contacts, compare the info, and insert it.

I've tried running it both with an SQL database and an XML one (I know XML isn't the best idea for this, but I hoped it would be quicker), and both take around the same amount of time.

Any ideas on what the right design for something like this would be? Are there any quicker ways to go about it? I'm not much of a database expert, especially when it comes to something this big. Any help would be appreciated!

Thanks,
uri
c0rnf3d Posted November 12, 2008

I'm somewhat of a newb with large databases, so take this with a grain of salt... First, have you isolated what's taking so long? Is it the SQL queries, or the reading of the file? Since it's not a ton of data, I would try reading it all in and storing it as an array, and see if that speeds things up. Also, can you post what your search query is?
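Something like this could help narrow down where the time goes (rough sketch; the file name and the per-row queries are just placeholders for whatever you have now):

<?php
// Time the CSV parsing and the database work separately.
$start = microtime(true);

$rows = array();
$fh = fopen('contacts.csv', 'r');                // example path, use your real file
while (($row = fgetcsv($fh)) !== false) {
    $rows[] = $row;
}
fclose($fh);

$afterCsv = microtime(true);
echo 'CSV read: ' . ($afterCsv - $start) . " s, " . count($rows) . " rows\n";

foreach ($rows as $row) {
    // ... your existing SELECT / UPDATE / INSERT per row goes here ...
}

echo 'Queries: ' . (microtime(true) - $afterCsv) . " s\n";

If the CSV part is quick and the loop is slow, you know it's the queries.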
corbin Posted November 12, 2008

In the big scheme of things, 40k rows isn't that big. I would love a site with 40,000 contacts, but to MySQL it's not big. Why are you using CSV files? Where do they come from? It seems the simplest solution would be to insert the email addresses directly and check for duplicates at that point. And like c0rnf3d said, we need to know what exactly the slow part is.
urieilam Posted November 13, 2008 (Author)

Thanks for the replies. I'm using CSV files because that's what the client needs - it's what their internal database software exports.

I was wondering if it would be quicker to read everything into an array first. Would it be possible to store the arrays - of existing and imported contacts - in some kind of global variables? Is there something that could do that, and would it be faster?

I checked what was taking the script so long. There were a few other PHP functions that took some time, but after removing them I can see that my queries are still the slow part. I basically have two: the first one finds out whether the email already exists:

"SELECT * FROM addresses WHERE email='$email'"

and the other is either an insert or an update query, depending on the answer, something like:

"UPDATE addresses SET lan='$lan' , email='$email' WHERE id = '$num'"

and

"INSERT INTO addresses VALUES ('','$i','','','$i','en')"
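To make it concrete, each row from the CSV currently goes through more or less this (a simplified version using the plain mysql_* functions and trimmed-down columns, just to show the shape of it):

// One SELECT plus one UPDATE or INSERT per contact.
foreach ($contacts as $contact) {
    list($status, $email, $lan) = $contact;      // column order is just illustrative
    $status = mysql_real_escape_string($status);
    $email  = mysql_real_escape_string($email);
    $lan    = mysql_real_escape_string($lan);

    $result = mysql_query("SELECT id FROM addresses WHERE email = '$email'");
    if (mysql_num_rows($result) > 0) {
        $existing = mysql_fetch_assoc($result);
        mysql_query("UPDATE addresses SET lan = '$lan', email = '$email' WHERE id = '{$existing['id']}'");
    } else {
        mysql_query("INSERT INTO addresses VALUES ('', '$status', '', '', '$email', '$lan')");
    }
}

So a 10,000-row CSV means up to 20,000 separate queries.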
corbin Posted November 13, 2008

So basically what's taking forever is 10,000 individual queries ;p. I don't know how much faster it would be (it should be at least a little, although it could cause some table locking if something else might act on the table at the same time), but you could use a temporary-ish table: make a table with the same schema and import from it. Something like this (pretend this is your schema - it would need to be changed to fit your needs, of course):

CREATE TABLE clients (
    client_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    some_other_column VARCHAR(32),
    UNIQUE KEY (`email`)
);

CREATE TABLE clients_import (
    client_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    some_other_column VARCHAR(32),
    UNIQUE KEY (`email`)
);

Then the import would go something like this: insert all of your CSV data into clients_import, preferably with a few multi-row INSERTs or with the mysqlimport command-line tool. Then:

UPDATE clients c
JOIN clients_import ci ON ci.email = c.email
SET c.some_other_column = ci.some_other_column;

INSERT INTO clients (email, some_other_column)
SELECT email, some_other_column
FROM clients_import
WHERE email NOT IN (SELECT email FROM clients);

DELETE FROM clients_import;

Oh, that INSERT statement could be super slow, by the way... If you actually do it this way, I'll try to come up with a faster insert (probably with an exists column or a join).
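Actually, the join version would probably look something like this (untested, same made-up schema as above):

INSERT INTO clients (email, some_other_column)
SELECT ci.email, ci.some_other_column
FROM clients_import ci
LEFT JOIN clients c ON c.email = ci.email
WHERE c.email IS NULL;   -- only rows whose email isn't already in clients

On older MySQL versions that should be noticeably faster than the NOT IN subquery.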
urieilam Posted November 13, 2008 (Author)

I've managed to cut off a lot of time by making the email column unique and using only one query, something like:

$query = "INSERT INTO addresses VALUES ('','$status','','','$email','$lan') ON DUPLICATE KEY UPDATE lan='$lan'...";

I checked whether having all the emails read into an array first would help, but the CSV file functions aren't taking up much time. Just running the loop with the query and empty variables instead of the information that should be coming from the CSV takes almost the same amount of time.
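If it still isn't fast enough, I guess the next step would be batching several rows into each statement to cut down the round trips, something along these lines (rough sketch, assuming the same column layout and the plain mysql_* functions):

$values = array();
foreach ($contacts as $contact) {
    list($status, $email, $lan) = $contact;      // column order just illustrative
    $values[] = "('', '" . mysql_real_escape_string($status) . "', '', '', '"
              . mysql_real_escape_string($email) . "', '"
              . mysql_real_escape_string($lan) . "')";

    if (count($values) >= 500) {                 // 500 rows per statement, tune as needed
        mysql_query("INSERT INTO addresses VALUES " . implode(',', $values) .
                    " ON DUPLICATE KEY UPDATE lan = VALUES(lan)");
        $values = array();
    }
}
if (count($values) > 0) {                        // flush the last partial batch
    mysql_query("INSERT INTO addresses VALUES " . implode(',', $values) .
                " ON DUPLICATE KEY UPDATE lan = VALUES(lan)");
}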
Mchl Posted November 13, 2008

Here are some tips on speeding up INSERTs.
urieilam Posted November 17, 2008 (Author)

Thanks Mchl, I'm going to try the LOAD DATA INFILE approach - seems like that would speed things up considerably...
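Something along these lines, if I've got the syntax right (the file path, delimiters and column list below are guesses based on our export, not tested yet):

LOAD DATA LOCAL INFILE '/path/to/contacts.csv'
INTO TABLE addresses
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(status, email, lan);

I'm just not sure yet how to deal with the duplicate emails this way - maybe load into an import table first, like corbin suggested.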
corbin Posted November 17, 2008

Whoa... I didn't know you could REPLACE INTO with LOAD DATA INFILE.
urieilam Posted November 18, 2008 (Author)

Can you "ON DUPLICATE KEY UPDATE" with "LOAD DATA INFILE"?
Mchl Posted November 18, 2008

Doesn't look like it, but I only scanned it briefly. Read it yourself and try to find out.
corbin Posted November 18, 2008

LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name'
    [REPLACE | IGNORE]

I would assume REPLACE could essentially be used like an insert-or-update.
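So something like this ought to do it (guessing at the column list, since I don't know your exact table layout):

LOAD DATA LOCAL INFILE '/path/to/contacts.csv'
REPLACE
INTO TABLE addresses
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(status, email, lan);

Since email is UNIQUE, any row in the file whose email already exists replaces the old row instead of failing with a duplicate-key error.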
Mchl Posted November 19, 2008

Right... REPLACE works like INSERT ... ON DUPLICATE KEY UPDATE, except that it deletes the existing row and inserts a new one rather than updating it in place (so the auto-increment id changes and any columns not in the file are reset to their defaults).