
Inserting from a 61,000 record text file - does 1 1/2 hours sound right?



I have a text file and wrote a script that reads it line by line; each line becomes a new record in a table containing only 5 fields, the longest of which is 17 characters. It's a simple take-a-line, substr-it-apart, insert-a-record job.

 

I used to be a Visual FoxPro programmer, and this would take 2 minutes. This simple task takes 1 1/2 hours, or about 1,000 records added per minute. I am running this on my development laptop using Windows IIS.

 

Other information I can pass along: the storage engine is InnoDB, the file is about 5 megs, the collation is utf8_general_ci, and the row format is COMPACT. The table has two indexes, both B-tree. One is on a 5-character zip code field, and the other is on a 2-char state field plus a 3-char code field, for a total of 5 characters. Everything about these files is small, and I wouldn't think the process in MySQL would take so long.
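
The indexes were created more or less like this (index names from memory, so they may be off):

$db->exec("CREATE INDEX idx_zip ON zipdb (zip_code)");                       // the 5-char zip code
$db->exec("CREATE INDEX idx_state_code ON zipdb (state_code, county_code)"); // 2-char state + 3-char code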

 

Any ideas or thoughts would be appreciated.

 

Thanks

You should temporarily disable the indexes on the table before trying to insert all the records. You can do this using ALTER TABLE.
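
Something like this (an untested sketch; the table and index names are assumed, and note that since your table is InnoDB, DISABLE KEYS only affects MyISAM's non-unique indexes, so for InnoDB you'd drop the indexes and re-create them after the load instead):

// MyISAM: defer non-unique index maintenance during the load
$db->exec("ALTER TABLE zipdb DISABLE KEYS");
// ... run all the inserts ...
$db->exec("ALTER TABLE zipdb ENABLE KEYS");

// InnoDB ignores DISABLE KEYS, so drop and re-create instead
$db->exec("ALTER TABLE zipdb DROP INDEX idx_zip");
// ... run all the inserts ...
$db->exec("ALTER TABLE zipdb ADD INDEX idx_zip (zip_code)");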

 

Also, if you are not already doing so, you should use the bulk insert syntax rather than doing one insert per row: INSERT INTO table VALUES (...), (...), (...)
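
For example, instead of three separate queries (a sketch with assumed table/column names):

$db->exec("INSERT INTO zipdb (zip_code, city_name, state_code) VALUES
    ('00601', 'ADJUNTAS',  'PR'),
    ('00602', 'AGUADILLA', 'PR'),
    ('00603', 'AGUADILLA', 'PR')");

Each row you append to the VALUES list saves one round trip to the server, which is where most of your time is going.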

 

Lastly, you might check your MySQL configuration. It may be configured to not use very much memory (I think this is typical for a development install), which means it might be doing a lot of disk I/O.
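
For instance, these my.ini settings control how much memory InnoDB gets to work with (the values are only examples - tune them to your machine, and the flush setting trades a little crash safety for speed):

[mysqld]
# cache for table and index data; the default on old installs is tiny
innodb_buffer_pool_size = 256M
# flush the log roughly once per second instead of at every commit
innodb_flush_log_at_trx_commit = 2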

 

Sample of text from the first few lines of the 61,000-line file - in the file the positions line up, though it doesn't appear that way from the cut and paste below. As you will notice, it's a very small, simple script that is taking forever. Removing the debug lines to the screen made no difference.

 

0236005010000HOLTSVILLE NY103SUFFOLK N

0446006010000ADJUNTAS PR153YAUCO N

0446006020000AGUADILLA PR005AGUADILLA N

0446006020000AGUADA PR005AGUADILLA N

0446006020000AGUADA PR003AGUADA N

0446006030000AGUADILLA PR005AGUADILLA N

0446006040000AGUADILLA PR005AGUADILLA N

 

 

<?php

// forward slashes avoid "\t" and friends being treated as escape sequences
$filename = "c:/inetpub/wwwroot/text/zip.txt";
$logfile = fopen($filename, "r");

$hostname = "localhost";
$username = "xxxxxx";
$password = "xxxxxxx";

$dsn = 'mysql:host=localhost;dbname=zipdb';
$db = new PDO($dsn, $username, $password);

print "<p>Login Successful :D</p>";

$i = 0;

while (!feof($logfile)) {

    $line = fgets($logfile);
    if ($line === false) break; // end of file

    // every 1000 lines, reset the timer (I added this to stop it timing out)
    $i++;
    if ($i > 1000) {
        ini_set('max_execution_time', 600);
        $i = 1;
    }

    // cut the fixed-width line apart into its fields
    $f_office    = substr($line,0,4);
    $zip_code    = substr($line,4,5);
    $zip_plus    = substr($line,9,4);
    $city_name   = substr($line,13,17);
    $state_code  = substr($line,30,2);
    $county_code = substr($line,32,3);
    $county_name = substr($line,35,15);
    $l_option    = substr($line,50,1);

    $query = "INSERT INTO zipdb
        (f_office, zip_code, zip_plus, city_name,
         state_code, county_code, county_name, l_option)
        VALUES
        ('$f_office', '$zip_code', '$zip_plus', '$city_name',
         '$state_code', '$county_code', '$county_name', '$l_option')";

    $db->exec($query);

    // debug output, commented out (removing it made no difference):
    // print "$i"."...<br>";
    // print "$line";
    // print "$f_office <br>";
    // echo "$zip_code <br>";
    // echo "$zip_plus <br>";
    // echo "$city_name <br>";
    // echo "$state_code <br>";
    // echo "$county_code <br>";
    // echo "$county_name <br>";
    // echo "$l_option <br>";
    // echo "<br>";
}

ini_set('max_execution_time', '550');

if (fclose($logfile))
    print "<br>File closed<br>";

?>

I also tried this way, with the same results:

$lines = file($filename);

foreach ($lines as $line_num => $line) {

    $f_office    = substr($line,0,4);
    $zip_code    = substr($line,4,5);
    $zip_plus    = substr($line,9,4);
    $city_name   = substr($line,13,17);
    $state_code  = substr($line,30,2);
    $county_code = substr($line,32,3);
    $county_name = substr($line,35,15);
    $l_option    = substr($line,50,1);

    $query = "INSERT INTO zip
        (f_office, zip_code, zip_plus, city_name,
         state_code, county_code, county_name, l_option)
        VALUES
        ('$f_office', '$zip_code', '$zip_plus', '$city_name',
         '$state_code', '$county_code', '$county_name', '$l_option')";

    $db->exec($query);
}

Please use the [code][/code] tags around your code, as it helps make both your post and your code a lot easier to read. Thanks.

 

That said, your main problem is that you're running one query per line. Instead, you should do what Kicken outlined above: build the query up into a multi-insert, then run it once. That'll cut down the time a lot.

Well, in this case, with so many lines, you might want to check the length as well, and cut the query up into smaller segments. If the query becomes too big, it might be refused by the DB server and/or driver. Keep it below 1 MB and you should be safe. (strlen() can be used to get the size in bytes.)
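
A rough sketch of that size check, using the 1 MB cap (untested, and the variable names are made up):

$values = array();
$bytes  = 0;

foreach ($rows as $row) {          // $row is one "('...', '...')" values string
    $values[] = $row;
    $bytes   += strlen($row) + 1;  // +1 for the joining comma

    if ($bytes > 1000000) {        // flush before the query grows past ~1 MB
        $db->exec("INSERT INTO zipdb (f_office, zip_code, zip_plus, city_name,
            state_code, county_code, county_name, l_option) VALUES "
            . implode(',', $values));
        $values = array();
        $bytes  = 0;
    }
}

if ($values) {                     // don't forget the final partial batch
    $db->exec("INSERT INTO zipdb (f_office, zip_code, zip_plus, city_name,
        state_code, county_code, county_name, l_option) VALUES "
        . implode(',', $values));
}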

 

Also, using file_get_contents () instead of fopen (), fgets (), fclose () and the loop might help shave some extra time off.

Edited by Christian F.

I forgot to say that I did disable the indexes, and it improved a little, but not much.

It was refreshing to hear there was something wrong with my programming and not with MySQL. For a moment I was wondering how super-fast Visual FoxPro could have died out while MySQL lived on with the slow speeds I was experiencing. I have not run this at my hosting website, to save on bandwidth, and have been working locally on my notebook using XP - the same notebook where I can do this in FoxPro in 10 seconds. I am not plugging FoxPro; it's what I know and the only db software I can compare my results to. Thanks.

FoxPro would probably be this slow too, if you ran one query per line. :P The only difference is that you know FoxPro well enough to keep that from happening. So it's only a question of experience, which is easy enough to gain.

 

Yes, I'm in Europe. More specifically, Norway, as stated in my profile over <- there. ;)

I would do the following (since you can read the whole file into an array using file()):

 

1) Write a function that takes one line and converts it to the data string that goes in the VALUES (...) portion of the query statement.

 

2) Use array_map to apply the function from item #1 to the array of lines, replacing the same array of lines so as to not consume more memory than needed.

 

3) Use array_chunk to split the array of lines into an array of arrays of lines (5k-10k lines per chunk), replacing the same array of lines so as to not consume more memory than needed.

 

4) Iterate over the resulting array of chunks and implode each sub-array to make and execute a multi-value insert query from that sub-array.

 

Only one loop is used, and PHP's array functions do most of the work.

Building on what Muddy_Funster posted, you could use steps #1 and #2 from what I suggested, but produce tab-delimited data that you write to a file, then use a LOAD DATA INFILE query to get the data into the database.
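
A rough sketch of that approach (the .tsv file name is made up, and LOAD DATA LOCAL INFILE needs the PDO::MYSQL_ATTR_LOCAL_INFILE option enabled on the connection):

// steps #1/#2: convert each fixed-width line to one tab-delimited row
$out = fopen('zip.tsv', 'w');
foreach (file($filename, FILE_IGNORE_NEW_LINES) as $line) {
    fwrite($out, implode("\t", array(
        substr($line,0,4),   substr($line,4,5),  substr($line,9,4),
        substr($line,13,17), substr($line,30,2), substr($line,32,3),
        substr($line,35,15), substr($line,50,1)
    )) . "\n");
}
fclose($out);

// then one statement loads the whole file
$db->exec("LOAD DATA LOCAL INFILE 'zip.tsv' INTO TABLE zipdb
    FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'
    (f_office, zip_code, zip_plus, city_name,
     state_code, county_code, county_name, l_option)");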

Small correction to my post above: I do, of course, recommend using file() for reading the file, as it'll give you an array straight away.

file_get_contents () was just force of habit. Sorry. :(

The following takes ~ 2.3/2.4 seconds on my development system for 61000 rows made up of repeated blocks of the sample data you posted -

 

<?php
// database connection here...

$filename = "zip.txt";
$block_size = 10000; // how many lines of data per multi-query
$lines = file($filename,FILE_IGNORE_NEW_LINES);

function make_line($line){
   $f_office    = substr($line,0,4);
   $zip_code    = substr($line,4,5);
   $zip_plus    = substr($line,9,4);
   $city_name   = substr($line,13,17);
   $state_code  = substr($line,30,2);
   $county_code = substr($line,32,3);
   $county_name = substr($line,35,15);
   $l_option    = substr($line,50,1);
   return "('$f_office','$zip_code','$zip_plus','$city_name','$state_code','$county_code','$county_name','$l_option')";
}

$lines = array_map('make_line',$lines);
$lines = array_chunk($lines ,$block_size);
$i = 1;
$start = microtime(true);
foreach($lines as $chunk){
   $query = "INSERT INTO zipdb (f_office,zip_code,zip_plus,city_name,state_code,county_code,county_name,l_option) VALUES " . implode(',',$chunk);
   // run query here...
   $result = $mysqli->query($query);
   echo "$i - ".count($chunk)." - ".strlen($query)." - ";
   var_dump($result);
   echo "<br />";
   $i++;

}
$end = microtime(true);
$diff = number_format($end - $start,2);
echo "Done, in $diff seconds.";

 

Some of your speed issues might be due to using PDO: what it does to make itself general-purpose isn't going to result in the fastest execution.

 

Also, using a prepared query for a repeated simple query like this doesn't help much (5%-10% time savings), because you must still transfer the data that goes into each query to the database server, which takes significantly longer than it takes the server to run the actual query.
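
For comparison, the prepared-query version of the per-row approach would look something like this (a sketch of the variant being measured, using the PDO connection from the original script):

$stmt = $db->prepare("INSERT INTO zipdb
    (f_office, zip_code, zip_plus, city_name,
     state_code, county_code, county_name, l_option)
    VALUES (?, ?, ?, ?, ?, ?, ?, ?)");

foreach ($lines as $line) {
    // the query text is parsed once, but each row is still its own round trip
    $stmt->execute(array(
        substr($line,0,4),   substr($line,4,5),  substr($line,9,4),
        substr($line,13,17), substr($line,30,2), substr($line,32,3),
        substr($line,35,15), substr($line,50,1)
    ));
}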

1 - 10000 - 770117 - int(10000)

2 - 10000 - 770117 - bool(false)

3 - 10000 - 770117 - int(10000)

4 - 10000 - 770117 - int(10000)

5 - 10000 - 770117 - bool(false)

6 - 10000 - 770117 - bool(false)

7 - 1097 - 84586 - int(1097)

Done, in 1.56 seconds.
