Help Optimizing code


cytech


Good Morning,

 

I wrote a small import function for a website of mine, and I know there has to be a better way to handle what I'm doing. It seems pretty slow right now, and I'm hoping someone can take my code and make it run faster. Think anyone can speed this up? What am I doing wrong, or am I doing it right?

 

The function below takes a CSV file, reads the contents, and imports them into the database. How can I optimize this to run faster and cleaner?

function importFile($user_id, $file, $file_id)
{
    if(!file_exists($file))
    {
        return false;
    }

    $handle = fopen($file, "rb");
    $contents = '';
    while (!feof($handle)) {
        $contents .= fread($handle, 8192);
    }
    fclose($handle);

    if(empty($contents))
    {
        return false;
    }

    // grab the items
    $items = explode("\n", $contents);

    // if there are items, import them.
    if(is_array($items))
    {
        foreach($items as $key => $val)
        {
            $save_val = $val;
            $val = explode(",", trim($val));
            if(!is_array($val) || sizeof($val) <= 1)
            {
                // not comma-delimited; fall back to tabs
                $val = explode("\t", trim($save_val));
                if(!is_array($val))
                {
                    return false;
                }
            }

            // fields: part number, alt part number, condition, qty, description

            foreach($val as $inner_key => $inner_val)
            {
                $inner_val = str_replace("\r", "", $inner_val);
                $val[$inner_key] = str_replace("\"", "", $val[$inner_key]);
                $val[$inner_key] = trim(mysql_real_escape_string($val[$inner_key]));
            }

            // skip empty rows and the header row
            if(!empty($val[0]) && strtolower($val[0]) != "part number")
            {
                $val[0] = $this->fixPartNumber($val[0]);
                $val[1] = $this->fixPartNumber($val[1]);

                // check to see if we need to insert or update
                $select_sql = "SELECT inventory_id FROM ".TABLE_PREFIX."inventory WHERE inventory_part_number='{$val[0]}' AND user_id={$user_id}";
                $result = $this->db->query($select_sql);
                if(!$result)
                {
                    echo "fail";
                    exit;
                }

                if(mysql_num_rows($result) == 0)
                {
                    // no record, so insert
                    $insert_sql = "INSERT INTO ".TABLE_PREFIX."inventory (inventory_part_number, inventory_alt_part_number, inventory_condition_code,
                                   inventory_quantity, inventory_description, last_update, user_id) VALUES (
                                   '{$val[0]}', '{$val[1]}', '{$val[2]}', '{$val[3]}', '{$val[4]}', NOW(), '{$user_id}')";
                    $result = $this->db->query($insert_sql);
                    echo "Inserted: ".$val[0]."<br />";
                }
                else
                {
                    $update_sql = "UPDATE ".TABLE_PREFIX."inventory SET inventory_part_number='{$val[0]}', inventory_alt_part_number='{$val[1]}',
                                   inventory_condition_code='{$val[2]}', inventory_quantity='{$val[3]}', inventory_description='{$val[4]}', last_update = NOW()
                                   WHERE user_id = {$user_id} AND inventory_part_number = '{$val[0]}'";
                    $result = $this->db->query($update_sql);
                    echo "Updated: ".$val[0]."<br />";
                }
            }
        }

        $update_sql = "UPDATE ".TABLE_PREFIX."file_upload SET file_flag = 1 WHERE file_id=".$file_id." AND user_id=".$user_id;
        $result = $this->db->query($update_sql);
        return true;
    }
}


I was doing some research and found some benchmarks claiming that foreach loops are way too inefficient with large amounts of data and that you should use for loops instead. Anyone care to back up this claim?

 

What is your application for this? It's sort of ironic to be mixing flat files with databases. If it's just for importing the odd file it shouldn't matter, but if it's essential code then you should store the files in a serialized or imploded format, or even XML/CSV, which can be handled with built-in functions that should perform much faster than all that manual parsing.

 

You can test speeds and try a few optimizations using this benchmarking code if you wish; I think it works.

Put this at the top of your page, or in an include:

$starttime = explode(' ', microtime());
$starttime = $starttime[1] + $starttime[0];

And this at the bottom:

$mtime = explode(' ', microtime());
$totaltime = $mtime[0] + $mtime[1] - $starttime;
printf('<br>Page loaded in %.4f seconds.', $totaltime);


Thank you for your post and for the benchmark code. I will plop that in and see how it looks.

 

It is somewhat essential: users upload their CSV files and the system needs to import them into the database. I wonder if there is a solution that, upon upload, turns the CSV into something that makes importing a lot easier... I will give that a few Google searches.



fgetcsv() and similar? Since it's a built-in function rather than userland logic it should run faster; you should run some tests.
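
For what it's worth, here's a minimal sketch of what that might look like, assuming the same five-column layout as the original import (the database logic is left out):

$handle = fopen($file, "rb");
if ($handle !== false) {
    while (($fields = fgetcsv($handle, 1024)) !== false) {
        // fgetcsv() returns array(null) for a blank line
        if ($fields === array(null)) {
            continue;
        }
        // pad in case a row has fewer than five columns
        $fields = array_pad($fields, 5, '');
        list($partNumber, $altPartNumber, $condition, $qty, $description) = $fields;
        // ... escape the values and run the insert/update as before ...
    }
    fclose($handle);
}

A side benefit is that fgetcsv() handles quoted fields itself, so the str_replace() used to strip quotes isn't needed.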


How many rows do you have in your CSV? Your script is likely spending far more time on all the SELECT and UPDATE statements than you will ever notice from switching foreach to for. Perhaps there would be a quicker method using the INSERT...ON DUPLICATE syntax, but I can't say for sure; I've never used it. As to which loop is technically quicker, there appears to be no definitive answer; it would very much seem to depend on what exactly you're doing on each iteration.

 

Edit: Or possibly LOAD DATA INFILE?
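
If that's viable, it might look roughly like this; the staging table name (inventory_import) is made up here, and LOCAL INFILE has to be enabled on both the client and the server:

// Load the raw CSV straight into a staging table in one statement
$sql = "LOAD DATA LOCAL INFILE '" . mysql_real_escape_string($file) . "'
        INTO TABLE " . TABLE_PREFIX . "inventory_import
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'
        (inventory_part_number, inventory_alt_part_number,
         inventory_condition_code, inventory_quantity, inventory_description)";
$result = $this->db->query($sql);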


Hey Cags,

 

Agreed... I just figured that out. I switched out the foreach loops and it did run a "hair" faster, but not enough to worry about, so that didn't help. However, when running the import function without the database queries, it ran through the CSV perfectly and decently fast. I put the queries back in and it slowed to a snail's pace. So at this point it has to be the database and the queries.

 

With that being said, I need to look at the database and figure out a more appropriate fix.

 

Thank you!


I was thinking of ON DUPLICATE as well, but that only works when you have a single column being used for the duplicate check, whereas it looks like this is using two fields for uniqueness: user and part number.

 

I do see a lot of room for improvement though:

 

First, you have the following code to read the file into a string, then you explode the string into an array.

$contents = '';
while (!feof($handle)) {
    $contents .= fread($handle, 8192);
}
fclose($handle);

if(empty($contents))
{
    return false;
}

// grab the items
$items = explode("\n", $contents);

 

You can replace all of that with a single line that reads the contents directly into an array, using the file() function:

$items = file($file);

 

Second, and this is minor, you are running an is_array() check after creating an array - kind of pointless.

 

Third, you are iterating through the data several times: once to validate/clean the data and then again to put the data in the DB. Do it all in one loop.

 

Finally, and most importantly, as others have noted, running queries in a loop is probably the biggest performance hit. You are doing two queries for each record: one to check if the record exists and then a second to update or insert it. Instead, always do an UPDATE query, then use mysql_affected_rows() to determine whether the record already existed. If it didn't, add the values to an array. Once you've gone through every record, you will have run one query for each and all the updates are done. Then you can insert all the new records in one single query. Doing this cuts the number of queries needed roughly in half.

 

Here is some modified code I put together. Note that it is untested:

<?php

    function importFile($user_id, $file, $file_id)
    {
        if(!file_exists($file))
        {
            return false;
        }

        // grab the items
        $recordsAry = file($file);
        if ($recordsAry === false || count($recordsAry) < 1)
        {
            return false;
        }

        //Create variable to store records to be inserted
        $insertValues = array();

        foreach($recordsAry as $rKey => $recordStr)
        {
            $recordStr = trim($recordStr);
            if($recordStr=='')
            {
                //Remove empty lines
                unset($recordsAry[$rKey]);
            }
            else
            {
                $recordStr = str_replace("\"", "", $recordStr);
                $recordStr = mysql_real_escape_string($recordStr);

                //Split record string into field vars
                $delimiter = (strstr($recordStr, ',')!==false) ? "," : "\t";
                $fieldsAry = array_pad(explode($delimiter, $recordStr), 5, '');
                list($partNumber, $altPartNumber, $condition, $qty, $description) = $fieldsAry;

                $partNumber = $this->fixPartNumber($partNumber);
                $altPartNumber = $this->fixPartNumber($altPartNumber);

                //Run update query
                $sql = "UPDATE ".TABLE_PREFIX."inventory
                        SET inventory_alt_part_number='{$altPartNumber}',
                            inventory_condition_code='{$condition}',
                            inventory_quantity='{$qty}',
                            inventory_description='{$description}',
                            last_update = NOW()
                        WHERE user_id = {$user_id}
                            AND inventory_part_number = '{$partNumber}'";

                $result = $this->db->query($sql);

                //Check results
                if(!$result)
                {
                    echo "Fail Update: {$partNumber}<br />\n";
                }
                else if (mysql_affected_rows()===0)
                {
                    //Record doesn't exist - add to INSERT values array
                    $insertValues[$partNumber] = "('{$partNumber}', '{$altPartNumber}', '{$condition}',
                             '{$qty}', '{$description}', NOW(), '{$user_id}')";
                }
                else
                {
                    //Record was updated
                    echo "Updated: {$partNumber}<br />\n";
                }
            }
        }

        //Add new records
        if(count($insertValues)>0)
        {
            //Insert all records that need to be added in one query
            $sql = "INSERT INTO ".TABLE_PREFIX."inventory
                    (inventory_part_number, inventory_alt_part_number, inventory_condition_code,
                     inventory_quantity, inventory_description, last_update, user_id)
                    VALUES " . implode(', ', $insertValues);

            $result = $this->db->query($sql);

            if(!$result)
            {
                echo "Fail Insert: " . implode(', ', array_keys($insertValues)) . "<br />\n";
            }
            else
            {
                echo "Inserted: " . implode(', ', array_keys($insertValues)) . "<br />\n";
            }
        }

        $sql = "UPDATE ".TABLE_PREFIX."file_upload SET file_flag = 1 WHERE file_id={$file_id} AND user_id={$user_id}";
        $result = $this->db->query($sql);
        return true;
    }

?>


Well, the suggestions put forward have helped overall with performance, but when updating 5-10k records it's still VERY slow. This is just down to the database.

 

The performance hit isn't with the inserts but with the update commands, so I re-did it so the update runs in one go instead of multiple calls. This seems to have improved performance slightly. However, when doing thousands of records it's still very slow. Not sure there's a way around this one. :(

 

Once again thank you all for your help.


So with each record, you are querying the database to see if it's there...if it is, you are updating - and if it's not, you are inserting.

 

I would suggest using REPLACE INTO

 

Basically if it's there, it deletes and reinserts.  If it's not, it just inserts.
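
As a rough sketch with the column names from earlier in the thread; note this only counts as "it's there" if a PRIMARY or UNIQUE key covers the part number/user combination, which is an assumption about your schema:

// REPLACE deletes any row matching a unique key, then inserts fresh
$sql = "REPLACE INTO " . TABLE_PREFIX . "inventory
        (inventory_part_number, inventory_alt_part_number, inventory_condition_code,
         inventory_quantity, inventory_description, last_update, user_id)
        VALUES ('{$partNumber}', '{$altPartNumber}', '{$condition}',
                '{$qty}', '{$description}', NOW(), '{$user_id}')";
$result = $this->db->query($sql);

One thing to keep in mind: because REPLACE deletes the old row first, any column you don't list gets reset to its default.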

 

That will probably speed up your performance a ton.  That, and fgetcsv().

 

That or take a look at ON DUPLICATE KEY in mysql.

 

 


Hey nafetski,

 

Thanks for the ideas.

 

ON DUPLICATE seems like the best option in this situation; however, I have two unique keys, not just one.

 

REPLACE INTO, even though it would be a single query, has its downfall: if the record does exist (which most do), it effectively takes two queries (a delete and an insert) to "update" it, so after all is said and done it would be three queries for one entry. Even though that's faster than doing the three queries yourself, if I can get it down to a single query on both sides, that's great.

 

What I have now does one insert query into a temporary table (this takes 1-2 seconds, if that). I then do an update from the temporary table to the live table in one UPDATE command (this still takes a lot of time - there are just so many records). Then the rows that were not updated from the temporary table get inserted into the live table. Then I truncate the temp table and move on to the next file.
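
In case it helps anyone else, here's roughly what that flow looks like; the staging table name (inventory_temp) and the assumption that it mirrors the live columns are mine:

// 1. Bulk-insert all CSV rows into the temp table with one multi-row
//    INSERT, built in PHP as in the earlier examples.

// 2. Update existing live rows from the temp table in one statement
$sql = "UPDATE " . TABLE_PREFIX . "inventory i
        INNER JOIN " . TABLE_PREFIX . "inventory_temp t
            ON t.user_id = i.user_id
           AND t.inventory_part_number = i.inventory_part_number
        SET i.inventory_alt_part_number = t.inventory_alt_part_number,
            i.inventory_condition_code  = t.inventory_condition_code,
            i.inventory_quantity        = t.inventory_quantity,
            i.inventory_description     = t.inventory_description,
            i.last_update               = NOW()";
$this->db->query($sql);

// 3. Insert temp rows that matched nothing in the live table
$sql = "INSERT INTO " . TABLE_PREFIX . "inventory
            (inventory_part_number, inventory_alt_part_number, inventory_condition_code,
             inventory_quantity, inventory_description, last_update, user_id)
        SELECT t.inventory_part_number, t.inventory_alt_part_number, t.inventory_condition_code,
               t.inventory_quantity, t.inventory_description, NOW(), t.user_id
        FROM " . TABLE_PREFIX . "inventory_temp t
        LEFT JOIN " . TABLE_PREFIX . "inventory i
            ON i.user_id = t.user_id
           AND i.inventory_part_number = t.inventory_part_number
        WHERE i.inventory_id IS NULL";
$this->db->query($sql);

// 4. Clear the staging table for the next file
$this->db->query("TRUNCATE TABLE " . TABLE_PREFIX . "inventory_temp");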

 

This works great, except for when you have 30k updates to do haha. Oh well.


ON DUPLICATE seems like the best option in this situation; however, I have two unique keys, not just one.

 

Actually, you have no unique keys. The combination of two values must be unique - which ON DUPLICATE cannot work with. But, there is a simple solution - create a new, unique key using those two values. In addition to saving the user ID and the part number as individual fields, create a new column in the table where you save the value "[userID]-[partNo]". Then that field can be set up as a unique column and you can use ON DUPLICATE.

 

Then, as in the examples I provided earlier, use PHP to create one single query using ON DUPLICATE.
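
A sketch of that approach; the user_part column, its UNIQUE index, and the $records array of cleaned rows are all assumptions for illustration:

// One-time schema change (not part of the import itself):
//   ALTER TABLE inventory
//       ADD user_part VARCHAR(120) NOT NULL,
//       ADD UNIQUE KEY idx_user_part (user_part);

$values = array();
foreach ($records as $r) {
    $userPart = $user_id . '-' . $r[0]; // "[userID]-[partNo]"
    $values[] = "('{$userPart}', '{$r[0]}', '{$r[1]}', '{$r[2]}', '{$r[3]}', '{$r[4]}', NOW(), '{$user_id}')";
}

// One query: insert new rows, update rows whose user_part already exists
$sql = "INSERT INTO " . TABLE_PREFIX . "inventory
            (user_part, inventory_part_number, inventory_alt_part_number,
             inventory_condition_code, inventory_quantity, inventory_description,
             last_update, user_id)
        VALUES " . implode(', ', $values) . "
        ON DUPLICATE KEY UPDATE
            inventory_alt_part_number = VALUES(inventory_alt_part_number),
            inventory_condition_code  = VALUES(inventory_condition_code),
            inventory_quantity        = VALUES(inventory_quantity),
            inventory_description     = VALUES(inventory_description),
            last_update               = NOW()";
$this->db->query($sql);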


That's not a bad idea...

 

I was reading, though, and people were saying the overhead of using ON DUPLICATE is something to worry about? Is this true?

 

They were saying that if the entry is found, ON DUPLICATE removes it and then inserts a new one, which in turn is two queries. Or am I just overthinking this and shouldn't worry about it?



 

The best way to find out would be to test it.

