
Mass MySQL downloading - What's the most efficient way through PHP?


soma56


I have dozens of tables that a client needs to download through the web. I can use PHP to convert the tables within the database to CSV, but I have to split the files into 10k-result chunks, otherwise the server runs out of memory.

 

I already raised the memory limit within the download script (ini_set('memory_limit', '512M');), which is what makes this possible at all.

 

Downloading these files piece by piece is really time-consuming. Is there a memory-safe and efficient way to export all my tables as CSVs and zip them into one master download? phpMyAdmin seems to do this smoothly; however, I need an online interface for the client to export the data directly...


Here is what I would do:

 

Page to start the table builds:

//Kick off the build in the background so the page returns immediately;
//redirecting output is what lets exec() return without waiting
$email = "something@site.com";
exec("nohup php /location/to/php/file.php " . escapeshellarg($email) . " > /dev/null 2>&1 &");
echo "You will receive an email with a link to the file shortly.";

 

file.php

exec("mysqldump --skip-lock-tables -h localhost -u username --password=mypassword database table1 table2 table3 > /path/to/download/location/file.sql");

$message = "To download your tables, follow the link below \n\n http://mysite.com/downloads/file.sql";

mail($argv[1], "Table Build Complete" $message);
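 

If the client specifically needs CSV rather than SQL, a variant worth trying is mysqldump's --tab option, which writes one data file per table; with the field options below those files come out comma-separated. A sketch, using the same placeholder credentials as above (note: the data files are written by the MySQL server itself, so it needs the FILE privilege and write access to that directory):

//--tab writes a .sql (schema) and a .txt (data) file per table into the given
//directory; the field options make the data files comma-separated
exec("mysqldump --skip-lock-tables -h localhost -u username --password=mypassword "
   . "--tab=/path/to/download/location "
   . "--fields-terminated-by=',' --fields-optionally-enclosed-by='\"' "
   . "database table1 table2 table3");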



Here's the page that displays the tables available for download, split into 10k pieces:

<?php

//Connect to database
require("db_connect.php");

$current_table = $_GET['current_table'];

//Determine table total
$what_is_the_total = "SELECT * FROM $current_table";
$result_total = mysql_query($what_is_the_total);
$num_rows = mysql_num_rows($result_total);

//Determine keywords completed
$result_completed = mysql_query("SELECT * FROM $current_table WHERE status = 'Complete'");
$num_rows_completed = mysql_num_rows($result_completed);

//Determine percentage complete
$percentage = round($num_rows_completed * 100 / $num_rows)."%";

$rangeend = 10000;
$row_counter = 1;
$add = 0;
$end = 0; //set to 1 once the final (partial) chunk has been output

$get_first_date = "SELECT * FROM $current_table WHERE status = 'Complete' AND date = '2012-03-17' ORDER BY mysql_id ASC LIMIT 1";
$first_date = mysql_query($get_first_date);
$first = mysql_fetch_array($first_date);
$rangestart = $first['id'];

?>
<div style="text-align:center;">
  <table id="tbl_data">
    <thead>
      <tr>
        <th scope="col">ID</th>
        <th scope="col">MySQL ID Range</th>
        <th scope="col">CSV ID Range</th>
        <th scope="col">Keyword Range (Export)</th>
        <th scope="col">Date Range</th>
        <th scope="col">Time Range</th>
      </tr>
    </thead>
    <tbody>
<?php
$get_all_completed = "SELECT * FROM $current_table WHERE status = 'Complete' AND id >= $rangestart";
$completed = mysql_query($get_all_completed);

while ($row = mysql_fetch_array($completed)) {
    $id = $row['id'];
    $rangest = $row['mysql_id'] + $add;
    $rangeend = $rangest + 10000;

    //First row of this 10k chunk
    $get_start_ranges = "SELECT * FROM $current_table WHERE status = 'Complete' AND mysql_id = $rangest";
    $start_ranges = mysql_query($get_start_ranges);
    $start_range = mysql_fetch_array($start_ranges);
    $start_id = $start_range['id'];
    $start_keyword = $start_range['keyword'];
    $start_date = $start_range['date'];
    $start_time = $start_range['time'];

    //Last row of this 10k chunk
    $get_end_ranges = "SELECT * FROM $current_table WHERE status = 'Complete' AND mysql_id = $rangeend";
    $end_ranges = mysql_query($get_end_ranges);
    $end_range = mysql_fetch_array($end_ranges);

    if ($end_range === false) {
        //No row at the chunk boundary, so this is the last (partial) chunk
        $get_last_keyword = "SELECT * FROM $current_table WHERE status = 'Complete' AND id >= $rangest ORDER BY mysql_id DESC LIMIT 1";
        $last_keyword = mysql_query($get_last_keyword);
        $end_range = mysql_fetch_array($last_keyword);
        $end = 1;
    }

    $end_id = $end_range['id'];
    $end_keyword = $end_range['keyword'];
    $end_date = $end_range['date'];
    $end_time = $end_range['time'];

    echo "
      <tr>
        <td>$row_counter</td>
        <td>$rangest - $rangeend</td>
        <td>$start_id - $end_id</td>
        <td><a href=\"export.php?current_table=$current_table&start_id=$start_id&end_id=$end_id\">$start_keyword - $end_keyword</a></td>
        <td>$start_date - $end_date</td>
        <td>$start_time - $end_time</td>
      </tr>";

    if ($end == 1) {
        break;
    }

    $add += 10000;
    $row_counter++;
}
?>
    </tbody>
  </table>
</div>

 

Here is the script that processes the range into a CSV for the client:

 

<?php
ini_set('memory_limit', '512M');

//Connect to database
require("db_connect.php");

$current_table = $_GET['current_table'];
$start_id = $_GET['start_id'];
$end_id = $_GET['end_id'];
$current_table_results = $current_table."_results";

$query = "SELECT * FROM $current_table_results WHERE id BETWEEN $start_id AND $end_id";
$result = mysql_query($query) or die("Error executing query: ".mysql_error());
$row = mysql_fetch_assoc($result);

//Header line: the column names ($comma starts empty so there is no leading comma)
$line = "";
$comma = "";
foreach ($row as $name => $value) {
    $line .= $comma . '"' . str_replace('"', '""', $name) . '"';
    $comma = ",";
}
$line .= "\n";
$out = $line;

//Rewind to the first row, then build the data lines
mysql_data_seek($result, 0);

while ($row = mysql_fetch_assoc($result)) {
    $line = "";
    $comma = "";
    foreach ($row as $value) {
        $line .= $comma . '"' . str_replace('"', '""', $value) . '"';
        $comma = ",";
    }
    $line .= "\n";
    $out .= $line;
}

header("Content-type: text/csv");
header("Content-Disposition: attachment; filename=$current_table-$start_id-$end_id.csv");
echo $out;
exit;

?>

 

I'm assuming there is a much more efficient way to:

 

A - Download all the results from a specific table into a CSV without having to split them up, in a way that doesn't trip the memory limit.

 

and/or

 

B - Place the tables into a zip file so that the client can simply download everything without having to waste time fetching each individual file.

 

 


Sorry, I didn't read the full question. You want a CSV; MySQL can do that for you with SELECT ... INTO OUTFILE:

 

//Note: INTO OUTFILE writes the file on the MySQL server host, requires the
//FILE privilege, and fails if the target file already exists
$query = "SELECT * INTO OUTFILE '/tmp/name.csv'
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '\"'
ESCAPED BY '\\\\'
LINES TERMINATED BY '\\n'
FROM $current_table_results WHERE id BETWEEN $start_id AND $end_id";
$result = mysql_query($query);

//Wrap the CSV in a zip archive and send it to the browser
$zip = new ZipArchive;
$res = $zip->open('tables.zip', ZipArchive::CREATE);
if ($res === TRUE) {
    $zip->addFile('/tmp/name.csv', 'name.csv');
    $zip->close();
}

header("Content-type: application/zip");
header("Content-Length: ".filesize("tables.zip"));
readfile('tables.zip');
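 

And to give the client everything in one download, the same idea can loop over the tables, dumping each to its own CSV and adding them all to a single archive. A rough sketch; the table list and /tmp paths are placeholders:

//List of tables to export (placeholder names)
$tables = array('table1', 'table2', 'table3');

$zip = new ZipArchive;
if ($zip->open('tables.zip', ZipArchive::CREATE | ZipArchive::OVERWRITE) === TRUE) {
    foreach ($tables as $table) {
        //INTO OUTFILE is written by the MySQL server and refuses to overwrite,
        //so use a unique path per run
        $csv = "/tmp/{$table}_" . time() . ".csv";
        mysql_query("SELECT * INTO OUTFILE '$csv'
                     FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
                     LINES TERMINATED BY '\\n'
                     FROM $table") or die(mysql_error());
        $zip->addFile($csv, "{$table}.csv");
    }
    $zip->close();
}

header("Content-type: application/zip");
header("Content-Disposition: attachment; filename=tables.zip");
header("Content-Length: ".filesize("tables.zip"));
readfile('tables.zip');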


One of the reasons you are having a memory problem is that you are selecting all the data in your tables just to get a count of the total number of rows and of the completed number of rows, but you are not using the rows of data that were returned.

 

The first two queries in the code you posted should be SELECTing a COUNT() of the number of rows (and your current code should be calling mysql_free_result() to free the memory right after each mysql_num_rows() call.)
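 

For example, a minimal sketch of both suggestions, using the table from the code above:

//Count the rows without pulling any data back to PHP
$result = mysql_query("SELECT COUNT(*) FROM $current_table");
$num_rows = mysql_result($result, 0);
mysql_free_result($result); //release the result set as soon as it has been used

$result = mysql_query("SELECT COUNT(*) FROM $current_table WHERE status = 'Complete'");
$num_rows_completed = mysql_result($result, 0);
mysql_free_result($result);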

 

Edit: You also have a lot of ridiculous queries and code following that. The first 'real' query, where you select all the matching rows, is all you need. Just retrieve that data into an array and, if you do actually need to split it into parts for downloading (which I doubt at this point), use array_chunk() to break it into specifically sized pieces, as in the sketch below.
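 

Something along these lines (a rough sketch, reusing $current_table from your code):

//Fetch all the completed rows once
$rows = array();
$result = mysql_query("SELECT * FROM $current_table WHERE status = 'Complete' ORDER BY mysql_id");
while ($row = mysql_fetch_assoc($result)) {
    $rows[] = $row;
}
mysql_free_result($result);

//Split into 10,000-row pieces; $chunks[0] holds the first piece, and so on
$chunks = array_chunk($rows, 10000);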

 

Edit2: Based on the excessive number of queries and code, I would say that your memory problems are due to the code. You can likely replace everything you have shown with one query that gets the data you want, in the order you want it, then simply output it the way you want as one CSV file per table.

 

 


EDIT: PFMaBiSmAd already raised some of the same issues, but I'm going to post all of this as is.

 

Well, I would suggest using fputcsv() to generate the content instead of building your own CSV writer. Additionally, I would write the contents to a temp file and then send that to the user, instead of compiling a 'dynamic' file in a string. That way it doesn't all have to be held in memory during processing (I think).

 

I see other inefficiencies as well. For example, you use this to get the total number of rows:

//determine table total
$what_is_the_total = "SELECT * FROM $current_table"; 

$result_total = mysql_query($what_is_the_total);
$num_rows = mysql_num_rows($result_total);

 

That is a huge waste since the DB has to return all those rows. Rather, just query for the count of rows and get one record back:

//determine table total
$query = "SELECT COUNT(id) FROM $current_table"; 
$result = mysql_query($query);
$num_rows = mysql_result($result, 0);

 

Also, don't use '*' in your select statements unless you really need all the columns in the result. Again, that is a waste of resources.

 

I see that you are looping through each field in each record. You can process a whole row at once with fputcsv(). To get the column names, use a flag for the first record and array_keys() to generate the header line.

 

This is a very quick/rough rewrite, but it should be much more efficient:

 

ini_set('memory_limit', '512M');

//Connect to Database
require("db_connect.php");

$current_table = trim($_GET['current_table']);
$current_table_results = mysql_real_escape_string($_GET['current_table']."_results");
//$start_id      = $_GET['start_id'];
//$end_id        = $_GET['end_id'];

//Create and run query
$query = "SELECT * FROM $current_table_results";
$result = mysql_query($query) or die("Error executing query: ".mysql_error());

//Create output file
$temp_file = "{$current_table}.csv";
$fileObj = fopen($temp_file, 'w');

$header = true;
while($row = mysql_fetch_assoc($result))
{
    if($header==true)
    {
        //Write the column names as the first line
        fputcsv($fileObj, array_keys($row));
        $header = false;
    }
    fputcsv($fileObj, $row);
}
fclose($fileObj);

//Output and delete temp file
$file_content = file_get_contents($temp_file);
unlink($temp_file);
header("Content-type: text/csv");
header("Content-Disposition: attachment; filename={$current_table}.csv");
echo $file_content;

exit;


Further to the above: The $percentage calculation in the first piece of code isn't even being used, so the two queries returning a huge number of rows aren't needed at all.

 

For the 2nd code piece that Psycho rewrote: since the data has already been saved to a file, readfile() (instead of file_get_contents()) would read and output that file without consuming PHP's available memory.
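 

In other words, the tail end of that rewrite would become something like this (assuming the CSV was written to $temp_file as above):

header("Content-type: text/csv");
header("Content-Disposition: attachment; filename={$current_table}.csv");
readfile($temp_file); //streams the file straight to the client, not into a PHP string
unlink($temp_file);
exit;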

 

