I have roughly 50k-100k files in a directory. I'm running a script that checks whether each file is referenced in the DB and deletes it if it is not. The problem is that a quick test on a directory of just 1k files dies. Is there a way to optimize it? I know the script works; it just takes too long to run, even with only 1k files. I'm pretty sure it's the is_dir call that's taking its sweet time. Any ideas?

<?php
require_once 'db_connect.php';

$default_dir = "storage/2011/";

if(!($dp = opendir($default_dir))) die("Cannot open $default_dir.");

while($file = readdir($dp))
{
    if(is_dir($file))
    {
        continue;
    }
    else if($file != '.' && $file != '..')
    {
        $query = "SELECT * FROM images Where filename = '".$file."' OR thumbname = '".$file."'";
        $dbResult = mysql_query($query);
        $num_rows = mysql_num_rows($dbResult);
        if ($num_rows == 0){
            unlink($default_dir.$file);
            echo $file."<br />";
        }
    }
}
closedir($dp);
?>

 

Link to comment
https://forums.phpfreaks.com/topic/228578-optimizing-is_dir/
Share on other sites

You would be much better off executing your query once, building a list of valid files from the result, and going from there.

 

Does this really need to be executed via a web page as well? Surely this is something that can be done in a background process executed via cron or something?

Link to comment
https://forums.phpfreaks.com/topic/228578-optimizing-is_dir/#findComment-1178554
Share on other sites

Yeah, I have access to the server, though I'm not much of an expert. I just know the basics. The web seems easier for me :/

 

What do you mean by making a list? I thought I already made one.

The script scans the dir and cross-references each file against the DB; if the file is not used in the DB, it gets deleted. I also added "LIMIT 1" to the query, but that didn't seem to help much.

Link to comment
https://forums.phpfreaks.com/topic/228578-optimizing-is_dir/#findComment-1178557
Share on other sites

As you have discovered, you are executing a new query for each file. The best thing to do would be to execute one query, save the results, then loop through those results and cross reference your files.
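
One way to read that advice, as a rough sketch under assumptions: the rows from a single query selecting `filename` and `thumbname` from `images` are collected into an array keyed by name, so each later check is one `isset()` call instead of a query. `buildKnownNames` and the sample rows are hypothetical; real rows would come from whatever database API `db_connect.php` sets up.

```php
<?php
// Hypothetical helper: turn result rows into a lookup table keyed by file name.
// Each row is assumed to be an associative array with 'filename' and 'thumbname'.
function buildKnownNames(array $rows)
{
    $known = array();
    foreach ($rows as $row) {
        foreach (array('filename', 'thumbname') as $column) {
            if (isset($row[$column]) && $row[$column] !== '') {
                $known[$row[$column]] = true; // the key is the name; the value is unused
            }
        }
    }
    return $known;
}

// Sample rows standing in for the result of the single query (made-up data).
$rows = array(
    array('filename' => 'a.jpg', 'thumbname' => 'a_thumb.jpg'),
    array('filename' => 'b.jpg', 'thumbname' => null),
);
$known = buildKnownNames($rows);

var_dump(isset($known['a_thumb.jpg'])); // bool(true)
var_dump(isset($known['c.jpg']));       // bool(false)
```

Keying the array by name, rather than appending names as values, is what keeps each check cheap no matter how many rows the table holds.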

Link to comment
https://forums.phpfreaks.com/topic/228578-optimizing-is_dir/#findComment-1178560
Share on other sites

OK, so I did what you recommended and it works better than the old one. I tried this on 1k files and it took about 3 minutes to run. I'm pretty sure it could be optimized; I just don't know what I can trim. This time MySQL didn't take a hit, but my PHP process was at 100%.

 

Any room for improvement?

 

<?php
require_once 'db_connect.php';

$default_dir = "storage/2011/";

//declare
$query = mysql_query("SELECT filename, thumbname FROM images");
$values_filename = array();
$values_thumbname = array();
$value_files = array();
$counter = 0;
$file_counter = 0;
$found = false;

//store db info in array
while ($row = mysql_fetch_array($query)) {
    $values_filename[$counter] = $row['filename']; //keys quoted; bare words are read as constants
    $values_thumbname[$counter] = $row['thumbname'];
    $counter++;
}

if(!($dp = opendir($default_dir))) die("Cannot open $default_dir.");

//Store dir files in array
while($file = readdir($dp))
{
    if(is_dir($file))
    {
        continue;
    }
    else if($file != '.' && $file != '..')
    {
        $value_files[$file_counter] = $file;
        $file_counter++;
    }//end elseif
}//end while
closedir($dp);

//process files
for ($index = 0; $file_counter > $index; $index += 1) {
    for ($nest_index = 0; $counter > $nest_index; $nest_index += 1) {
        if (($value_files[$index] == $values_filename[$nest_index]) || ($value_files[$index] == $values_thumbname[$nest_index])) {
            $found = true;
            break; //exit for loop
        }
    }
    if (!$found) echo $value_files[$index]."<br />"; //this is where the delete code goes. Just testing for now.
    $found = false; //reset it for next file.
}
?>
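
A possible improvement, offered as a sketch rather than a tested fix: the nested loops compare every file against every row, so the work grows with files × rows. If the database names are stored as array keys, each file can be checked with a single `isset()`. The sketch also passes the full path to `is_file()`, because `readdir()` returns bare entry names and `is_dir($file)` is otherwise resolved against the current working directory rather than `$default_dir`. `findOrphans` is a hypothetical name, and `$known` is assumed to be an array whose keys are the names stored in the database.

```php
<?php
// Hypothetical helper: list regular files in $dir whose names are not keys of $known.
function findOrphans($dir, array $known)
{
    $orphans = array();
    $dp = opendir($dir);
    if ($dp === false) {
        return $orphans; // directory could not be opened
    }
    // Compare against false so a file literally named "0" does not end the loop.
    while (($file = readdir($dp)) !== false) {
        $path = rtrim($dir, '/') . '/' . $file;
        if (!is_file($path)) {
            continue; // skips '.', '..' and subdirectories
        }
        if (!isset($known[$file])) {
            $orphans[] = $file; // candidate for unlink($path) once verified
        }
    }
    closedir($dp);
    sort($orphans); // readdir() order is not guaranteed
    return $orphans;
}

// Demo in a throwaway temp directory with made-up file names.
$dir = sys_get_temp_dir() . '/orphan_demo_' . uniqid();
mkdir($dir);
mkdir($dir . '/subdir');
$names = array('a.jpg', 'a_thumb.jpg', 'old.jpg');
foreach ($names as $name) {
    touch($dir . '/' . $name);
}
$known = array('a.jpg' => true, 'a_thumb.jpg' => true);

$orphans = findOrphans($dir, $known);
print_r($orphans); // only old.jpg is listed

// Clean up the demo directory.
foreach ($names as $name) {
    unlink($dir . '/' . $name);
}
rmdir($dir . '/subdir');
rmdir($dir);
```

This makes one pass over the directory and one lookup per file. Printing the list first and deleting only after checking it keeps the same test-before-delete approach as the script above.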

Link to comment
https://forums.phpfreaks.com/topic/228578-optimizing-is_dir/#findComment-1178578
Share on other sites
