nobodyk Posted February 23, 2011

I have approximately 50k-100k files in a directory. I'm running a script that checks whether each file is used by the DB and deletes it if it isn't. The problem is that a quick test on a directory of just 1k files dies. Is there a way to optimize it? I know the script works; it just takes too long to run, even with only 1k files. And I'm pretty sure it's the is_dir that's taking its sweet time. Any ideas?

<?php
require_once 'db_connect.php';

$default_dir = "storage/2011/";

if(!($dp = opendir($default_dir))) die("Cannot open $default_dir.");

while($file = readdir($dp)) {
    if(is_dir($file)) {
        continue;
    }
    else if($file != '.' && $file != '..') {
        $query = "SELECT * FROM images WHERE filename = '".$file."' OR thumbname = '".$file."'";
        $dbResult = mysql_query($query);
        $num_rows = mysql_num_rows($dbResult);
        if ($num_rows == 0){
            unlink($default_dir.$file);
            echo $file."<br />";
        }
    }
}
closedir($dp);
?>

Link to thread: https://forums.phpfreaks.com/topic/228578-optimizing-is_dir/
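A side note on the is_dir call in the script above: PHP resolves a relative path passed to is_dir against the current working directory, and readdir returns bare entry names, so is_dir($file) is not looking inside $default_dir. The sketch below illustrates the difference on a throwaway directory. The directory and file names are made up for the demonstration, and it assumes the working directory has no entry with the same generated name.

```php
<?php
// Illustration only: builds a throwaway directory under the system temp dir.
// All names here are hypothetical and not taken from the original script.
$base   = rtrim(sys_get_temp_dir(), '/\\') . '/is_dir_demo_' . uniqid() . '/';
$nested = 'nested_' . uniqid();     // unique, so it will not exist in the cwd

mkdir($base);
mkdir($base . $nested);             // a real subdirectory
touch($base . 'photo.jpg');         // a regular file

$bareNameHits = [];   // entries for which is_dir($name) is true
$fullPathHits = [];   // entries for which is_dir($base . $name) is true

$dp = opendir($base);
// Compare against false so an entry literally named "0" does not end the loop.
while (false !== ($name = readdir($dp))) {
    if ($name === '.' || $name === '..') {
        continue;
    }
    if (is_dir($name)) {            // checked relative to the working directory
        $bareNameHits[] = $name;
    }
    if (is_dir($base . $name)) {    // checked inside the scanned directory
        $fullPathHits[] = $name;
    }
}
closedir($dp);

// Remove the throwaway entries again.
unlink($base . 'photo.jpg');
rmdir($base . $nested);
rmdir($base);

echo 'bare name: ', count($bareNameHits), ", full path: ", count($fullPathHits), "\n";
```

Under those assumptions only the full-path check reports the subdirectory. In the original script that would mean prefixing the name with $default_dir before calling is_dir.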
nobodyk (Author) Posted February 23, 2011

So I just checked the top command in Linux and MySQL went off the charts. The script does a query for every file, so it will do about 100k queries. Is there any way to improve the query as well? Will COUNT instead of SELECT help?
trq Posted February 23, 2011

You would be much better off executing your query once, building a list of valid files from the result, and going from there. Does this really need to be executed via a web page as well? Surely this is something that can be done in a background process executed via cron or something?
nobodyk (Author) Posted February 23, 2011

Yeah, I have access to the server, though I'm not much of an expert. I just know the basics. The web seems easier for me :/

What do you mean by making a list? I thought I already made one. The script scans the dir and cross-references it against the DB; if the file is not used in the DB then it gets deleted. I also added "LIMIT 1" to the query, but that didn't seem to help much.
trq Posted February 23, 2011

Quoting nobodyk: "What do you mean by making a list? I thought I already made one. The script scans the dir and cross-references it against the DB; if the file is not used in the DB then it gets deleted. I also added "LIMIT 1" to the query, but that didn't seem to help much."

As you have discovered, you are executing a new query for each file. The best thing to do would be to execute one query, save the results, then loop through those results and cross-reference your files.
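The "one query, then cross-reference" idea can be sketched roughly as follows. This is only an illustration, not trq's code: the function name findUnreferenced is hypothetical, and $rows stands in for the rows a single "SELECT filename, thumbname FROM images" would return, so the example runs without a database.

```php
<?php
/**
 * Returns the names in $filesOnDisk that no database row references.
 * Hypothetical helper: $rows is assumed to be a list of associative arrays
 * with 'filename' and 'thumbname' keys, fetched by one query.
 */
function findUnreferenced(array $filesOnDisk, array $rows): array
{
    // One flat list of every name the database knows about.
    $known = array_merge(
        array_column($rows, 'filename'),
        array_column($rows, 'thumbname')
    );

    // array_diff keeps the values of the first array that are missing
    // from the second; array_values reindexes the result from 0.
    return array_values(array_diff($filesOnDisk, $known));
}

// Stand-in data (made up for the example).
$rows = [
    ['filename' => 'a.jpg', 'thumbname' => 'a_t.jpg'],
    ['filename' => 'b.jpg', 'thumbname' => 'b_t.jpg'],
];
$files = ['a.jpg', 'a_t.jpg', 'b.jpg', 'orphan.jpg'];

$orphans = findUnreferenced($files, $rows);
print_r($orphans);
```

The list that comes back is what the delete step would iterate over, so the database is hit once no matter how many files are in the directory.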
nobodyk (Author) Posted February 23, 2011

I think I have an idea of how to do it. Would I have to use mysql_fetch_array? The thing is, I don't know how to cross-reference them.
nobodyk (Author) Posted February 23, 2011

OK, so I did what you recommended and it works better than the old one. I tried this on 1k files and it took about 3 minutes to run. I'm pretty sure it could be optimized, I just don't know what I can trim. This time MySQL didn't take a hit, but my PHP process was at 100%. Any room for improvement?

<?php
require_once 'db_connect.php';

$default_dir = "storage/2011/";

//declare
$query = mysql_query("SELECT filename, thumbname FROM images");
$values_filename = array();
$values_thumbname = array();
$value_files = array();
$counter = 0;
$file_counter = 0;
$found = false;

//store db info in array
while ($row = mysql_fetch_array($query)) {
    $values_filename[$counter] = $row['filename'];
    $values_thumbname[$counter] = $row['thumbname'];
    $counter++;
}

if(!($dp = opendir($default_dir))) die("Cannot open $default_dir.");

//Store dir files in array
while($file = readdir($dp)) {
    if(is_dir($file)) {
        continue;
    }
    else if($file != '.' && $file != '..') {
        $value_files[$file_counter] = $file;
        $file_counter++;
    }//end elseif
}//end while
closedir($dp);

//process files
for ($index = 0; $file_counter > $index; $index += 1) {
    for ($nest_index = 0; $counter > $nest_index; $nest_index += 1) {
        if (($value_files[$index] == $values_filename[$nest_index]) || ($value_files[$index] == $values_thumbname[$nest_index])) {
            $found = true;
            $nest_index = $counter+1; //exit for loop
        }
    }
    if (!$found)
        echo $value_files[$index]."<br />"; //this is where the delete code goes. Just testing for now.
    $found = false; //reset it for next file.
}
?>
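One likely reason the PHP process is pegged: for every file, the inner loop can walk every database row, so the work grows with files × rows. A common alternative is to store the known names as array keys, so each file costs a single isset lookup. The sketch below shows that idea on stand-in data. The function names buildNameSet and listOrphans are hypothetical, and it only lists candidates, leaving the delete step out just as the test script above does.

```php
<?php
/**
 * Turns DB rows into a lookup table keyed by file name.
 * Hypothetical helper: rows are assumed to carry 'filename' and 'thumbname'.
 */
function buildNameSet(array $rows): array
{
    $set = [];
    foreach ($rows as $row) {
        // The value is irrelevant; only the key is ever looked up.
        $set[(string) $row['filename']]  = true;
        $set[(string) $row['thumbname']] = true;
    }
    return $set;
}

/**
 * Returns the files that are not present as keys in $nameSet.
 */
function listOrphans(array $files, array $nameSet): array
{
    $orphans = [];
    foreach ($files as $file) {
        // One keyed lookup replaces the whole inner for loop.
        if (!isset($nameSet[$file])) {
            $orphans[] = $file;
        }
    }
    return $orphans;
}

// Stand-in data (made up for the example).
$rows = [
    ['filename' => 'a.jpg', 'thumbname' => 'a_t.jpg'],
    ['filename' => 'b.jpg', 'thumbname' => 'b_t.jpg'],
];
$files = ['a.jpg', 'stray1.jpg', 'b_t.jpg', 'stray2.jpg'];

$orphans = listOrphans($files, buildNameSet($rows));
foreach ($orphans as $name) {
    echo $name, "\n";   // where the delete code would go
}
```

With this shape there is also no $found flag to reset and no need to force the loop counter past its limit to break out early.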