
Stopping a directory listing and continuing later


sangoku


Hi, I am making a script that should go through the files in a directory and do some work on each of them, but the script has an execution time limit. I want to be able to continue where I left off. How can I tell the directory handle which file to start reading from?


ignace's solution should allow the script to run until completion. But if this is on a hosted server, your host may have other means in place to limit execution time, and/or the host may not take kindly to you running memory-intensive scripts for long periods.

 

If that solution does not work for you, here is another option, which I have used for processing a lot of files (e.g. scanning my MP3 collection to read the metadata of every file). I created a two-part process. The first step was a script that read all the folders and subfolders under the root of the music directory and put them into an array; at the end it ran a single query to create a scan queue. The second step was a script that got the next folder in the queue and read/processed its files. I used AJAX on the "processing" page to keep calling that script until the process was complete. This also allowed me to create a progress bar. If you have a large number of files in a single directory, you could still do something similar and have x files processed on each call to the script.
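A minimal PHP sketch of the "x files per call" variant described above. It assumes the file list (or the queue) is available as a zero-indexed array and that the returned offset is persisted between calls somewhere, such as the session, a DB row, or the AJAX request itself; `processBatch()` and the per-file callback are hypothetical names, not part of any library.

```php
<?php
/**
 * Hypothetical helper: process at most $batchSize entries of $files,
 * starting at $offset. Returns the offset the next call should start
 * from; a return value equal to count($files) means the job is done.
 */
function processBatch(array $files, int $offset, int $batchSize, callable $handler): int
{
    $end = min($offset + $batchSize, count($files));
    for ($i = $offset; $i < $end; $i++) {
        $handler($files[$i]); // whatever per-file work is needed
    }
    return $end;
}

// Usage: the loop stands in for repeated AJAX calls to the processing script.
$queue = ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg', 'e.jpg'];
$done = [];
$offset = 0;
while ($offset < count($queue)) {
    // On a real page, $offset would be read from the request/session/DB here.
    $offset = processBatch($queue, $offset, 2, function ($f) use (&$done) {
        $done[] = $f;
    });
    $percent = (int) round(100 * $offset / count($queue)); // progress bar value
}
```

On a real page each request would make a single `processBatch()` call, save the returned offset, and send `$percent` back for the progress bar.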


I only need a way to store the file pointer... I am comparing the file names against a DB and doing some manipulations based on that. I ONLY need a way to store the current location, nothing more, nothing less.

The only way I can think of is to store the whole directory content in an array, then loop through it and store where I stopped... Any other solutions?

I have already written the rest (the manipulation, looping through the files, etc.); that part is not difficult. I just don't want to have to store such GIANT arrays in sessions, because of the server limits... Is there a way to store the file pointer?


This does assume your script will finish reading all the directories without hitting the memory_limit or the time_limit.

 

Yes. I was able to read several thousand folders and add them to the queue.

 


If you are processing against an array, I suppose you could iterate through the array using something like this:

foreach($filesList as $idx => $value)
{
  $_SESSION['idx'] = $idx;
  //Do something with the value
}

 

But that is terribly inefficient, since you have to rebuild the array each time the page loads. Plus, it is problematic if any folders change between page loads. You could probably avoid that by storing the folder path in the session and using array_search() to find the last record processed.
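A sketch of the array_search() idea, keyed on the last file name rather than its numeric index, so files added or removed between page loads do not shift the resume point. It assumes PHP 7.4+ (arrow function) and that the caller keeps the last processed name somewhere such as the session; `remainingFiles()` is a hypothetical helper.

```php
<?php
/**
 * Hypothetical helper: return the entries that still need processing,
 * given the name of the last file handled in a previous request
 * ($lastDone, e.g. kept in the session), or null on the first run.
 */
function remainingFiles(array $filesList, ?string $lastDone): array
{
    // Make the order deterministic, whatever produced the list
    sort($filesList, SORT_STRING);
    if ($lastDone === null) {
        return $filesList;
    }
    $pos = array_search($lastDone, $filesList, true);
    if ($pos === false) {
        // The last file vanished (e.g. it was deleted in the meantime):
        // resume with every name that sorts after it.
        return array_values(array_filter($filesList, fn($f) => strcmp($f, $lastDone) > 0));
    }
    return array_slice($filesList, $pos + 1);
}
```

On a page this might be called as `remainingFiles(array_diff(scandir($dir), ['.', '..']), $_SESSION['last_file'] ?? null)`, where `last_file` is a hypothetical session key updated after each file is handled. The listing is still rebuilt on every load; only the resume logic is shown.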

 

I have a feeling you may be making this more complicated than it needs to be. Without knowing exactly what "manipulations" you are doing I can't say for sure, but it's possible you could run a single query using the array of file names.


Nope, I'm not. XD

 

Here is the thing: I am making a cleanup script for a friend who runs a forum with a GIANT mass of attachments. His forum engine has bugs, and sometimes when he deletes something it crashes, etc., so the files got out of sync and he now has a mass of files that are not in his DB. He wants me to move them into a folder or delete them. So I take his attachment folder, loop through it, and check whether each file is listed in the DB; if it is not, I move/delete it. The thing is, I don't want to load the server, so I added a sleep call and a low-priority SELECT query, which results in a giant script execution time... Now I need a way to store the point where I stopped the comparison when the execution time runs out...

 

I could load the file names into a .ini file and parse it on load... that would be faster, I think, but I am not sure how I should store the whole folder in it... this one has over 4 TB of attachments in it... I am wondering: is it possible to open a file and then move on to the next one?

 

Or I could use stream_get_contents() to store the remaining files in the file...
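A directory handle from opendir() is a resource, so it cannot be saved in a session and carried over to the next run. What can be saved is a byte offset into a plain text list of names. One possible approach (a sketch, not the poster's code): write the listing once with readdir(), so the whole folder never has to sit in one array, then on each run fseek() to the saved offset, read a few names, and keep ftell() as the new offset. The list-file path and the callback are hypothetical; files added after the list is built will not be seen, and the handler should check that each file still exists.

```php
<?php
/**
 * Step 1 (run once): write every file name in $dir to $listFile, one per line.
 * readdir() streams entries, so the listing is never held in memory at once.
 */
function buildListFile(string $dir, string $listFile): int
{
    $dh = opendir($dir);
    if ($dh === false) {
        throw new RuntimeException("Cannot open directory $dir");
    }
    $out = fopen($listFile, 'w');
    $count = 0;
    while (($entry = readdir($dh)) !== false) {
        if (is_file($dir . '/' . $entry)) { // skips ".", ".." and subfolders
            fwrite($out, $entry . "\n");
            $count++;
        }
    }
    closedir($dh);
    fclose($out);
    return $count;
}

/**
 * Step 2 (each run): read up to $max names starting at byte $offset, pass
 * them to $handler, and return the new byte offset. That integer is the
 * storable "file pointer" for the next run.
 */
function processFromOffset(string $listFile, int $offset, int $max, callable $handler): int
{
    $in = fopen($listFile, 'r');
    fseek($in, $offset);
    for ($i = 0; $i < $max && ($line = fgets($in)) !== false; $i++) {
        $handler(rtrim($line, "\n"));
    }
    $newOffset = ftell($in);
    fclose($in);
    return $newOffset;
}
```

When the returned offset equals filesize() of the list file, every name has been handled.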


OK, if I understand you correctly, you are trying to identify all the files that are no longer associated with existing forum posts. I am guessing that you are running a DB query for each file individually to see whether it is still listed in the DB. If so, running many individual queries is very inefficient. Are you also generating an array of all the physical files each time? There is definitely a more efficient method. Here is what comes to mind:

 

1. Get an array of ALL the physical files on the server using glob()

2. Run a single query to get ALL the file names that are in the database

3. Use the db results to generate an array of valid files

4. Use array_diff() to generate an array of ALL the files that do not exist in the database

5. Run whatever process you need against the values in the list of invalid files. If this list is too long to complete in a single run, add the array of invalid files to a temporary table where you can process x records at a time.

 

There is no need to manually check each file individually. Example code:

<?php
  
//Get array of ALL files on the server.
//glob() returns paths ("path/to/files/x.jpg"), so keep only real files
//and strip the directory part so the names match the DB values.
$filesOnServer = array_map('basename', array_filter(glob("path/to/files/*") ?: array(), 'is_file'));
  
//Create array of ALL files in the DB
$filesInDB = array();
$query = "SELECT filename FROM post_files";
$result = mysql_query($query) or die(mysql_error());
while($record = mysql_fetch_assoc($result))
{
    $filesInDB[] = $record['filename'];
}
  
//Create array of ALL files that exist on server
//but do not exist in DB
$filesNotInDB = array_diff($filesOnServer, $filesInDB);
  
?>
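If the full set of names is too large to load in one go, a middle ground between one query per file and one query for everything is one query per batch of names, using IN (...). The sketch below only builds the SQL and computes the orphans; `$quote` is assumed to be a function that escapes and quotes one value for whatever DB driver is in use (for example a wrapper around PDO::quote()), and both function names are hypothetical.

```php
<?php
/**
 * Hypothetical helper: build one SELECT that checks a whole batch of file
 * names at once. An empty batch would produce invalid SQL ("IN ()"), so
 * callers should skip it.
 */
function buildBatchQuery(string $table, array $names, callable $quote): string
{
    $quoted = array_map($quote, $names);
    // Backticks inside the table name are doubled so it stays one identifier
    return "SELECT `filename` FROM `" . str_replace('`', '``', $table) . "`"
        . " WHERE `filename` IN (" . implode(', ', $quoted) . ")";
}

/**
 * Given the batch and the names the query returned, list the orphans:
 * files present in the batch but not in the database.
 */
function orphansInBatch(array $names, array $foundInDb): array
{
    return array_values(array_diff($names, $foundInDb));
}
```

The batch size then plays the same role as the "partitioning" value in the script below: larger batches mean fewer queries, smaller ones mean less work lost if a run is cut off.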


Nice idea, but I have something more advanced in mind. Here is my current version of the file; the only problem now is storing the data read so far... and I think I should store the scandir() data somewhere too... I am thinking of the ini file as temporary storage... What do you think? Yes, I know the code is pretty messy...

 

<?php
    /**
    * @copyright  at ©sinisaculic@gmail.com, all rights reserved 2009-2010
    * @author Siniša Čulić
    * @version 1.0
    * @created 31-mar-2010 13:51:17
    */

    ////////// change this password to your personal one ////
    $password = 'password1';
    ////////// change this password to your personal one ////


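    $comparingValues = array(); // file names waiting to be checked against the DB
    $counter = 0;               // number of names currently buffered

    // Refuse to run while the default password is still in place
    if ($password == 'password1'){
        echo '<div class="alert">Change the password!</div>';
        return false;
    }
    session_name('cleanup');
    session_start(); // session_name() only sets the name; the session still has to be started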

    if(isset($_SESSION['cleanup_in_progress'])){
        // TODO: resume a cleanup that was interrupted by the time limit

    }else{
        // Start only when the whole form was submitted with the correct page password
        if(isset( $_POST['DBname'],$_POST['host'],$_POST['DBusername'],$_POST['DBpassword'],$_POST['directory'],$_POST['pagePassword'],$_POST['exectutionTime'],$_POST['delayTime'],$_POST['attachment']) && $_POST['pagePassword'] === $password){
            $_SESSION['cleanup_in_progress'] = true;
            $_SESSION['start_time'] = time();
            $_SESSION['delay_time'] = $_POST['delayTime'];
            if($_POST['mesurment'] =='minutes'){
                $time = (int)$_POST['exectutionTime'] * 60;
            }else{
                $time = (int)$_POST['exectutionTime'];
            }
            $partitoning = max(1, (int)$_POST['partitioning']); // files compared per DB query
            $_SESSION['partitioning'] = $partitoning;
            set_time_limit($time);
            $dir = $_POST['directory'];

            if(is_dir($dir)){
                $dir_content = scandir($dir);

                $link = mysql_connect($_POST['host'],$_POST['DBusername'],$_POST['DBpassword']) or die('<div class="alert">Cannot connect to MySQL!</div>');
                mysql_select_db($_POST['DBname'],$link) or die ('<div class="alert">Incorrect DB name</div>');

                // Name of the attachment table (default: attachment)
                if ($_POST['attachment'] !==''){
                    $name =  $_POST['attachment'] ;
                }else{
                    $name  = 'attachment';
                }

                foreach($dir_content as $file){
                    // scandir() returns bare names, so test the full path
                    if(is_file($dir.'/'.$file)){
                        if(bufferQuerry($file)){
                            // TODO: store the position reached ($file) so the next run can resume here
                        }else{
                            $_SESSION['error'] = '<div class="alert">There was a folder reading error</div>';
                            echo $_SESSION['error'];
                            die;
                        }
                    }
                }
                // NOTE: names still buffered in $comparingValues after the loop are not checked yet
            }
        }
    }

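    function bufferQuerry($file){
        global $comparingValues;
        global $counter;
        global $partitoning;
        global $link;
        global $name;

        // Collect names until a full batch of $partitoning files is buffered
        $comparingValues[] = $file;
        $counter++;
        if ($counter < $partitoning){
            return true;
        }

        // One query for the whole batch; every name is escaped and quoted
        $quoted = array();
        foreach($comparingValues as $value){
            $quoted[] = "'".mysql_real_escape_string($value, $link)."'";
        }
        $sql = "SELECT `filename` FROM `".$name."` WHERE `filename` IN (".implode(',', $quoted).")";
        $result = mysql_query($sql, $link);
        if ($result){
            return checkResultsOfTheQuery($result);
        }
        return false;
    }
    function checkResultsOfTheQuery($result){
        global $comparingValues;
        global $counter;

        if(!$result) return false;

        // Names from the batch that the DB knows about
        $inDB = array();
        while($row = mysql_fetch_row($result)){
            $inDB[] = $row[0];
        }

        // Files in the batch but NOT in the DB are the orphans
        $difference = array_diff($comparingValues, $inDB);

        // Reset the buffer before handling this batch
        $comparingValues = array();
        $counter = 0;

        if($difference){
            return MissingFIle($difference);
        }
        return true;
    }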

    function MissingFIle($files){

        $option = $_POST['delete'];
        $moveFolder = $_POST['movingFolder'];
        $originalFolder = $_POST['directory'];

        if($option == 'move'){
            // The target folder only matters when files are being moved
            if(!is_dir($moveFolder)){
                $_SESSION['error'] = '<div class="alert">The specified rogue-files folder is not a folder</div>';
                echo $_SESSION['error'];
                die;
            }
            if(!is_writable($moveFolder)){
                $_SESSION['error'] = '<div class="alert">Cannot write to the rogue-files folder</div>';
                echo $_SESSION['error'];
                die;
            }
            foreach ($files as $filename){
                rename($originalFolder.'/'.$filename, $moveFolder.'/'.$filename);
            }
        }else{
            foreach ($files as $filename){
                unlink($originalFolder.'/'.$filename);
            }
        }

        return true; // tells the caller this batch was handled
    }

?>

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
    <head>
        <title>Parameter settings</title>
        <link rel="stylesheet" type="text/css" href="my.css">
    </head>
    <body>       
        <div class="data" <?php 
                if(isset($_SESSION['cleanup_in_progress'])){
                    echo  "style='display:none'";
                }
            ?>
            >
            <form action="" method="post"  >

                <div class="database data">

                    Host name <input type="text" name="host" value="enter host name" size="45" ><br/>

                    Database name <input type="text" name="DBname" value="Enter the DB name" size="45" ><br/>
                    Database username <input type="text" name="DBusername" value="Enter your DB username here" size="45" ><br/>
                    Database password <input type="password" name="DBpassword"  size="45" ><br/>
                    If you changed the name of the attachment table, put its name here:<br/>
                    <input type="text" size="10" name="attachment" value=""><br/>
                    Otherwise, LEAVE the field empty!<br/><br/>
                    Upload directory<br/> <input type="text" name="directory" value="Enter the directory you want to be cleaned" size="45" ><br/>
                </div>
                Page password <br/><input type="password" class="pagePassword" name="pagePassword"  size="45" ><br/>
                <div class="options">
                    Script execution time <input type="text" name="exectutionTime" value="10" size="10" ><br/>
                    <input type="radio"  name="mesurment" value="seconds" checked="checked">in seconds<br/> 
                    <input type="radio" name="mesurment" value="minutes">in minutes<br/> 
                    Script delay time <input type="text" name="delayTime" value="in milliseconds" size="45" ><br/>
                    Select whether you want to delete the invalid attachments or just move them to a folder<br/>
                    <input type="radio" name="delete" value="delete" checked="checked">delete 
                    <input type="radio" name="delete" value="move"> move <br/>
                    Folder for rogue files: <input type="text" name="movingFolder"><br/>
                    SQL partitioning size: <input type="text" name="partitioning" value="10"><br/>
                    This value sets how many files at once will be compared against the database
                    <br/>
                    <br/>
                </div>
                <input type="submit" name="submit" value="Execute the script!" class="bt_register" />
            </form>
        </div>

        <div class="status" <?php 
                if(!isset($_SESSION['cleanup_in_progress'])){
                    echo  "style='display:none'";
                }
            ?>
            >
            Remaining time of the cleanup: <br/>
            Duplicate files found so far:<br/>


        </div>
    </body>
</html>
Do you have any ideas?
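For the "need to put a storage way in here" spot, one option is to persist only the last file name handled plus a few counters, not the whole scandir() result. The ini-file idea would work for a couple of plain values, but this sketch swaps in JSON (json_encode()/json_decode()) because file names can contain characters such as quotes that an ini file would need escaped. The state-file path, the key names, and both helper functions are hypothetical.

```php
<?php
/**
 * Hypothetical helper: save the resume state. Writing to a temporary file
 * and renaming it over the real one makes it unlikely that a run cut off
 * by the time limit leaves a truncated state file behind.
 */
function saveState(string $stateFile, array $state): void
{
    $tmp = $stateFile . '.tmp';
    file_put_contents($tmp, json_encode($state));
    rename($tmp, $stateFile);
}

/**
 * Hypothetical helper: load the resume state, falling back to defaults on
 * the first run or if the file cannot be decoded.
 */
function loadState(string $stateFile): array
{
    $defaults = ['last_file' => null, 'checked' => 0, 'orphans' => 0];
    if (!is_file($stateFile)) {
        return $defaults;
    }
    $data = json_decode(file_get_contents($stateFile), true);
    return is_array($data) ? array_merge($defaults, $data) : $defaults;
}
```

Combined with a sorted listing, `last_file` is enough to skip ahead on the next run by ignoring every name that sorts before or equal to it.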
