seany123 Posted June 14, 2011 Share Posted June 14, 2011 basically i have a database table full of URLs and i want verify them all, if false then delete the code i want to incorporate is below... $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, ""); curl_setopt($ch, CURLOPT_HEADER, true); curl_setopt($ch, CURLOPT_NOBODY, true); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); curl_setopt($ch, CURLOPT_MAXREDIRS, 10); //follow up to 10 redirections - avoids loops $data = curl_exec($ch); curl_close($ch); if (!$data) { echo "Domain could not be found"; } else { preg_match_all("/HTTP\/1\.[1|0]\s(\d{3})/",$data,$matches); $code = end($matches[1]); if ($code == 200) { echo "Page Found"; } elseif ($code == 404) { echo "Page Not Found"; } } [code] i know how to use the code above and make it work fine for 1 result but im struggling with make it work with the entire table... ive tried this [code] $query = mysql_query("SELECT * FROM `table`"); while ($info = mysql_fetch_array($query)) { } but that doesnt work, as its an infinite loop or something, the page just keeps loading... adding a break; doesnt help as it only works on the first row. i know this is probably very easy but im still confused. all help would be great. Quote Link to comment Share on other sites More sharing options...
Psycho Posted June 14, 2011 Share Posted June 14, 2011 I would do the following. 1. Query for all the records. 2. Perform a while loop over each record and perform the validation steps that you have. As you find a record that needs to be deleted, add the record ID to an array. I'd create a function of the validation routine and have it return a true/false 3. Once the validation loop is completed, run one DELETE query on the list of records found to be invalid. However, if the validation routine takes a long time to run for each record you may have a problem with the script timing out. If that is the case you might need to process the records in batches. Also, I see a problem with your validation code. It ends with an If/Elseif, but no else. What if the returned value is no 200 or 404? Quote Link to comment Share on other sites More sharing options...
Psycho Posted June 14, 2011 Share Posted June 14, 2011 Sample code ( I guessed on the field names) //Function to validate a URL: returns true/false function validURL($url) { $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch, CURLOPT_HEADER, true); curl_setopt($ch, CURLOPT_NOBODY, true); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); curl_setopt($ch, CURLOPT_MAXREDIRS, 10); //follow up to 10 redirections - avoids loops $data = curl_exec($ch); curl_close($ch); if (!$data) { return false; } else { preg_match_all("/HTTP\/1\.[1|0]\s(\d{3})/", $data, $matches); $code = end($matches[1]); if ($code != 200) { return false; } else { return true; } } } //Run query to get all the records $query = "SELECT * FROM `table`"; $result = mysql_query($query) or die(mysql_error()); $deleteList = array(); while ($info = mysql_fetch_array($query)) { //Check each record to see if it is valid if(!validURL($info['url'])) { $deleteList[] = $info['id']; } } //Delete the invalid records if(count($deleteList)>0) { $query = "DELETE FROM `table` WHERE `id` IN (" . implode(',', $deleteList) . ")"; $result = mysql_query($query) or die(mysql_error()); } Quote Link to comment Share on other sites More sharing options...
seany123 Posted June 14, 2011 Author Share Posted June 14, 2011 Sample code ( I guessed on the field names) //Function to validate a URL: returns true/false function validURL($url) { $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch, CURLOPT_HEADER, true); curl_setopt($ch, CURLOPT_NOBODY, true); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); curl_setopt($ch, CURLOPT_MAXREDIRS, 10); //follow up to 10 redirections - avoids loops $data = curl_exec($ch); curl_close($ch); if (!$data) { return false; } else { preg_match_all("/HTTP\/1\.[1|0]\s(\d{3})/", $data, $matches); $code = end($matches[1]); if ($code != 200) { return false; } else { return true; } } } //Run query to get all the records $query = "SELECT * FROM `table`"; $result = mysql_query($query) or die(mysql_error()); $deleteList = array(); while ($info = mysql_fetch_array($query)) { //Check each record to see if it is valid if(!validURL($info['url'])) { $deleteList[] = $info['id']; } } //Delete the invalid records if(count($deleteList)>0) { $query = "DELETE FROM `table` WHERE `id` IN (" . implode(',', $deleteList) . ")"; $result = mysql_query($query) or die(mysql_error()); } thankyou, ill give that a try... the reason i left it as elseif rather than else is because if the page is not 200 or 404 then it would be best to leave it in the database to be dealt with manually. also i have realised that what ive been trying has worked but keeps being cut off with the Maximum execution time of 60 seconds exceeded. Quote Link to comment Share on other sites More sharing options...
Psycho Posted June 14, 2011 Share Posted June 14, 2011 the reason i left it as elseif rather than else is because if the page is not 200 or 404 then it would be best to leave it in the database to be dealt with manually. Then use this logic if ($code == 404) { return false; } else { return true; } also i have realised that what ive been trying has worked but keeps being cut off with the Maximum execution time of 60 seconds exceeded. Then you can run these in batches. Add a column to the table named something like "processed" as a tiny int and set the default value to 0. Then in the query to grab the records use a LIMIT on the query to only get a certain number of records (say 10) AND include a WHERE clause on the query to only get those records were "processed" = 0. Now when you process the records you will need to put the IDs into one of two arrays: a delete array and a save array. After you are done processing that batch run the delete query same as before, but then also run an update query on the records in the saved array to change the "processed" field to a 1. Then just keep running the page until you have processed all the records. You can put a header() at the end of the script or use a meta tag to auto refresh the page after it loads. <?php //Functin to validate a URL: returns true/false function validURL($url) { $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch, CURLOPT_HEADER, true); curl_setopt($ch, CURLOPT_NOBODY, true); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); curl_setopt($ch, CURLOPT_MAXREDIRS, 10); //follow up to 10 redirections - avoids loops $data = curl_exec($ch); curl_close($ch); if (!$data) { return false; } else { preg_match_all("/HTTP\/1\.[1|0]\s(\d{3})/", $data, $matches); $code = end($matches[1]); if ($code == 404) { return false; } else { return true; } } } //Run query to get all the records $query = "SELECT * FROM `table` WHERE `processed` = 0 LIMIT 10"; $result = mysql_query($query) or die(mysql_error()); if(mysql_num_rows($result)==0) { echo "All records have been processed."; exit(); } $deleteList = array(); $updateList = array(); while ($info = mysql_fetch_array($query)) { //Check each record to see if it is valid if(!validURL($info['url'])) { $deleteList[] = $info['id']; } else { $updateList[] = $info['id']; } } //Delete the invalid records if(count($deleteList)>0) { $query = "DELETE FROM `table` WHERE `id` IN (" . implode(',', $deleteList) . ")"; $result = mysql_query($query) or die(mysql_error()); } //Update the records to save by setting the processed value if(count($deleteList)>0) { $query = "UPDATE `table` SET `processed` = 1 WHERE `id` (" . implode(',', $deleteList) . ")"; $result = mysql_query($query) or die(mysql_error()); } ?> Quote Link to comment Share on other sites More sharing options...
seany123 Posted June 14, 2011 Author Share Posted June 14, 2011 thats a great idea thankyou, couple of questions... when you say put a header()... what do you mean? i dont want to put meta tags because what if it refreshes to early eg. before the query has run completely? Quote Link to comment Share on other sites More sharing options...
Psycho Posted June 14, 2011 Share Posted June 14, 2011 thats a great idea thankyou, couple of questions... when you say put a header()... what do you mean? i dont want to put meta tags because what if it refreshes to early eg. before the query has run completely? By header, I mean the PHP header function: http://php.net/manual/en/function.header.php. You can use it to, among many other things, redirect the script to another page. In this case you would recursively call the same script over and over. The downside of this is that there would be no visual display in the browser. And, on second thought, I'm not sure that would prevent the script from timing out since the PHP code is being executed recursively. Besides, I think a META tag in the HTML is a better option anyway. You don't have to worry about the page refreshing too early. You wouldn't output the HTML until the end of the script (when that batch is done executing), so it couldn't refresh until the HTML is output - which would be after the current batch is processed. You could also add some other text to the page on each iteration to display how many records are left to be processed. That way you have a visual indicator as to the progress. That is kind of a quick and sloppy solution. If this was something I'd be doing all th etime, I'd probably do the same thing - but with AJAX. But for a one-time or rarely used operation a META refresh should be fine. Quote Link to comment Share on other sites More sharing options...
seany123 Posted June 14, 2011 Author Share Posted June 14, 2011 i have this working now pretty well, the 1 last thing i was hoping to incorporate is the fact that i have like 20 different tables that need processing... whereas this script only does 1 table.. is is possible to make it do more than 1 table? Quote Link to comment Share on other sites More sharing options...
seany123 Posted June 15, 2011 Author Share Posted June 15, 2011 i tried doing this: $query = mysql_query("SELECT * FROM `table1`, `table2` WHERE `processed`=0 LIMIT 20"); while($info = mysql_fetch_assoc($query)) { but it threw up this error: Warning: mysql_fetch_assoc() expects parameter 1 to be resource Quote Link to comment Share on other sites More sharing options...
Psycho Posted June 15, 2011 Share Posted June 15, 2011 You can merge records from multiple tables assuming the structure is the same but . . . you would have no way to to the Delete or Update queries since you wouldn't know which table to run the queries against for each record. I'd implement some logic that sees if there are any unprocessed records in table one. If so, process some records against that table. If not, try the same for the next record. So, modify the current queries you have to use a variable for the table name and use something like this at the top of the script. $tableList = array('table1', 'table2', 'table3', 'table4'); foreach($tableList as $tableName) { $query = "SELECT * FROM `{$tableName}` WHERE `processed`=0 LIMIT 1"; $result = mysql_query($query) or die(mysql_error()); if(mysql_num_rows($reesult)>0) { //There are records in this table, exit the loop and use $tableName break; } } //Use $tableName in the processing code you already have It means you will be running multiple queries as you get to the later tables, but it's the easiest solution I can think of Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.