Jump to content

using while loop or for each or something else....?


seany123

Recommended Posts

basically i have a database table full of URLs and i want verify them all, if false then delete

 

the code i want to incorporate is below...

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "");
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10); //follow up to 10 redirections - avoids loops
$data = curl_exec($ch);
curl_close($ch);
if (!$data) {
  echo "Domain could not be found";
}
else {
  preg_match_all("/HTTP\/1\.[1|0]\s(\d{3})/",$data,$matches);
  $code = end($matches[1]);
  if ($code == 200) {
    echo "Page Found";
  }
  elseif ($code == 404) {
    echo "Page Not Found";
  }
}
[code]

i know how to use the code above and make it work fine for 1 result but im struggling with make it work with the entire table...

ive tried this

[code]
    $query = mysql_query("SELECT * FROM `table`");
    while ($info = mysql_fetch_array($query)) {
    }

 

but that doesnt work, as its an infinite loop or something, the page just keeps loading... adding a break; doesnt help as it only works on the first row.

 

i know this is probably very easy but im still confused.

 

all help would be great.

Link to comment
Share on other sites

I would do the following.

 

1. Query for all the records.

2. Perform a while loop over each record and perform the validation steps that you have. As you find a record that needs to be deleted, add the record ID to an array. I'd create a function of the validation routine and have it return a true/false

3. Once the validation loop is completed, run one DELETE query on the list of records found to be invalid.

 

However, if the validation routine takes a long time to run for each record you may have a problem with the script timing out. If that is the case you might need to process the records in batches.

 

Also, I see a problem with your validation code. It ends with an If/Elseif, but no else. What if the returned value is no 200 or 404?

 

Link to comment
Share on other sites

Sample code ( I guessed on the field names)

//Function to validate a URL: returns true/false
function validURL($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10); //follow up to 10 redirections - avoids loops
    $data = curl_exec($ch);
    curl_close($ch);
    if (!$data) { return false; }
    else
    {
      preg_match_all("/HTTP\/1\.[1|0]\s(\d{3})/", $data, $matches);
      $code = end($matches[1]);
      if ($code != 200) { return false; }
      else { return true; }
    }
}

//Run query to get all the records
$query = "SELECT * FROM `table`";
$result = mysql_query($query) or die(mysql_error());
$deleteList = array();
while ($info = mysql_fetch_array($query))
{
    //Check each record to see if it is valid
    if(!validURL($info['url']))
    {
        $deleteList[] = $info['id'];
    }
}

//Delete the invalid records
if(count($deleteList)>0)
{
    $query = "DELETE FROM `table` WHERE `id` IN (" . implode(',', $deleteList) . ")";
    $result = mysql_query($query) or die(mysql_error());
}

Link to comment
Share on other sites

Sample code ( I guessed on the field names)

//Function to validate a URL: returns true/false
function validURL($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10); //follow up to 10 redirections - avoids loops
    $data = curl_exec($ch);
    curl_close($ch);
    if (!$data) { return false; }
    else
    {
      preg_match_all("/HTTP\/1\.[1|0]\s(\d{3})/", $data, $matches);
      $code = end($matches[1]);
      if ($code != 200) { return false; }
      else { return true; }
    }
}

//Run query to get all the records
$query = "SELECT * FROM `table`";
$result = mysql_query($query) or die(mysql_error());
$deleteList = array();
while ($info = mysql_fetch_array($query))
{
    //Check each record to see if it is valid
    if(!validURL($info['url']))
    {
        $deleteList[] = $info['id'];
    }
}

//Delete the invalid records
if(count($deleteList)>0)
{
    $query = "DELETE FROM `table` WHERE `id` IN (" . implode(',', $deleteList) . ")";
    $result = mysql_query($query) or die(mysql_error());
}

 

thankyou, ill give that a try... the reason i left it as elseif rather than else is because if the page is not 200 or 404 then it would be best to leave it in the database to be dealt with manually. also i have realised that what ive been trying has worked but keeps being cut off with the Maximum execution time of 60 seconds exceeded.

Link to comment
Share on other sites

the reason i left it as elseif rather than else is because if the page is not 200 or 404 then it would be best to leave it in the database to be dealt with manually.

 

Then use this logic

      if ($code == 404) { return false; }
      else { return true; }

 

also i have realised that what ive been trying has worked but keeps being cut off with the Maximum execution time of 60 seconds exceeded.

 

Then you can run these in batches. Add a column to the table named something like "processed" as a tiny int and set the default value to 0. Then in the query to grab the records use a LIMIT on the query to only get a certain number of records (say 10) AND include a WHERE clause on the query to only get those records were "processed" = 0.

 

Now when you process the records you will need to put the IDs into one of two arrays: a delete array and a save array. After you are done processing that batch run the delete query same as before, but then also run an update query on the records in the saved array to change the "processed" field to a 1. Then just keep running the page until you have processed all the records. You can put a header() at the end of the script or use a meta tag to auto refresh the page after it loads.

 

<?php

//Functin to validate a URL: returns true/false
function validURL($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10); //follow up to 10 redirections - avoids loops
    $data = curl_exec($ch);
    curl_close($ch);
    if (!$data) { return false; }
    else
    {
      preg_match_all("/HTTP\/1\.[1|0]\s(\d{3})/", $data, $matches);
      $code = end($matches[1]);
      if ($code == 404) { return false; }
      else { return true; }
    }
}

//Run query to get all the records
$query = "SELECT * FROM `table` WHERE `processed` = 0 LIMIT 10";
$result = mysql_query($query) or die(mysql_error());
if(mysql_num_rows($result)==0)
{
    echo "All records have been processed.";
    exit();
}
$deleteList = array();
$updateList = array();
while ($info = mysql_fetch_array($query))
{
    //Check each record to see if it is valid
    if(!validURL($info['url']))
    {
        $deleteList[] = $info['id'];
    }
    else
    {
        $updateList[] = $info['id'];
    }
}

//Delete the invalid records
if(count($deleteList)>0)
{
    $query = "DELETE FROM `table` WHERE `id` IN (" . implode(',', $deleteList) . ")";
    $result = mysql_query($query) or die(mysql_error());
}
//Update the records to save by setting the processed value
if(count($deleteList)>0)
{
    $query = "UPDATE `table` SET `processed` = 1 WHERE `id` (" . implode(',', $deleteList) . ")";
    $result = mysql_query($query) or die(mysql_error());
}
?>

Link to comment
Share on other sites

thats a great idea thankyou, couple of questions... when you say put a header()... what do you mean? i dont want to put meta tags because what if it refreshes to early eg. before the query has run completely?

 

By header, I mean the PHP header function: http://php.net/manual/en/function.header.php. You can use it to, among many other things, redirect the script to another page. In this case you would recursively call the same script over and over. The downside of this is that there would be no visual display in the browser. And, on second thought, I'm not sure that would prevent the script from timing out since the PHP code is being executed recursively.

 

Besides, I think a META tag in the HTML is a better option anyway. You don't have to worry about the page refreshing too early. You wouldn't output the HTML until the end of the script (when that batch is done executing), so it couldn't refresh until the HTML is output - which would be after the current batch is processed. You could also add some other text to the page on each iteration to display how many records are left to be processed. That way you have a visual indicator as to the progress. That is kind of a quick and sloppy solution. If this was something I'd be doing all th etime, I'd probably do the same thing - but with AJAX. But for a one-time or rarely used operation a META refresh should be fine.

Link to comment
Share on other sites

i have this working now pretty well, the 1 last thing i was hoping to incorporate is the fact that i have like 20 different tables that need processing... whereas this script only does 1 table.. is is possible to make it do more than 1 table?

Link to comment
Share on other sites

i tried doing this:

 

    $query = mysql_query("SELECT * FROM `table1`, `table2`  WHERE `processed`=0 LIMIT 20");
    
    while($info = mysql_fetch_assoc($query))
    {

but it threw up this error:

 

Warning: mysql_fetch_assoc() expects parameter 1 to be resource

Link to comment
Share on other sites

You can merge records from multiple tables assuming the structure is the same but . . . you would have no way to to the Delete or Update queries since you wouldn't know which table to run the queries against for each record. I'd implement some logic that sees if there are any unprocessed records in table one. If so, process some records against that table. If not, try the same for the next record.

 

So, modify the current queries you have to use a variable for the table name and use something like this at the top of the script.

$tableList = array('table1', 'table2', 'table3', 'table4');

foreach($tableList as $tableName)
{
    $query = "SELECT * FROM `{$tableName}` WHERE `processed`=0 LIMIT 1";
    $result = mysql_query($query) or die(mysql_error());
    if(mysql_num_rows($reesult)>0)
    {
        //There are records in this table, exit the loop and use $tableName
        break;
    }
}
//Use $tableName in the processing code you already have

 

It means you will be running multiple queries as you get to the later tables, but it's the easiest solution I can think of

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.