Ok, this is my first post on the PHP Freaks forum, but I've been around this place a lot and have read tons of articles, etc. I'm not new to PHP and I understand programming very well; it's what I do for a living most of the time, so no need to give simple answers.

 

Anyway, I've got a problem that I cannot find a solution for anywhere. I've searched the PHP documentation, the PHP Freaks forum and website, and probably about 50 other sites in the past two days alone. The problem is that my cURL request times out and completely kills my script.

 

A brief overview of what I'm doing with it: first off, I'm running PHP version 4.3.3 and I've got a class set up to parse RSS feeds. I've been using this class for a long time and it works great; I never had a problem until now, and I think the reason is the number of feeds I'm trying to parse through.

 

I'm going through anywhere from 300 to 1000 or more feeds to verify that they actually have at least one post in them. If they don't, I get rid of the feed URL, and if they do, I keep it. It seems like on certain feed URLs the page never loads and a response is never sent back by the remote machine, so I figured I'd just set the cURL timeout option to, say, 20 seconds, and if it times out the script would continue to fire. But the script doesn't continue to fire: when it times out it completely kills the script. Wouldn't common sense say that a fatal error is causing this, or that if cURL times out it should close and the script should continue on its merry way? Basically I'm not getting any errors whatsoever; it just stops the script altogether after a timeout. Here is the code I'm using.

 

class xmlParser
{
    var $title;
    var $link;
    var $description;

    // Copy each key/value pair from the supplied array onto the object as properties
    function xmlParser($aa)
    {
        foreach ($aa as $k => $v)
            $this->$k = $aa[$k];
    }
}

function readDatabase($filename)
{
    // Fetch the feed with cURL
    $ch = curl_init();
    $timeout = 30; // set to zero for no timeout
    curl_setopt($ch, CURLOPT_URL, $filename);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    // Note: CURLOPT_LOW_SPEED_TIME only takes effect together with CURLOPT_LOW_SPEED_LIMIT
    curl_setopt($ch, CURLOPT_LOW_SPEED_TIME, 15);
    // Note: CURLFTPSSL_NONE is a value for CURLOPT_FTP_SSL, not an option constant,
    // so this call does not set a valid option
    curl_setopt($ch, CURLFTPSSL_NONE, true);
    $data = curl_exec($ch);
    curl_close($ch);

    // Parse the fetched XML into a flat list of values plus an index of tag positions
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
    xml_parse_into_struct($parser, $data, $values, $tags);
    xml_parser_free($parser);

    $tdb = array(); // always return an array, even if nothing was parsed

    if ($tags)
    {
        foreach ($tags as $key => $val) {
            if ($key == "item") {
                // $val holds pairs of open/close positions for each <item> element
                $molranges = $val;
                for ($i = 0; $i < count($molranges); $i += 2) {
                    $offset = $molranges[$i] + 1;
                    $len = $molranges[$i + 1] - $offset;
                    $tdb[] = parseMol(array_slice($values, $offset, $len));
                }
            } else {
                continue;
            }
        }
    }
    return $tdb;
}

function parseMol($mvalues)
{
    // Build an associative array of tag => value and wrap it in an xmlParser object
    $mol = array();
    for ($i = 0; $i < count($mvalues); $i++) {
        $mol[$mvalues[$i]["tag"]] = $mvalues[$i]["value"];
    }
    return new xmlParser($mol);
}

 

So, to sum it up: if I try to access a URL that takes forever to send a response back, cURL reaches its timeout and kills the script. What I want it to do once the timeout is reached is not kill the script completely, but return a false value to the calling function and move on to verify the next URL.
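
What I'm picturing is roughly this (just a sketch; fetchFeed() is a made-up name, and it leans on CURLOPT_TIMEOUT plus the fact that curl_exec() returns false on failure when CURLOPT_RETURNTRANSFER is set):

function fetchFeed($url, $timeout = 20)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // give up if the connection can't be made
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // give up if the whole transfer takes too long

    $data = curl_exec($ch);
    $failed = ($data === false || curl_errno($ch) != 0);
    curl_close($ch);

    // On timeout (or any other cURL failure) hand false back to the caller
    // so it can just skip this URL and move on to the next one
    return $failed ? false : $data;
}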

 

Any advice, suggestions, or points of view at this point would be a huge help.

 

Thanks in advance.

I would try using exceptions to catch the error / "exit" when the curl operation fails.

 

http://www.php.net/exceptions

 

You could also separate the code that checks the feed and inserts/deletes from the db into a file that can be executed independently...then have a controller script that forks off additional background processes to do the work...it would probably be faster, and it wouldn't kill the overall script.

 

Cacti does this with its snmp polling functionality.
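
A very rough sketch of the controller part (the file names and the path to the PHP binary are just placeholders):

// controller.php -- hands each feed URL to a separate background worker
$urls = array(
    'http://example.com/feed1.xml',
    'http://example.com/feed2.xml',
);

foreach ($urls as $url) {
    // check_feed.php would fetch one feed and insert/delete its URL in the db;
    // redirecting output and adding & lets it run in the background while
    // the controller keeps going
    exec('/usr/bin/php check_feed.php ' . escapeshellarg($url) . ' > /dev/null 2>&1 &');
}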


Unfortunately try/catch is only for PHP 5 and my host runs PHP 4.3.3. I already tried it, even knowing it's not supported, to no avail.

 

toplay,

 

I'll give CURLOPT_TIMEOUT a shot, as well as a combination of other options.

 

Basically I've managed to figure out the problem a bit more. From what I can tell, the script dies when curl_exec() fires, after which it never makes it to the next lines of code. It's as if the curl_exec() call causes a fatal error which kills the script completely and prevents any further execution, but the really tricky thing is that it doesn't even return any kind of error to me. I've used CURLOPT_FAILONERROR with some if statements to try and catch any errors, but again none are reported.
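
For reference, this is the kind of check I mean, dropped right after curl_exec() (a sketch; the log file path is just for illustration):

$data = curl_exec($ch);

// Write down whatever cURL thinks happened, even if the script dies right after this point
$logline = date('Y-m-d H:i:s') . ' errno=' . curl_errno($ch)
         . ' error=' . curl_error($ch) . ' url=' . $filename . "\n";
error_log($logline, 3, '/tmp/feedcheck.log');

curl_close($ch);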

 

It's a tough one, because it works fine about 70% of the time with huge lists of URLs to check; it only seems to stop once it comes to an invalid URL or a URL where the server does not send back a response, and then it hangs. I've found a few of the URLs which cause it to kill the script and tried visiting them normally in the web browser, but the page never seems to load; some keep loading for much longer than the standard 30 second browser timeout.

 

It's only those non-loading URLs that cause the script to fail and never get past curl_exec(). Basically my overall goal is just to verify the feeds that the URLs point to: after I grab a list of RSS feeds, I send out the bot to check whether each URL is valid and whether the feed has any posts in it. If the feed doesn't have any posts or doesn't load, I take it out of my array and keep the good ones.
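
In other words, something like this around the readDatabase() function posted above (a sketch; $feedUrls is a placeholder, and it assumes readDatabase() comes back empty when the fetch or parse fails):

$goodFeeds = array();

foreach ($feedUrls as $url) {
    $items = readDatabase($url);

    // Keep the URL only if the feed actually loaded and has at least one post in it
    if (is_array($items) && count($items) > 0) {
        $goodFeeds[] = $url;
    }
}

// $goodFeeds now holds only the URLs worth keeping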

 

I guess I'll have to experiment with some other ideas and see if I can troubleshoot this thing some more.

 

P.S. I also just tried breaking up the script so that instead of grabbing the URLs and trying to verify them all in one process, it grabs the URLs (just fine), and then once the script stops I click a new validate button and run the verification from a fresh start. I did this because I thought maybe, just maybe, there is a setting in the php.ini file that is causing the problem, but since my host doesn't give me access to the php.ini file I can't exactly check it.
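
I can at least read the relevant settings from inside the script and try to override them at runtime (a sketch; whether set_time_limit() actually takes effect depends on the host's safe_mode setting):

// See what limits the host has configured, even without php.ini access
echo 'max_execution_time: ' . ini_get('max_execution_time') . "\n";
echo 'memory_limit: '       . ini_get('memory_limit') . "\n";
echo 'safe_mode: '          . ini_get('safe_mode') . "\n";

// Try to lift the execution time limit for this run (ignored when safe_mode is on)
set_time_limit(0);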
