
Need help with cURL script


rhock_95


Can someone explain how to use this class? It is a scraping tool that uses cURL, and although I am familiar with basic PHP, I don't have a clue how to use it. cURL is entirely new to me.

Does this file need to be edited, or do I need a separate script that calls this class as it is?

Any help or insights are appreciated.

<?php
/*******************************************************************************
*                                  grabber.php
*                              by voyager, 2003
*
* A class which is useful for grabbing any information from any site on the net.
* It can return a single value (text) or an array (texts) delimited by given
* 'markup' strings. The class uses PHP's cURL functions, so you need the cURL
* extension installed on your server.
********************************************************************************/

class Grabber
{
   var $content;
   var $content_array;
   var $noURL;      // boolean marking whether to open the URL or not (not used below)
   var $text;       // the text without the unneeded starting and ending parts
   var $searchar;   // the searched array
   var $searchtxt;  // the searched text

   // The constructor opens $url and writes it to $tmpdir.$tmpfile.
   // If the cached file already exists and is younger than $ifmod seconds,
   // the file is read instead of opening the URL. If $ifmod = 0 it always
   // opens the URL. The default is 24 hours (86400 seconds).
   function __construct($url, $tmpdir = 'tmp/', $tmpfile = 'tmp.txt', $ifmod = 86400)
   {
      $this->content = "";
      $cache = $tmpdir . $tmpfile;

      // use the cached copy if it is fresh enough
      if ($ifmod > 0 && is_file($cache) && (time() - filemtime($cache)) < $ifmod)
      {
         $this->content = file_get_contents($cache);
         return;
      }

      $ch = curl_init();
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);   // return the page instead of printing it
      curl_setopt($ch, CURLOPT_URL, $url);
      curl_setopt($ch, CURLOPT_TIMEOUT, 60);         // give up after 60 seconds
      $useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040206 Firefox/0.8";
      curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
      $result = curl_exec($ch);
      curl_close($ch);

      if ($result !== false)
      {
         $this->content = $result;
         // save a copy for next time (fails silently if $tmpdir is not writable)
         @file_put_contents($cache, $result);
      }
   }

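   // Grabs a single piece of text between $start and $end into $searchtxt.
   // $searchtxt is left empty if either marker is not found.
   function grab_unit($start, $end)
   {
      $this->text = "";
      $pos = strpos($this->content, $start);
      if ($pos !== false)
      {
         // cut from start to end; the +1 skips one extra character after $start
         $this->text = substr($this->content, $pos + strlen($start) + 1);
         $endpos = strpos($this->text, $end);
         $this->text = ($endpos === false) ? "" : substr($this->text, 0, $endpos);
      }
      $this->searchtxt = $this->text;
   }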

  
   // Takes a start marker, the delimiters around each wanted piece, and an
   // end marker, and stores the pieces in $searchar.
   // $delimstart and $delimend are around the pieces of searched data.
   function grab_array($start, $delimstart, $delimend, $end)
   {
      // cut from start to end
      $this->text = substr($this->content, strpos($this->content, $start) + strlen($start) + 1);
      $this->text = substr($this->text, 0, strpos($this->text, $end));

      // split on either delimiter; preg_quote() escapes characters such as
      // '.', '(' or '/' that often appear in HTML markup
      $this->searchar = preg_split("@" . preg_quote($delimstart, "@") . "|" . preg_quote($delimend, "@") . "@", $this->text);
   }

   // The elements of the array still aren't exactly what we need?
   // Cut each element from $start to $end, optionally stripping HTML tags.
   function refine_array($start, $end, $clear_html = 0)
   {
      for ($i = 0; $i < count($this->searchar); $i++)
      {
         $this->searchar[$i] = substr($this->searchar[$i],
            strpos($this->searchar[$i], $start) + strlen($start));

         if (!empty($end))
         {
            $this->searchar[$i] = substr($this->searchar[$i],
               0, strpos($this->searchar[$i], $end));
         }

         if ($clear_html)
         {
            $this->searchar[$i] = strip_tags($this->searchar[$i]);
         }
      }
   }

   // Still have some irregular data that spoils everything?
   // Remove the trash by passing an array of strings to delete.
   function remove_trash($trash)
   {
      // str_replace() accepts arrays, so every trash string is removed
      // from every element in one call
      if (is_array($this->searchar))
      {
         $this->searchar = str_replace($trash, "", $this->searchar);
      }
      $this->searchtxt = str_replace($trash, "", (string)$this->searchtxt);
   }
   
   // This function does not use the grabber's members. It takes start and
   // end markers and the content ($word) and returns what is between them.
   // To debug, pass 1 as $testvar: it stops and prints the text after $start.
   function cut($start, $end, $word, $testvar = 0)
   {
      $word = substr($word, strpos($word, $start) + strlen($start));
      if ($testvar) die($word);
      $word = substr($word, 0, strpos($word, $end));

      return $word;
   }
   
   
   // Sends $vars (an associative array of field => value) to $url as an
   // HTTP POST and returns the response body, or an error message.
   function send_post($vars, $url)
   {
      // http_build_query() URL-encodes the fields and joins them with '&'
      $strRequestBody = http_build_query($vars);

      $ch = curl_init();
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
      curl_setopt($ch, CURLOPT_URL, $url);
      curl_setopt($ch, CURLOPT_POST, 1);               // make this a POST request
      curl_setopt($ch, CURLOPT_POSTFIELDS, $strRequestBody);
      $return_string = curl_exec($ch);
      curl_close($ch);

      if ($return_string === false || $return_string == "") {
         return "Error: Could not post to remote system.";
      }
      return $return_string;
   } // End function
}
?>
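The class's text-cutting methods (grab_unit(), grab_array(), cut()) boil down to strpos()/substr() and splitting on delimiters. A minimal standalone sketch of the same idea, with checks for missing markers; between() and all_between() are hypothetical helper names, not part of the class, and the sample HTML is made up:

```php
<?php
// Return the text between the first $start marker and the next $end marker,
// or null if either marker is missing.
function between(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);
    $to = strpos($html, $end, $from);   // search for $end only after $start
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

// Return every piece wrapped in $open ... $close, e.g. each <td>...</td> cell.
function all_between(string $html, string $open, string $close): array
{
    // preg_quote() escapes regex characters in the markers; (.*?) is non-greedy
    // so each match stops at the nearest $close; /s lets '.' match newlines
    $pattern = '@' . preg_quote($open, '@') . '(.*?)' . preg_quote($close, '@') . '@s';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];   // group 1 holds the text between the markers
}

$html = '<table><tr><td>Apples</td><td>Pears</td></tr></table>';
print_r(all_between($html, '<td>', '</td>'));
```

As for the original question: you don't need to edit grabber.php for each site. Save the class as grabber.php and, in a separate script, require_once 'grabber.php'; create $g = new Grabber($url); call $g->grab_unit($start, $end) with markers taken from the page's HTML; then read $g->searchtxt. Keep in mind that grab_unit() skips one extra character after the start marker.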


Here's a function to grab anything off the internet (as long as allow_url_fopen is enabled in php.ini):

file_get_contents()
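A minimal sketch of the file_get_contents() approach for simple GET requests, assuming allow_url_fopen is enabled. fetch_page() is a hypothetical helper name, and the timeout and user-agent values are illustrative assumptions; a stream context is used because it is the standard way to pass such HTTP options to file_get_contents():

```php
<?php
// Fetch a URL (or local file path) with file_get_contents().
// Returns the body, or null on failure.
function fetch_page(string $url, int $timeout = 60): ?string
{
    // The 'http' context options apply to http:// and https:// URLs;
    // they are ignored for local paths.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeout,
            'user_agent' => 'Mozilla/5.0 (compatible; example-fetcher)',
        ],
    ]);

    // '@' hides the warning PHP emits on failure; the return value is checked instead.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Usage with a hypothetical page:
// $html = fetch_page('http://example.com/');
// if ($html === null) { echo "download failed\n"; }
```

Returning null instead of false lets callers test the result with === null.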

 

cURL is an extra extension you can compile into PHP; it is built on the libcurl library:

http://curl.haxx.se/

http://us.php.net/curl-setopt

 

Basically, once PHP has the cURL extension compiled (PHP 5 builds typically come with it already), you just have to remove the semicolon in front of the cURL extension= line in php.ini to activate it.
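A quick way to confirm the extension is actually active after editing php.ini. A sketch; curl_status() is a hypothetical helper name, while extension_loaded() and curl_version() are standard PHP functions:

```php
<?php
// Report whether the cURL extension is active, and which libcurl it uses.
function curl_status(): string
{
    if (!extension_loaded('curl')) {
        // Still commented out or not installed: check php.ini, then restart
        // the web server if PHP runs inside one.
        return 'cURL is NOT enabled';
    }
    $info = curl_version();   // array with keys such as 'version' and 'ssl_version'
    return 'cURL is enabled (libcurl ' . $info['version'] . ')';
}

echo curl_status(), "\n";
```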

 

Once you do this, you have a whole set of curl_*() functions available. cURL is used for making and imitating HTTP requests, FTP transfers and a whole lot more; a common use is sending HTTP POST requests to websites, e.g. to fill out forms automatically.
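A minimal sketch of the HTTP POST use case, assuming a form that accepts ordinary application/x-www-form-urlencoded data. build_post_body() and post_form() are hypothetical helper names, and the URL and field names in the usage comment are made up; http_build_query() does the URL-encoding that is easy to get wrong by hand:

```php
<?php
// Build an application/x-www-form-urlencoded body from form fields.
function build_post_body(array $fields): string
{
    // http_build_query() URL-encodes keys and values and joins them with '&'.
    return http_build_query($fields);
}

// POST the fields to $url and return the response body, or null on failure.
function post_form(string $url, array $fields): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_POST           => true,   // send a POST request
        CURLOPT_POSTFIELDS     => build_post_body($fields),
        CURLOPT_TIMEOUT        => 60,     // give up after 60 seconds
    ]);
    $response = curl_exec($ch);
    // $ch is released automatically when the function returns
    return $response === false ? null : $response;
}

// Usage with a hypothetical form handler and field names:
// $html = post_form('http://example.com/search.php', ['q' => 'php curl', 'page' => 1]);
```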

 

It is not a class but a set of functions PHP provides once you enable the extension, just like GD or any other library.

The code in your post is indeed a class someone built from the cURL functions and some other code. It is in no way a "universal" tool for grabbing things online. Sure, it might work for many sites, but HTTP requests and parsing logic are usually specific to each website: for example, filling out Yahoo's form to sign up for a new email address is different from sending a POST request to access PayPal services or to log in.
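As an example of how site-specific this gets, logging in usually means POSTing the login form and then reusing the session cookie on later requests. A sketch under assumptions: the URLs, field names and cookie path are hypothetical, and real sites often require extra hidden fields or tokens that you would find by inspecting the form. login_options() and run_request() are hypothetical helper names; building the options as an array keeps the site-specific part separate from the code that runs the request:

```php
<?php
// cURL options for a login POST that stores the session cookie in $cookieFile.
function login_options(string $loginUrl, array $fields, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $loginUrl,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_COOKIEJAR      => $cookieFile,  // save received cookies to this file
        CURLOPT_COOKIEFILE     => $cookieFile,  // send cookies read from this file
        CURLOPT_FOLLOWLOCATION => true,         // many login forms redirect afterwards
    ];
}

// Run one request with the given options; returns the body or null on failure.
function run_request(array $options): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    // $ch is freed when this function returns; libcurl writes the cookie jar then
    return $body === false ? null : $body;
}

// Usage (hypothetical site): log in, then fetch a members-only page with the same cookies.
// $jar = sys_get_temp_dir() . '/cookies.txt';
// run_request(login_options('http://example.com/login.php', ['user' => 'me', 'pass' => 'secret'], $jar));
// $page = run_request([CURLOPT_URL => 'http://example.com/members.php',
//                      CURLOPT_RETURNTRANSFER => true, CURLOPT_COOKIEFILE => $jar]);
```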

 

Beyond that, use tools like the Live HTTP Headers browser extension or Wireshark to investigate how your client PC interacts with the server (or the remote script), then mimic that with cURL. Don't know how to use cURL? Read the docs and look at the examples.
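Once a capture shows what the browser sent, those headers can be replayed with CURLOPT_HTTPHEADER, and setting CURLINFO_HEADER_OUT lets curl_getinfo() report what cURL actually sent so the two can be compared. A sketch; browser_like_options() and replay() are hypothetical helper names, and the header values in any call would come from your own capture:

```php
<?php
// Build cURL options that replay headers seen in a browser capture.
// $headers maps header names to values copied from the capture.
function browser_like_options(string $url, array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;   // CURLOPT_HTTPHEADER expects "Name: value" strings
    }
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => $lines,
        CURLINFO_HEADER_OUT    => true,     // track the request headers cURL sends
    ];
}

// Send the request and return [body, request headers cURL sent], or null on failure.
function replay(array $options): ?array
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    if ($body === false) {
        return null;
    }
    return [$body, curl_getinfo($ch, CURLINFO_HEADER_OUT)];
}

// Usage with hypothetical captured values:
// [$body, $sent] = replay(browser_like_options('http://example.com/page.php',
//     ['Referer' => 'http://example.com/form.php', 'Accept-Language' => 'en-US']));
```

Comparing the second element of replay()'s result with the browser capture line by line is usually a quick way to spot a missing cookie or header.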

 

Don't ask how to use all of it, though; ask how to do one specific thing.

Good luck!

 

