
Help on making my own crawler


webtuto


Hi, I want to make a crawler that grabs image links from another website.

So I started like this:

$site = "http://www.zik4.com/";
$file = file_get_contents($site);

 

I don't know how to extract just the image URLs (using regex, but...) and echo them on my page.

Any idea how to search a page's source code for a word and echo it?

Thanks in advance.


Are you referring to grabbing images from a website? If so, try this code below.

 

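// Lists the .jpg and .gif files in a directory ('url/' is a path on this server).
if ($handle = opendir('url/')) {
    while (false !== ($file = readdir($handle))) {
        if ($file != "." && $file != "..") {
            // Offset 1 skips the first character, so a hit is never position 0 (falsy).
            if (strpos($file, '.jpg', 1) || strpos($file, '.gif', 1)) {
                print "$file<br />";
            }
        }
    }
    closedir($handle);
}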

@THE EAGLE: I did that, but where do I put the link to the website the script must grab photos from?

 

I put the link here, like this:

$site = "http://www.zik4.com/";
if ($handle = opendir($site)) {

 

but it returns this error:

Warning: opendir(http://www.zik4.com/) [function.opendir]: failed to open dir: not implemented in C:\wamp\www\bot\index.php on line 3
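The "not implemented" warning is what you would expect here: opendir() lists directories on a filesystem, and PHP's http:// stream wrapper has no directory listing to offer. A page's images have to be found by downloading its HTML and reading the img tags. Below is a minimal sketch of that idea using DOMDocument instead of a regex. The function name extractImageUrls() and the sample markup are made up for illustration; in the real script the string would be the HTML fetched from the site.

```php
<?php
// Sketch: collect the src attribute of every <img> tag in an HTML string.
// extractImageUrls() is a hypothetical helper name, not something from the thread.
function extractImageUrls(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $urls[] = $src;
        }
    }
    // Drop duplicates and reindex the array.
    return array_values(array_unique($urls));
}

// Made-up sample markup standing in for the downloaded page.
$sample = '<html><body><img src="/img/a.jpg"><p><img src="b.gif" alt="">'
        . '<img src="/img/a.jpg"></p></body></html>';

$imageUrls = extractImageUrls($sample);
foreach ($imageUrls as $url) {
    echo htmlspecialchars($url), "<br />\n";
}
```

Note that src values are often relative (like b.gif above), so they still need to be resolved against the page's address before they can be used as full links.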

Try adding this into the code

 

<?
$f = fopen("http://www.domainname.com","r");
$inputStream = fread($f,65535);
fclose($f);
if (preg_match_all("/<a.*? href="(.*?)".*?>(.*?)</a>/i",$inputStream,$matches)) {
$matches= strip_tags($matches);
print_r($matches);
}
?> 

I used the code you just gave me and it gives me this error:

--> Parse error: parse error in C:\wamp\www\bot\index.php on line 5

Line 5 is:

if (preg_match_all("/<a.*? href="(.*?)".*?>(.*?)</a>/i",$inputStream,$matches)) {

PS: I copied only the latest code you gave me; I deleted the first one that uses readdir.
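The parse error on that line comes from the quoting: as posted, the pattern sits in a double-quoted PHP string, so the first " inside it (right after href=) ends the string early. The / in </a> would also clash with the / used as the pattern delimiter, and strip_tags() expects a string rather than the whole $matches array. Here is one possible corrected version, as a sketch only. The sample markup and the $links array are made up for illustration, and the pattern assumes href values are wrapped in double quotes.

```php
<?php
// Made-up sample markup; in the thread this would be the downloaded page.
$inputStream = '<p><a class="x" href="http://example.com/one">One <b>bold</b></a> '
             . '<A HREF="/two">Two</A></p>';

// Single-quoted PHP string, so the double quotes in the pattern need no escaping.
// '#' is the delimiter, so the slash in </a> needs no escaping either.
$pattern = '#<a\b[^>]*?\bhref="(.*?)"[^>]*>(.*?)</a>#is';

$links = [];
if (preg_match_all($pattern, $inputStream, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // strip_tags() works on a string, so apply it to each link text separately.
        $links[] = ['href' => $m[1], 'text' => strip_tags($m[2])];
    }
}
print_r($links);
```

The same quoting approach should carry over to a pattern for img tags and their src attribute, which is what the original question was after.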

You don't want to be using the function file_get_contents() for this job.

To create a scraper, spider, or whatever, you should learn cURL. Check out the PHP manual. I would also look at the 'tidy' functions to clean up the HTML of remote sites before using any regular expression pattern matching, as people use all kinds of variations of HTML syntax.
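For anyone wondering what the cURL route might look like, here is a rough sketch of fetching a page with PHP's cURL extension. The function name fetchHtml(), the timeout, the redirect limit and the user agent string are all illustrative assumptions, not something taken from the manual or from this thread.

```php
<?php
// Sketch: download a page with the cURL extension.
// fetchHtml() is a hypothetical helper name; the option values are illustrative.
function fetchHtml(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('Could not initialise cURL');
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_USERAGENT      => 'ExampleCrawler/0.1', // made-up user agent
    ]);

    $body   = curl_exec($ch);
    $error  = curl_error($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($body === false) {
        throw new RuntimeException("cURL request failed: $error");
    }
    if ($status >= 400) {
        throw new RuntimeException("Server answered with HTTP status $status");
    }
    return $body;
}

// Usage (needs network access), e.g.:
// $html = fetchHtml('http://www.zik4.com/');
```

If the tidy extension is installed, the returned string could be passed through tidy_repair_string() before any pattern matching, along the lines suggested above.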
