Help on making my own crawler


webtuto

Hi, I want to make a crawler that grabs image links from another website.

So I started like this:

$site = "http://www.zik4.com/";
$file = file_get_contents($site);

 

I don't know how to extract just the image URLs (using a regex, but...) and echo them on my page.

Any idea how to search a page's source code for a word and echo it?

Thanks in advance.


Are you referring to grabbing images from a website? If so, try this code below.

 

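// Note: opendir() reads a local directory; 'url/' is a folder on this server.
if ($handle = opendir('url/')) {
    while (false !== ($file = readdir($handle))) {
        // Skip the current and parent directory entries
        if ($file != "." && $file != "..") {
            // Print only file names that contain .jpg or .gif
            if (strpos($file, '.jpg', 1) || strpos($file, '.gif', 1)) {
                print "$file<br />";
            }
        }
    }
    closedir($handle);
}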


@THE EAGLE: I did that, but where do I put the link to the website the script must grab photos from?

I put the link in like this:

$site = "http://www.zik4.com/";
if ($handle = opendir($site)) {

 

But it returns this error:

Warning: opendir(http://www.zik4.com/) [function.opendir]: failed to open dir: not implemented in C:\wamp\www\bot\index.php on line 3
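
The warning appears because opendir() cannot list a directory over http://; a web server returns pages, not a file listing. One way around it is to download the page's HTML (as the first post already does with file_get_contents()) and read the src attribute of each img tag. The following is a minimal sketch, assuming PHP's DOM extension is available; the helper name extractImageUrls and the sample markup are illustrative and not from the thread.

```php
<?php
// Hypothetical helper (not from the thread): collect the src of every <img> tag.
function extractImageUrls(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $urls[] = $src;
        }
    }

    // Drop duplicates and reindex the array
    return array_values(array_unique($urls));
}

// Inline sample page, so the sketch runs without network access
$sample = '<html><body><img src="/a.jpg"><p><img alt="x" src="http://example.com/b.gif"></p></body></html>';
foreach (extractImageUrls($sample) as $url) {
    echo htmlspecialchars($url), "<br />\n";
}
```

With the code from the first post, the call would be extractImageUrls($file). Relative paths such as /a.jpg still need to be resolved against the site's address before they can be used as full links.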


Try adding this to the code:


<?
$f = fopen("http://www.domainname.com","r");
$inputStream = fread($f,65535);
fclose($f);
if (preg_match_all("/<a.*? href="(.*?)".*?>(.*?)</a>/i",$inputStream,$matches)) {
$matches= strip_tags($matches);
print_r($matches);
}
?> 


I used the code you just gave me, and it gives me this error:

Parse error: parse error in C:\wamp\www\bot\index.php on line 5

Line 5 is:

if (preg_match_all("/<a.*? href="(.*?)".*?>(.*?)</a>/i",$inputStream,$matches)) {

PS: I copied just the latest code you gave me; I deleted the first one that uses readdir.
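
The parse error comes from the double quotes around the href value: inside a double-quoted PHP string they end the string early. The / in </a> would also clash with the / used as the pattern delimiter, and strip_tags() expects a string rather than the array that preg_match_all() fills. Here is a hedged sketch of one possible fix, using a single-quoted string and # as the delimiter; the helper name extractLinks and the sample markup are assumptions for illustration.

```php
<?php
// Hypothetical helper: return [href, link text] pairs found in an HTML string.
function extractLinks(string $html): array
{
    // Single-quoted PHP string, so the inner double quotes need no escaping;
    // '#' as the delimiter, so the '/' in '</a>' needs no escaping either.
    $pattern = '#<a\b[^>]*?\bhref="(.*?)"[^>]*>(.*?)</a>#is';

    $links = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // strip_tags() works on one string at a time, not on the whole array
            $links[] = [$m[1], trim(strip_tags($m[2]))];
        }
    }
    return $links;
}

$sample = '<p><a class="x" href="/one.html"><b>One</b></a> <a href="http://example.com/two">Two</a></p>';
print_r(extractLinks($sample));
```

The pattern only handles double-quoted href values, so links written with single quotes or no quotes are missed.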


You don't want to be using the function file_get_contents() for this job.

To create a scraper, spider, or whatever, you should learn cURL. Check out the PHP manual. I would also look at the 'tidy' functions to clean up the HTML of remote sites before using any regular-expression pattern matching, as people use all kinds of variations of HTML syntax.
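
As a rough illustration of the cURL suggestion, the sketch below downloads a page with PHP's cURL extension. The function name fetchHtml, the timeout values and the user-agent string are assumptions made for this example and should be adjusted to taste.

```php
<?php
// Hypothetical helper (name, timeouts and user agent are illustrative assumptions).
// Returns the response body, or null when the request fails.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,     // but not indefinitely
        CURLOPT_CONNECTTIMEOUT => 5,     // seconds to wait for the connection
        CURLOPT_TIMEOUT        => 15,    // seconds allowed for the whole transfer
        CURLOPT_USERAGENT      => 'ExampleCrawler/0.1',
    ]);

    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
```

A call such as $html = fetchHtml($site); would replace the file_get_contents() line from the first post. The sketch returns the body for error pages as well (a 404, for example), so a real crawler would probably also check the status code with curl_getinfo().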
