webtuto Posted July 10, 2009

Hi, I want to make a crawler that grabs image links from another website, so I started like this:

$site = "http://www.zik4.com/";
$file = file_get_contents($site);

I don't know how to extract just the image URLs (using regex, but...) and echo them on my page. Any idea how to search the source code for a word and echo it? Thanks in advance.

Link to comment: https://forums.phpfreaks.com/topic/165490-help-on-making-my-own-crawler/
webtuto Posted July 10, 2009

TOP>
WolfRage Posted July 10, 2009

Start reading and enjoy. http://www.php.net/manual/en/function.ereg.php Don't forget to check out string functions, but this depends on how much you plan on doing this.
webtuto Posted July 10, 2009

Thanks WolfRage, but I already saw this, and it returns 0. I think I didn't know how to implement it.
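For anyone reading along: ereg() was deprecated in PHP 5.3 and removed in PHP 7, so the preg_* functions are the ones to reach for today. Below is a minimal sketch of pulling image URLs out of a string of HTML with preg_match_all(). The function name findImageSrcs is made up for illustration, and the pattern assumes quoted src attributes, so treat it as a starting point rather than a robust parser.

```php
<?php
// Hypothetical helper (not from the thread): collect the src value of every <img> tag.
function findImageSrcs(string $html): array
{
    // Single-quoted PHP string with '#' as the delimiter.
    // Matches src="..." or src='...' inside an <img ...> tag, case-insensitively.
    $pattern = '#<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1#is';

    // With the default PREG_PATTERN_ORDER flag, $matches[2] holds every
    // capture of the second group, i.e. the URL between the quotes.
    preg_match_all($pattern, $html, $matches);

    return $matches[2];
}
```

Passing the result of file_get_contents($site) to this function and looping over the returned array with echo would print each image URL. Relative URLs come back exactly as they appear in the page.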
The Eagle Posted July 10, 2009

Are you referring to grabbing images from a website? If so, try the code below.

// 'url/' is a directory path to list
if ($handle = opendir('url/')) {
    while (false !== ($file = readdir($handle))) {
        // skip the current and parent directory entries
        if ($file != "." && $file != "..") {
            // print only file names containing .jpg or .gif
            if (strpos($file, '.jpg', 1) || strpos($file, '.gif', 1)) {
                print "$file<br />";
            }
        }
    }
    closedir($handle);
}
webtuto Posted July 10, 2009

@The Eagle: I did that, but where do I put the link to the website the script must grab photos from? I put the link in like this:

$site = "http://www.zik4.com/";
if ($handle = opendir($site)) {

but it returns this error:

Warning: opendir(http://www.zik4.com/) [function.opendir]: failed to open dir: not implemented in C:\wamp\www\bot\index.php on line 3
The Eagle Posted July 10, 2009

Try adding this into the code:

<?
$f = fopen("http://www.domainname.com","r");
$inputStream = fread($f,65535);
fclose($f);
if (preg_match_all("/<a.*? href="(.*?)".*?>(.*?)</a>/i",$inputStream,$matches)) {
$matches= strip_tags($matches);
print_r($matches);
}
?>
webtuto Posted July 10, 2009

I used the code you just gave me and it gives me this error:

Parse error: parse error in C:\wamp\www\bot\index.php on line 5

Line 5 is:

if (preg_match_all("/<a.*? href="(.*?)".*?>(.*?)</a>/i",$inputStream,$matches)) {

PS: I copied just the latest code you gave me; I deleted the first one that uses readdir.
The Eagle Posted July 10, 2009

Parse errors refer to things such as the href not existing (e.g. make sure .*? is listed within the code). I doubt this will help, but add a semicolon ( ; ) after line 5. I'm still looking into this.
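A note on the parse error on line 5: it is a PHP syntax error, not something about the page being fetched. The pattern is written inside a double-quoted string, so the unescaped " after href= ends the string early and PHP cannot parse the rest of the line. Separately, the / in </a> would end a pattern delimited by /, and strip_tags() expects a string rather than the whole $matches array. The sketch below keeps the same pattern but puts it in a single-quoted string with # as the delimiter. The function name extractLinks is hypothetical, and it works on HTML that has already been fetched.

```php
<?php
// Hypothetical helper (not from the thread): returns [href, link text] pairs.
function extractLinks(string $html): array
{
    $links = [];

    // Single quotes: the double quotes inside the pattern need no escaping.
    // '#' delimiter: the '/' in '</a>' needs no escaping either.
    $pattern = '#<a.*? href="(.*?)".*?>(.*?)</a>#i';

    // PREG_SET_ORDER gives one array per match: [full match, href, inner HTML].
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // strip_tags() is applied to each link text, one string at a time.
            $links[] = [$m[1], trim(strip_tags($m[2]))];
        }
    }

    return $links;
}
```

Without the s modifier the dots in the pattern do not match newlines, so links whose markup spans several lines are skipped. That is one reason an HTML parser tends to be more dependable than a regex for this job.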
webtuto Posted July 10, 2009

Well, you can try it yourself; the website is www.zik4.com
The Eagle Posted July 10, 2009

register_globals 1 - ZIK4.com has this enabled, I am wondering why your script isn't working.
JonnoTheDev Posted July 10, 2009

You don't want to be using the function file_get_contents() for this job. To create a scraper, spider, whatever, you should learn cURL. Check out the PHP manual. I would also look at the 'tidy' functions to clean HTML on remote sites prior to using any regular expression pattern matching, as people use all kinds of variations of HTML syntax.
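As a rough illustration of the cURL suggestion, here is a minimal sketch that fetches a page with cURL and then reads the image URLs from the parsed document. Note the swap: it uses PHP's DOMDocument to cope with messy markup instead of tidy plus a regular expression, which is a different technique from the one suggested above. The function names fetchHtml and extractImageUrls and the 15-second timeout are assumptions made for the example, and the cURL and DOM extensions must be enabled.

```php
<?php
// Hypothetical helper: fetch a page body with cURL; returns null on failure.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 15,   // assumed limit, in seconds
    ]);
    $body = curl_exec($ch);

    // curl_exec() returns false on failure.
    return is_string($body) ? $body : null;
}

// Hypothetical helper: return the distinct src values of all <img> elements.
function extractImageUrls(string $html): array
{
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();

    // Real-world HTML is often malformed; buffer libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $urls[] = $src;
        }
    }

    // Drop duplicates and reindex from 0.
    return array_values(array_unique($urls));
}
```

To use it, call fetchHtml($site), check that the result is not null, pass it to extractImageUrls(), and echo each entry. Relative paths such as /images/a.jpg are returned as written, so they would still need to be resolved against the site's base URL before downloading.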
seventheyejosh Posted July 10, 2009

Have a peek at this. It should get you started in the right direction. It isn't simply a 3 line command to get a site's content.