Yakooza Posted August 13, 2009

Hello, I'm currently developing a clan site for Battlefield Heroes. I want a page that shows our members' stats, such as kills and deaths. The official BFH website has a ranking system implemented, so I thought this should be possible. I searched Google for help and found an article on page scraping that gave me this code, but it did not work:

<html>
<head>
<title>PHP Test</title>
</head>
<body>
<?php
$url = "http://www.amazon.com/exec/obidos/ASIN/1904151191/";
$file = ''; // start with an empty string so .= has something to append to
$filepointer = fopen($url,"r");
if($filepointer){
    while(!feof($filepointer)){
        $buffer = fgets($filepointer, 4096);
        $file .= $buffer;
    }
    fclose($filepointer);
} else {
    die("Could not create a connection to Amazon.com");
}
?>
<?php
preg_match("/<b>Amazon.com\sSales\sRank:\s<\/b>\s(.*)\s/i",$file,$match);
$result = $match[1];
echo $result;
?>
</body>
</html>

The outcome of this is:

Amazon.com\sSales\sRank:\s<\/b>\s(.*)\s/i",$file,$match); $result = $match[1]; echo $result; ?>

I'm coming here for help. Is there any way I can do this? By the way, here is the rank page: http://www.battlefieldheroes.com/heroes/191233146

Link to comment: https://forums.phpfreaks.com/topic/170022-page-scraping/

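One plausible reading of the output quoted above is that the page was never run through PHP (for example, because it was saved with an .html extension). A browser treats "<?php ... >" as a bogus comment that ends at the first ">", so the first block disappears up to its "?>" and the second up to the ">" of "<b>", which leaves exactly the text shown. Once PHP does run, the regex step still needs a guard, because $match[1] only exists after a successful match, and the article's pattern targets Amazon's markup from when it was written. A minimal sketch, assuming PHP 7 or later; first_match() is a hypothetical helper and the sample HTML is made up:

```php
<?php
// Minimal sketch: return the first capture group of $pattern in $html,
// or null when there is no match (instead of reading an undefined $match[1]).
// first_match() is a hypothetical helper, not part of PHP.
function first_match($pattern, $html)
{
    // preg_match() returns 1 on a match, 0 on no match and false on error.
    if (preg_match($pattern, $html, $match) === 1 && isset($match[1])) {
        return trim($match[1]);
    }
    return null;
}

// Made-up markup standing in for a downloaded page.
$html = '<p><b>Amazon.com Sales Rank: </b> 1,234 in Books </p>';

// Escaped dot, flexible whitespace, and a capture limited to digits and commas.
$rank = first_match('/<b>Amazon\.com\s+Sales\s+Rank:\s*<\/b>\s*([\d,]+)/i', $html);

echo $rank === null ? 'No rank found' : $rank;
```

The same helper works for any "label, then value" pattern, but for anything beyond a single value an HTML parser is usually more robust than a regex.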
oni-kun Posted August 13, 2009

Thankfully I've had the same problem before. There's a very simple, single-file PHP library that can parse the HTML/DOM for you; you just need to include it: http://simplehtmldom.sourceforge.net/

You can make the code as simple as this:

<?php
include('simple_html_dom.php');

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Collect one entry per matching block (use a new loop variable so $html is not overwritten)
$items = array();
foreach($html->find('div.scores') as $score) {
    $item['title'] = $score->find('div.title', 0)->plaintext;
    $item['rank'] = $score->find('div.rank', 0)->plaintext;
    $items[] = $item;
}

Something like that. It can find divs and output their inner HTML and everything, for example every image's source:

foreach($html->find('img') as $element) echo $element->src . '<br>';

Yakooza Posted August 13, 2009 (author)

Sorry, but I'm new and did not fully understand what you are saying. Can you explain in more detail how to use the code to extract data from another site and display the output on my page?

trq Posted August 13, 2009

Page scraping is probably easier, and better, done client-side using jQuery these days.

Yakooza Posted August 13, 2009 (author)

This really shouldn't be complicated. All I need to do is pull some info from another site. I also tried starting a new document, putting this code in it, and saving it as a .php file:

<?php
$data = file_get_contents('http://www.warbeats.com/Default.aspx');
$regex = '/been working on this (.+?) day/';
preg_match($regex,$data,$match);
var_dump($match);
echo $match[1];
?>

It should work, but I see no output. Do I have to call the function or something?

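Two things are worth ruling out here. var_dump() prints something even for an empty array, so a completely blank page suggests the script stopped early or was not executed at all. And file_get_contents() returns false when it cannot fetch the URL; for a remote address it also requires allow_url_fopen to be enabled, which some shared hosts switch off. A minimal sketch, assuming PHP 7 or later, that makes the fetch step fail loudly; fetch_page() is a hypothetical helper, and the demo reads a temporary local file instead of a live site:

```php
<?php
// Minimal sketch: read a URL or local file, throwing a descriptive exception
// instead of silently carrying on with $data === false.
// fetch_page() is a hypothetical helper, not part of PHP.
function fetch_page($source)
{
    // Remote URLs only work with file_get_contents() when allow_url_fopen is on.
    $isRemote = preg_match('#^https?://#i', $source) === 1;
    if ($isRemote && !filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        throw new RuntimeException('allow_url_fopen is disabled on this server');
    }

    // @ hides the warning; error_get_last() still records it for the message.
    $data = @file_get_contents($source);
    if ($data === false) {
        $error = error_get_last();
        $reason = $error !== null ? $error['message'] : 'unknown error';
        throw new RuntimeException('Could not read ' . $source . ': ' . $reason);
    }
    return $data;
}

// Demo against a temporary local file instead of a live site.
$tmp = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($tmp, '<p>I have been working on this 42 day streak</p>');

$data = fetch_page($tmp);
if (preg_match('/been working on this (.+?) day/', $data, $match)) {
    echo $match[1];
} else {
    echo 'The page was fetched, but the pattern did not match.';
}
unlink($tmp);
```

Wrapping the real call in try/catch and echoing $e->getMessage() turns a blank page into a readable reason.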
oni-kun Posted August 13, 2009

Oh, you simply download the DOM parser and open it up. All you need is the one PHP file, 'simple_html_dom.php', but you can browse the examples if you wish. For example, let's say you wanted to grab Google's logo:

<?php
include('simple_html_dom.php');
$html = file_get_html('http://www.google.com/');
$logo = $html->find('#logo');
echo $logo;

That'll return the div #logo with the image. You can easily traverse to someone's stats like you wanted, and parse the content after that. There are plenty of examples here: http://simplehtmldom.sourceforge.net/manual.htm

Yakooza Posted August 13, 2009 (author)

OK, I tried that. The output was "Array". How come?

trq Posted August 13, 2009

Quote (Yakooza): OK, I tried that. The output was "Array". How come?

Because you're echoing an array, perhaps? Post some code.

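In simple_html_dom, find() called with only a selector returns an array of matching elements; passing an index as the second argument returns a single element, which is how the library's examples call it (for instance find('a.datitle', 0)). So $html->find('#logo', 0) would give the logo element itself, while echoing the whole array prints the word Array. PHP's built-in DOM extension makes the same list-versus-single-element distinction. A minimal sketch, assuming PHP 7 or later and made-up markup:

```php
<?php
// Sketch (assumes PHP 7+ and the bundled DOM extension): a query that can match
// many elements returns a list, so pick an element before printing it.
$doc = new DOMDocument();
// Made-up markup standing in for a fetched page.
$doc->loadHTML('<html><body><div id="logo"><img src="logo.png" alt="Logo"></div>'
    . '<img src="a.png"><img src="b.png"></body></html>');
$xpath = new DOMXPath($doc);

// query() always returns a DOMNodeList, even when only one element matches...
$matches = $xpath->query('//*[@id="logo"]');

// ...so take the first item, much like find('#logo', 0) in simple_html_dom.
$logo = $matches->item(0);
if ($logo !== null) {
    echo $doc->saveHTML($logo), "\n";
}

// When you do want every match, loop over the list instead.
foreach ($xpath->query('//img') as $img) {
    echo $img->getAttribute('src'), "\n";
}
```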
RichardRotterdam Posted August 13, 2009

Quote (trq): Page scraping is probably easier, and better, done client-side using jQuery these days.

Just a while ago I stumbled upon phpQuery, which is a port of jQuery to PHP. That might also help: http://code.google.com/p/phpquery/

However, I don't think you really need a third-party script. The DOMDocument class is enough to do this sort of task easily. Here is a recent thread about regex and DOMDocument that might help you: http://www.phpfreaks.com/forums/index.php/topic,264032.0.html

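Along the lines of the DOMDocument suggestion, here is a minimal sketch (assuming PHP 7 or later) that reads label/value pairs such as kills and deaths out of a stats table with DOMDocument and DOMXPath. The table layout and the class name "stat" are invented for illustration; the real Battlefield Heroes profile page would have to be inspected to find the right XPath expression, and parse_stats() is a hypothetical helper:

```php
<?php
// Sketch (assumes PHP 7+): read a hypothetical stats table with DOMDocument and
// DOMXPath. The markup and class names are invented for illustration.
function parse_stats($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $stats = array();
    // Each <tr class="stat"> is assumed to hold a label cell and a value cell.
    foreach ($xpath->query('//tr[@class="stat"]') as $row) {
        $cells = $xpath->query('td', $row); // cells relative to this row
        if ($cells->length >= 2) {
            $label = trim($cells->item(0)->textContent);
            $stats[$label] = trim($cells->item(1)->textContent);
        }
    }
    return $stats;
}

$html = '<table>'
      . '<tr class="stat"><td>Kills</td><td>1520</td></tr>'
      . '<tr class="stat"><td>Deaths</td><td>987</td></tr>'
      . '</table>';
print_r(parse_stats($html));
```

XPath expressions are more precise than regular expressions for this kind of markup and do not break when attributes or whitespace change order.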
trq Posted August 13, 2009

I still don't see the point of doing this with a server-side script when jQuery alone is perfectly capable of it. Doing it client-side means no overhead at all on your server; the client simply makes the request to the external source itself.

Yakooza Posted August 13, 2009 (author)

I'm hosting this on Byethost. I can't do it through the client.

Yakooza Posted August 13, 2009 (author)

I tried copying the examples from the simple_html_dom parser exactly, except that I changed the path of the include file, but none of them produced any output. What's wrong here?

RichardRotterdam Posted August 13, 2009

Quote (trq): I still don't see the point of doing this with a server-side script when jQuery alone is perfectly capable of it. Doing it client-side means no overhead at all on your server; the client simply makes the request to the external source itself.

Totally true. However, if you do this server-side, you could create a cron job or scheduled task that stores the required data locally (in a database or an XML file, for example). After that, it is a simple matter of reading the data locally, which improves performance.

I haven't tried using jQuery to scrape a remote site yet. Wouldn't that run into cross-domain scripting restrictions? I want to try it out; so far I've gotten JSONP to work for cross-domain scripting, but not reading a whole remote site as a string.

Quote (Yakooza): I tried copying the examples from the simple_html_dom parser exactly, except that I changed the path of the include file, but none of them produced any output. What's wrong here?

Exactly what code did you use? Just saying that none of them returned any output doesn't give us much to work with.

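The cron-job idea can be sketched as a small cache: scrape at most once per time window and serve the stored copy otherwise. A minimal sketch, assuming PHP 7 or later; cached_fetch() is a hypothetical helper, a JSON file stands in for the database or XML file mentioned above, and the closure stands in for the real scraping code:

```php
<?php
// Sketch (assumes PHP 7+): reuse a cached copy of scraped data until it is
// older than $ttl seconds. cached_fetch() is a hypothetical helper.
function cached_fetch($cacheFile, $ttl, callable $fetch)
{
    // Serve the cached copy while it is still fresh.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return json_decode(file_get_contents($cacheFile), true);
    }
    // Otherwise scrape again and overwrite the cache (LOCK_EX avoids torn writes).
    $data = $fetch();
    file_put_contents($cacheFile, json_encode($data), LOCK_EX);
    return $data;
}

$calls = 0;
$scrape = function () use (&$calls) {
    $calls++;
    return array('kills' => 1520, 'deaths' => 987); // hypothetical scraped stats
};

$cacheFile = sys_get_temp_dir() . '/clan_stats_cache.json';
@unlink($cacheFile); // start from an empty cache for this demo

$first = cached_fetch($cacheFile, 3600, $scrape);  // scrapes and writes the file
$second = cached_fetch($cacheFile, 3600, $scrape); // served from the cache file
echo $calls; // 1
```

A scheduled task could run the scraping and file_put_contents() part on its own, so that page views only ever read the local file.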
Yakooza Posted August 13, 2009 (author)

These are the snippets I tried:

<?php
include_once('/simple_html_dom.php');

function scraping_slashdot() {
    // create HTML DOM
    $html = file_get_html('http://slashdot.org/');

    // get article block
    foreach($html->find('div[id^=firehose-]') as $article) {
        // get title
        $item['title'] = trim($article->find('a.datitle', 0)->plaintext);
        // get body
        $item['body'] = trim($article->find('div.body', 0)->plaintext);

        $ret[] = $item;
    }

    // clean up memory
    $html->clear();
    unset($html);

    return $ret;
}

// -----------------------------------------------------------------------------
// test it!
$ret = scraping_slashdot();

foreach($ret as $v) {
    echo $v['title'].'<br>';
    echo '<ul>';
    echo '<li>'.$v['body'].'</li>';
    echo '</ul>';
}
?>

<?php
include_once('../../simple_html_dom.php');

function scraping_IMDB($url) {
    // create HTML DOM
    $html = file_get_html($url);

    // get title
    $ret['Title'] = $html->find('title', 0)->innertext;

    // get rating
    $ret['Rating'] = $html->find('div[class="general rating"] b', 0)->innertext;

    // get overview
    foreach($html->find('div[class="info"]') as $div) {
        // skip user comments
        if($div->find('h5', 0)->innertext=='User Comments:')
            return $ret;

        $key = '';
        $val = '';

        foreach($div->find('*') as $node) {
            if ($node->tag=='h5')
                $key = $node->plaintext;

            if ($node->tag=='a' && $node->plaintext!='more')
                $val .= trim(str_replace("\n", '', $node->plaintext));

            if ($node->tag=='text')
                $val .= trim(str_replace("\n", '', $node->plaintext));
        }

        $ret[$key] = $val;
    }

    // clean up memory
    $html->clear();
    unset($html);

    return $ret;
}

// -----------------------------------------------------------------------------
// test it!
$ret = scraping_IMDB('http://imdb.com/title/tt0335266/');

foreach($ret as $k=>$v)
    echo '<strong>'.$k.' </strong>'.$v.'<br>';
?>

<?php
include_once('../../simple_html_dom.php');

function scraping_digg() {
    // create HTML DOM
    $html = file_get_html('http://digg.com/');

    // get news block
    foreach($html->find('div.news-summary') as $article) {
        // get title
        $item['title'] = trim($article->find('h3', 0)->plaintext);
        // get details
        $item['details'] = trim($article->find('p', 0)->plaintext);
        // get intro
        $item['diggs'] = trim($article->find('li a strong', 0)->plaintext);

        $ret[] = $item;
    }

    // clean up memory
    $html->clear();
    unset($html);

    return $ret;
}

// -----------------------------------------------------------------------------
// test it!
// "http://digg.com" will check user_agent header...
ini_set('user_agent', 'My-Application/2.5');

$ret = scraping_digg();

foreach($ret as $v) {
    echo $v['title'].'<br>';
    echo '<ul>';
    echo '<li>'.$v['details'].'</li>';
    echo '<li>Diggs: '.$v['diggs'].'</li>';
    echo '</ul>';
}
?>

None of them produced any output. When I opened the page containing this code, it was completely blank. I'm guessing it's outdated?

RichardRotterdam Posted August 14, 2009

The first code snippet works fine for me. What PHP version are you running, and do you have error reporting turned on?

Yakooza Posted August 15, 2009 (author)

It's running 5.2.10, and I'm not sure about error reporting. I'm not hosting the site myself. o.O

RichardRotterdam Posted August 15, 2009

If you create an error on purpose, for example, do you see an error in your browser? Check the manual for error_reporting. You might also want to test things on a local PC first.

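Following the error-reporting suggestion, here is a minimal sketch for a debugging copy of the page: it turns error display on for that script and checks that the parser file is readable before including it. One thing worth checking in the first snippet is include_once('/simple_html_dom.php'). On a Unix-like host a leading slash makes that an absolute path from the filesystem root, not from the website's folder. If the file is not there, include_once() only raises a warning, and the later call to file_get_html() is a fatal error, which shows up as a blank page when display_errors is off. require_library() is a hypothetical helper; the sketch uses dirname(__FILE__) rather than __DIR__ because __DIR__ needs PHP 5.3 and the host reports 5.2.10:

```php
<?php
// Show every error while debugging; switch display_errors off again on a live site.
// Note: this cannot help if the script itself fails to parse.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper: fail with a readable message if the file is missing.
// Functions and classes defined in the included file are still global.
function require_library($path)
{
    if (!is_readable($path)) {
        throw new RuntimeException('Cannot read ' . $path);
    }
    require_once $path;
}

// Build the path from this script's own folder instead of the filesystem root.
try {
    require_library(dirname(__FILE__) . '/simple_html_dom.php');
    echo 'Parser loaded';
} catch (RuntimeException $e) {
    echo $e->getMessage();
}
```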