Endeavour

Everything posted by Endeavour

  1. Got it to work somehow: I managed to use the AJAX URL to download the pages I need.

         $html = file_get_html('http://www.ebay.com/cln/explorer/_ajax?page=1&ipp=16&catids=37958');

         // find() returns an array of matching nodes, so count the array, not a single node
         $collections = $html->find('div[class="collection"]');
         echo "found collections: " . count($collections);

         foreach ($collections as $collection) {
             // process each collection here
         }

     The problem is that the file returned by the AJAX request contains elements with escaped quotes, like this:

         <div class=\"collection\" data-collectionid=\"75336256016\"> <div class=\"header\">

     Can anyone please help me turn every \" in the DOM object back into a normal ", or change the ->find call so that it matches the right elements? I basically need to pick every div with class "collection" and, in a later step, divs with some other classes, but all of them have the \" problem.

     Thanks so much!
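One way the escaped quotes might be handled is to clean the raw response text before it is parsed. The sketch below is only an illustration: `unescapeAjaxHtml` is a hypothetical helper, not part of simple_html_dom, and it assumes the only escaping in the response is `\"` and `\/`. If the endpoint actually returns JSON, decoding it with `json_decode` would probably be the more robust route.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): turns the escaped
// sequences \" and \/ back into plain " and /.
// Assumption: these are the only escape sequences present in the response.
function unescapeAjaxHtml(string $raw): string
{
    return str_replace(['\\"', '\\/'], ['"', '/'], $raw);
}

// Sample markup copied from the escaped response quoted in the post.
$raw = '<div class=\"collection\" data-collectionid=\"75336256016\"> <div class=\"header\">';

$clean = unescapeAjaxHtml($raw);
echo $clean, PHP_EOL;
```

The cleaned string could then be passed to `str_get_html($clean)` from simple_html_dom and queried with `->find('div[class="collection"]')` as usual.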
  2. For an exercise I have to crawl some eBay pages and extract product information and metadata. I am brand new to PHP and this is my first try. I am using the Simple HTML DOM parser class from here as a starting point: http://simplehtmldom.sourceforge.net/

     I can open a single product collection just fine:

         $html = file_get_html('http://www.ebay.com/cln/linda*s***stuff/Red-Carpet-Ready-Grammy-Inspired-Style/76271969013');

     but to get all possible collections I'd need to load a URL like this:

         $html = file_get_html('http://www.ebay.com/cln#{"category":{"id":1,"text":"Collectibles"}}');

     This doesn't work. For some reason the wrong page is loaded: it's always http://www.ebay.com/cln#. It could be a problem with the active eBay pages or something else; I can't figure it out.

     Does anyone have a better idea how to solve this problem? I am running out of ideas here. Any tips would be highly appreciated!

     Cheers,
     End

     Full test code below:

         <?php
         include_once 'simple_html_dom.php';

         /*
         $curl = curl_init();
         curl_setopt($curl, CURLOPT_URL, 'http://www.ebay.com/cln#{"category":{"id":20091}}');
         curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
         curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
         $str = curl_exec($curl);
         curl_close($curl);
         $html = str_get_html($str);
         */

         $html = file_get_html('http://www.ebay.com/cln/linda*s***stuff/Red-Carpet-Ready-Grammy-Inspired-Style/76271969013');

         // Looking for the big class and scraping image, title and other metadata.
         // Each empty inner loop leaves the last matching element in its variable.
         foreach ($html->find('div[class="thumb big bigL"]') as $bigclass) {
             foreach ($bigclass->find('img') as $bigimage) {
             }
             foreach ($bigclass->find('div[class=itemPrice]') as $bigprice) {
             }
             foreach ($bigclass->find('div[class=soldBy]') as $bigseller) {
             }
             echo $bigimage->alt . "<br/>" . $bigimage . "<br />" . $bigprice . "<br/>" . $bigseller . "<br/><br/>";
         }

         foreach ($html->find('div[class="thumb big bigR"]') as $bigclass1) {
             foreach ($bigclass1->find('img') as $bigimage) {
             }
             foreach ($bigclass1->find('div[class=itemPrice]') as $bigprice) {
             }
             foreach ($bigclass1->find('div[class=soldBy]') as $bigseller) {
             }
             echo $bigimage->alt . "<br/>" . $bigimage . "<br />" . $bigprice . "<br/>" . $bigseller . "<br/><br/>";
         }

         // Looking for the smaller class and scraping image, title and other metadata
         foreach ($html->find('div[class="thumb small"]') as $smallclass) {
             foreach ($smallclass->find('img') as $smallimage) {
             }
             foreach ($smallclass->find('div[class=itemPrice]') as $smallprice) {
             }
             foreach ($smallclass->find('div[class=soldBy]') as $smallseller) {
             }
             echo $smallimage->alt . "<br/>" . $smallimage . "<br />" . $smallprice . "<br/>" . $smallseller . "<br/><br/>";
         }
         ?>

     Attachments: test.php, simple_html_dom.zip