For an exercise I have to crawl some eBay pages and extract product information and metadata.
I am brand new to PHP, and this is my first attempt.
I am using the Simple HTML DOM parser class from here as a starting point:
http://simplehtmldom.sourceforge.net/
I can open a single product collection just fine:
$html = file_get_html ( 'http://www.ebay.com/cln/linda*s***stuff/Red-Carpet-Ready-Grammy-Inspired-Style/76271969013' );
But to get all available collections I need to load a URL like this:
$html = file_get_html ( 'http://www.ebay.com/cln#{"category":{"id":1,"text":"Collectibles"}}' );
This doesn't work. For some reason the wrong page is loaded; it is always
http://www.ebay.com/cln#
It could be a problem with eBay's dynamic pages or something else. I can't figure it out.
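A quick check that can be run offline, as a rough sketch: PHP's built-in parse_url() treats everything after the # as the URL fragment. As far as I understand, fragments are handled by the browser and are not sent to the server in an HTTP request, which would explain why only /cln ever gets loaded. The variable names below are just for illustration.

```php
<?php
// Split the category URL into its components with the built-in parse_url().
$url = 'http://www.ebay.com/cln#{"category":{"id":1,"text":"Collectibles"}}';
$parts = parse_url($url);

// The path is the part that is actually requested from the server.
echo 'Path: ' . $parts['path'] . "\n";

// Everything after the # ends up in the fragment component.
echo 'Fragment: ' . $parts['fragment'] . "\n";
```

If that is right, the category filter is probably applied by JavaScript in the browser after the page has loaded, so neither file_get_html nor cURL would ever pass it to eBay.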
Does anyone have a better idea how to solve this problem? I am running out of ideas here.
Any tips would be highly appreciated!
Cheers, End
Full test code below:
<?php
include_once 'simple_html_dom.php';

/* Attempt to load the category listing via cURL (also ends up on the plain /cln page):
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://www.ebay.com/cln#{"category":{"id":20091}}');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
$str = curl_exec($curl);
curl_close($curl);
$html = str_get_html($str); */

$html = file_get_html('http://www.ebay.com/cln/linda*s***stuff/Red-Carpet-Ready-Grammy-Inspired-Style/76271969013');
if (!$html) {
    // file_get_html returns false when the page could not be fetched or parsed
    die('Could not load the collection page.');
}

// The big (left and right) and small thumbnails share the same inner markup,
// so scrape image, title and other metadata from all three in one loop.
$selectors = array(
    'div[class="thumb big bigL"]',
    'div[class="thumb big bigR"]',
    'div[class="thumb small"]',
);

foreach ($selectors as $selector) {
    foreach ($html->find($selector) as $thumb) {
        // find($selector, -1) returns the last match, or null if there is none
        $image  = $thumb->find('img', -1);
        $price  = $thumb->find('div[class=itemPrice]', -1);
        $seller = $thumb->find('div[class=soldBy]', -1);

        // Guard against thumbnails without an image before reading its alt text
        $title = $image ? $image->alt : '';

        echo $title . "<br/>" . $image . "<br />" . $price . "<br/>" . $seller . "<br/><br/>";
    }
}
?>
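To try the extraction logic without hitting eBay, here is a minimal sketch that runs against a small made-up HTML fragment. It uses PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM, so it runs without the extra include. The sample markup and the helper name firstText are assumptions for illustration only, not eBay's real page structure.

```php
<?php
// Hypothetical markup imitating one thumbnail block; not eBay's real HTML.
$sample = '<div class="thumb small">'
        . '<img src="a.jpg" alt="Red dress">'
        . '<div class="itemPrice">$25.00</div>'
        . '<div class="soldBy">some_seller</div>'
        . '</div>';

// Hypothetical helper: text of the first node matching $query below $context,
// or an empty string if nothing matches.
function firstText(DOMXPath $xpath, DOMNode $context, $query) {
    $nodes = $xpath->query($query, $context);
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : '';
}

$doc = new DOMDocument();
$doc->loadHTML($sample);
$xpath = new DOMXPath($doc);

$rows = array();
foreach ($xpath->query('//div[@class="thumb small"]') as $thumb) {
    // item(0) is null when the thumbnail has no image
    $img = $xpath->query('.//img', $thumb)->item(0);
    $rows[] = array(
        'title'  => $img ? $img->getAttribute('alt') : '',
        'price'  => firstText($xpath, $thumb, './/div[@class="itemPrice"]'),
        'seller' => firstText($xpath, $thumb, './/div[@class="soldBy"]'),
    );
}

print_r($rows);
```

Collecting the values into an array first, instead of echoing them directly, makes it easier to check what was actually scraped before deciding how to print or store it.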