Node Loop Puzzle


Wondering if anyone can help - I'm not entirely sure I'm getting this right. Actually, I know I'm not, because it isn't working!


I've retrieved a set of pages using cURL, which all works fine, and I'm then using a foreach loop to process each page. Each page will have either one main image, or a main image and thumbnails. I've been scratching my head for many hours trying to figure out the logic to get either the thumbnails (from which I can then retrieve the bigger images) or just the one big image.


Here's the part of the loop:


if ($node->nodeName == 'div' && $node->getAttribute('id') == 'bigImageWindow' && $node->getAttribute('class') == 'hasThumbnails') {
    // We have more than 1 image, so use the thumbs approach
    //echo "a:<br />";
    $setImageSrc = "a";
} elseif ($node->nodeName == 'ul' && $node->getAttribute('id') == 'thumbPage0') {
    //echo "b:<br />";
    $setImageSrc = "b";
    if ($setImageSrc == 'a') {
        if ($node->nodeName == 'img' && $node->getAttribute('id') == 'largeImage') {
            $instr_images[] = $node->getAttribute('src');
        } else {
            if ($node->nodeName == 'ul' && $node->getAttribute('id') == 'thumbPage0') {
                $vnodes = $node->childNodes;
                foreach ($vnodes as $vnode) {
                    if ($vnode->nodeName == 'li') {
                        $wnodes = $vnode->childNodes;
                        foreach ($wnodes as $wnode) {
                            if ($wnode->nodeName == 'a') {
                                $xnodes = $wnode->childNodes;
                                foreach ($xnodes as $xnode) {
                                    if ($xnode->nodeName == 'img') {
                                        $instr_images[] = $xnode->getAttribute('src');


I know it's a bit long-winded, but the idea is simple: if the page has thumbnails, use those; otherwise, just grab the main image. As you can see from the latest attempt, I'm trying to set a $setImageSrc flag to process either option a or option b.


At best, I've managed to get all the images, but with a duplicated first image. This is because it reads both the big image AND the thumbnails, despite the logic saying it should be doing either/or...


Any ideas?

Pencil and paper, my friend. First you want a function "hasThumbnails()", then two more, "scrapeThumbnails()" and "scrapeMainImage()". That's about all we can help you with, because the bodies of those functions depend on the content of the pages, which you haven't supplied.


Your code would be along the lines of:



foreach ($pages as $page) {
    if (hasThumbnails($page)) {
        $image = scrapeThumbnails($page);
    } else {
        $image = scrapeMainImage($page);
    }
}
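
For what it's worth, here is a rough sketch of what those three functions might look like. It only guesses at the page structure from the ids and class in the question's code (bigImageWindow, hasThumbnails, thumbPage0, largeImage); the real pages may differ. The helpers loadXPath() and scrapeImages() are made-up names for illustration, and here the functions take a DOMXPath object rather than the raw page.

```php
<?php
// A sketch only: the markup layout is assumed from the question's code,
// and loadXPath() / scrapeImages() are hypothetical helper names.

function loadXPath(string $html): DOMXPath
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return new DOMXPath($doc);
}

function hasThumbnails(DOMXPath $xpath): bool
{
    // True when div#bigImageWindow carries the class "hasThumbnails".
    $query = '//div[@id="bigImageWindow"]'
           . '[contains(concat(" ", normalize-space(@class), " "), " hasThumbnails ")]';
    return $xpath->query($query)->length > 0;
}

function scrapeThumbnails(DOMXPath $xpath): array
{
    // Same path the nested foreach loops walk: ul#thumbPage0 > li > a > img.
    $srcs = [];
    foreach ($xpath->query('//ul[@id="thumbPage0"]/li/a/img') as $img) {
        $srcs[] = $img->getAttribute('src');
    }
    return $srcs;
}

function scrapeMainImage(DOMXPath $xpath): array
{
    $srcs = [];
    foreach ($xpath->query('//img[@id="largeImage"]') as $img) {
        $srcs[] = $img->getAttribute('src');
    }
    return $srcs;
}

function scrapeImages(string $html): array
{
    // Decide once per page, then scrape exactly one of the two sources.
    $xpath = loadXPath($html);
    return hasThumbnails($xpath)
        ? scrapeThumbnails($xpath)
        : scrapeMainImage($xpath);
}
```

The point of this shape is that the either/or decision is made once per page rather than once per node, so the big image and the thumbnails can't both end up in the result. The thumbnail src values would still need mapping to the bigger images afterwards, which depends on how the site names its files.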


