
file_get_html and dom -- stuck in problem -- help needed


qam47


I am trying to fetch the HTML source of some pages. Let me explain:

1: The news page of the website has headline links.
2: Follow each of those links and fetch the HTML of that page.
3: With Simple HTML DOM, fetch only the article image.
4: Output all the images.

But I am stuck on the second part.

Here is the code:
 
<?php
require('simple_html_dom.php');

$url = 'http://www.goal.com/en/news/archive/1/';
$site_url = 'http://www.goal.com/';

$html = file_get_html($url);

$links = array();

// List Title Links of on the page...........
// --------------------------------------------
foreach($html->find('.imgBox') as $tl) {
$url_inner  =  $tl->find('a', 0)->href;


// Inside of the Title Links
// --------------------------------------------

$innerpage = file_get_contents($site_url . $url_inner);

$html_innerpage = file_get_html($innerpage);

echo $html_innerpage;

}



?>

 

For step 2, all you need to do is pass  $site_url . $url_inner  to file_get_html(). It expects a URL or file path, not an HTML string, so the file_get_contents() call is not needed.

$html_innerpage = file_get_html($site_url . $url_inner); // get the article html
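
One thing to watch when concatenating  $site_url . $url_inner : if the href already starts with a slash, or is already a full URL, the result can end up with a doubled slash or a doubled host. Here is a minimal sketch of a joining helper. The name join_url is hypothetical (it is not part of simple_html_dom), and it only covers the simple cases described in the comments.

```php
<?php
// Hypothetical helper, not part of simple_html_dom.
// Joins a base URL and a path so exactly one slash separates them.
// Assumption: $path is either an absolute http(s) URL or a site-relative path.
function join_url(string $base, string $path): string
{
    // Absolute URLs are returned unchanged.
    if (preg_match('#^https?://#i', $path)) {
        return $path;
    }
    // Strip the trailing slash from the base and the leading slash
    // from the path, then glue them together with a single slash.
    return rtrim($base, '/') . '/' . ltrim($path, '/');
}
```

With this in place, the call would become  file_get_html(join_url($site_url, $url_inner)) .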

For step 3 you use  $tl->find()  to find the article image ( .article-image ).

For step 4 echo the image.

Thanks for the reply, but I am still confused.

Here is what I have done so far:

<?php
require('simple_html_dom.php');

$url = 'http://www.goal.com/en/news/archive/1/';
$site_url = 'http://www.goal.com/';

$html = file_get_html($url);

$links = array();

// List the title links on the page
// --------------------------------------------
foreach($html->find('.imgBox') as $tl) {
    $url_inner = $tl->find('a', 0)->href;

    // Inside of the title links
    // --------------------------------------------
    $innerpage = file_get_html($site_url . $url_inner);

    $images = $tl->find('. article-image', 0);
    $item['image'] = $images;

    $allimg[] = $item;
}

foreach($allimg as $tl){

	echo '
	<item>
		<images>' . $tl['image'] . '</images>
	</item>
	';
}

						
?>

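Sorry, I meant to use  $innerpage->find('.article-image') , not  $tl->find('.article-image') . The image lives on the article page, so the search has to run on the fetched article, not on the headline box.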
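
To show the idea of searching the article page itself, here is a rough sketch that pulls an image URL out of an HTML string. Note that it uses PHP's built-in DOM extension rather than simple_html_dom, so it runs without the library. The function name find_article_image_src is made up for this example, and whether  .article-image  sits on the <img> itself or on a wrapper element is an assumption (the sketch handles both).

```php
<?php
// Sketch only: uses PHP's built-in DOM extension, not simple_html_dom.
// The class name "article-image" comes from the thread; the rest of the
// page structure is assumed.
function find_article_image_src(string $html): ?string
{
    if ($html === '') {
        return null; // nothing to parse
    }

    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    // Match any element whose class list contains "article-image".
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' article-image ')]"
    );
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $node = $nodes->item(0);
    // If the class is on a wrapper, use the first <img> inside it.
    if ($node->nodeName !== 'img') {
        $imgs = $node->getElementsByTagName('img');
        if ($imgs->length === 0) {
            return null;
        }
        $node = $imgs->item(0);
    }

    $src = $node->getAttribute('src');
    return $src !== '' ? $src : null;
}
```

You would feed it the article HTML, e.g. the string returned by file_get_contents() for the article URL. With simple_html_dom the matching lookup is the  $innerpage->find('.article-image', 0)  call mentioned above.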

 

 

However, I think step 2 may actually mean getting the thumbnail image shown with the headline link, not the image from the article behind the headline. In that case, the foreach loop needs to be:

// get headline link in <div class="imgBox">
foreach($html->find('.imgBox') as $lt)
{
    // get the article url from the anchor tag href attribute
    $headline_link = $lt->find('a', 0)->href;

    // get the image url from the image tag src attribute
    $headline_image = $lt->find('img', 0)->src;

    $allimg[]['image'] = $headline_image;
}
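
For step 4, here is a small sketch of what the collected array looks like and one way to print it. The helper name render_images and the two image paths are placeholders made up for this example; in the real script the values would come from the  src  attributes found in the loop.

```php
<?php
// Hypothetical helper: turn the collected image URLs into <img> tags.
function render_images(array $allimg): string
{
    $out = '';
    foreach ($allimg as $item) {
        // Escape the URL so it is safe inside an HTML attribute.
        $out .= '<img src="' . htmlspecialchars($item['image']) . '">' . "\n";
    }
    return $out;
}

// Placeholder URLs standing in for the scraped src values.
$allimg = array();
foreach (array('/img/a.jpg', '/img/b.jpg') as $headline_image) {
    // Each append creates a new one-element array with an 'image' key.
    $allimg[]['image'] = $headline_image;
}

echo render_images($allimg);
```

Initialising  $allimg  to an empty array before the loop also avoids an undefined variable notice when no  .imgBox  elements are found.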

Maybe you need to clarify what step 2 means.

