
file_get_html and dom -- stuck in problem -- help needed



I am trying to fetch the HTML source of some pages. Let me explain:

1: On the news page of the website there are headline links.
2: Follow each of those links and fetch the HTML of that page.
3: With Simple HTML DOM, fetch only the article image.
4: Output all the images.

But I am stuck on the 2nd part.

Here is the code:
 
<?php
require('simple_html_dom.php');

$url = 'http://www.goal.com/en/news/archive/1/';
$site_url = 'http://www.goal.com/';

$html = file_get_html($url);

$links = array();

// List the title links on the page
// --------------------------------------------
foreach($html->find('.imgBox') as $tl) {
    $url_inner = $tl->find('a', 0)->href;

    // Inside of the title links
    // --------------------------------------------
    $innerpage = file_get_contents($site_url . $url_inner);

    $html_innerpage = file_get_html($innerpage);

    echo $html_innerpage;
}

?>

 

For step 2, all you need to do is pass  $site_url . $url_inner  to file_get_html(). It expects a URL or file path, not HTML that has already been downloaded:

$html_innerpage = file_get_html($site_url . $url_inner); // get the article html

For step 3 you use  $tl->find()  to find the article image ( .article-image ).

For step 4 echo the image.
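
One thing to watch when building the URL for step 2: if the href already starts with a slash, or is already a full URL, plain concatenation with $site_url gives a malformed address. I have not checked which form the hrefs on that page take, so treat this as a sketch only. join_url() is a made-up helper name (it is not part of simple_html_dom), and the article path below is invented for illustration.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): join a site root and an href.
function join_url(string $base, string $href): string
{
    // An href that already carries a scheme is returned unchanged.
    if (preg_match('#^https?://#i', $href) === 1) {
        return $href;
    }
    // Otherwise keep exactly one slash between the two parts.
    return rtrim($base, '/') . '/' . ltrim($href, '/');
}

$site_url = 'http://www.goal.com/';

// Illustrative path only; the real hrefs come from $tl->find('a', 0)->href.
echo join_url($site_url, '/en/news/example-article'), PHP_EOL;
```

The result of join_url($site_url, $url_inner) can then be passed to file_get_html() in place of the plain concatenation.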

Edited by Ch0cu3r

Thanks for the reply, but I am still confused.

Here is what I did so far:

<?php
require('simple_html_dom.php');

$url = 'http://www.goal.com/en/news/archive/1/';
$site_url = 'http://www.goal.com/';

$html = file_get_html($url);

$links = array();

// List the title links on the page
// --------------------------------------------
foreach($html->find('.imgBox') as $tl) {
    $url_inner = $tl->find('a', 0)->href;

    // Inside of the title links
    // --------------------------------------------
    $innerpage = file_get_html($site_url . $url_inner);

    $images = $tl->find('.article-image', 0);
    $item['image'] = $images;

    $allimg[] = $item;
}

foreach($allimg as $tl) {
    echo '
    <item>
        <images>' . $tl['image'] . '</images>
    </item>
    ';
}

?>

Sorry, I meant to use  $innerpage->find('.article-image') , not  $tl->find('.article-image') . $tl is the headline box on the archive page, while $innerpage is the article page you just fetched.

However, I think step 2 might actually mean getting the thumbnail image shown with the headline link, not the image from the article itself. In that case the foreach loop needs to be:

// get headline link in <div class="imgBox">
foreach($html->find('.imgBox') as $lt)
{
    // get the article url from the anchor tag href attribute
    $headline_link = $lt->find('a', 0)->href;

    // get the image url from the image tag src attribute
    $headline_image = $lt->find('img', 0)->src;

    $allimg[]['image'] = $headline_image;
}

Maybe you need to clarify what step 2 means.
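
For step 4, once $allimg holds image URLs as plain strings (as in the loop above), the output could look roughly like the sketch below. render_items() is a hypothetical helper name, the sample URLs are made up, and the <item>/<images> markup is copied from the earlier post. It skips empty entries and escapes each URL so that a character such as & cannot break the markup.

```php
<?php
// Hypothetical helper: build the <item> markup from a list of ['image' => url] rows.
function render_items(array $allimg): string
{
    $out = '';
    foreach ($allimg as $row) {
        // Skip rows where no image URL was found.
        if (empty($row['image'])) {
            continue;
        }
        // Escape &, <, > and quotes so the URL cannot break the markup.
        $src = htmlspecialchars((string) $row['image'], ENT_QUOTES, 'UTF-8');
        $out .= "<item>\n\t<images>" . $src . "</images>\n</item>\n";
    }
    return $out;
}

// Sample data standing in for the array built by the loop; the URLs are made up.
$allimg = [
    ['image' => 'http://example.com/a.jpg?w=100&h=50'],
    ['image' => null],
];

echo render_items($allimg);
```

Building the string first and echoing it once also makes it easy to write the result to a file instead of the page.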
