
Extract URL and link name from HTML page



I want to use PHP to extract the URL of every <a href> link on a page, along with the link text that you click on.

e.g.

<a href="http://google.com">This is Google</a>

From this link, I would like to extract:

1. This is Google

2. http://google.com

I've looked into the simplehtmldom_1_5 library, but it only seems to get the URL, not the link text.

 

Thanks


Try getting the link text using $element->innertext

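require_once 'simple_html_dom.php';
// $html comes from file_get_html() or str_get_html(); find all links in it
foreach ($html->find('a') as $element) {
    echo 'Href: ' . $element->href . ', Link-Text: ' . $element->innertext . '<br>';
}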
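If simplehtmldom isn't a hard requirement, PHP's built-in DOM extension can do the same job. The sketch below is an alternative to the simplehtmldom code above, not part of that library: it loads the HTML with DOMDocument::loadHTML(), walks every <a> element, and reads the URL with getAttribute('href') and the visible text with textContent. The helper name extract_links is hypothetical.

```php
<?php
// Alternative to simplehtmldom using PHP's built-in DOM extension.
// extract_links() is a hypothetical helper name for illustration.
function extract_links(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; suppress the parser warnings.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // Skip anchors that have no href (e.g. <a name="top">).
        if (!$a->hasAttribute('href')) {
            continue;
        }
        $links[] = [
            'href' => $a->getAttribute('href'),
            // textContent is the link text with any nested tags stripped.
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

foreach (extract_links('<p><a href="http://google.com">This is Google</a></p>') as $link) {
    echo 'Href: ' . $link['href'] . ', Link-Text: ' . $link['text'] . "<br>\n";
}
```

One caveat: loadHTML() treats input without a charset declaration as ISO-8859-1, so non-ASCII headlines from a UTF-8 page may come out garbled unless the page (or a prepended <meta charset="utf-8">) declares its encoding.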

 


This simply doesn't work.

 

-> href does work

 

-> innertext does not work

The HTML is the page source of the Guardian.co.uk website.

I'm basically writing a PHP cURL script to download news sites, extract the headlines and URLs, and then put them all on one page.

It's a convenient way to read a wide range of news sources without missing anything.

Here is a sample from the Guardian site as of now:

<h1>
    <a href="http://www.theguardian.com/world/2014/mar/12/mh370-malaysia-airlines-search-expands-third-possible-sighting"  class="link-text">Plane search expands after third possible sighting</a>
</h1>
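For the headline use case, one option is to select only the links inside <h1> elements rather than every <a> on the page. This is a sketch using PHP's DOMXPath with the query //h1/a[@href], run against the sample markup above. The assumption that headlines always sit in <h1><a> comes from this one sample, not from any guarantee about the site's markup, and extract_headlines is a hypothetical name.

```php
<?php
// Sketch: pull headline links out of <h1> elements with DOMXPath.
// Assumes headlines look like the sample: <h1><a href="...">Title</a></h1>.
function extract_headlines(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate messy real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $headlines = [];
    // Only <a> elements that are direct children of <h1> and have an href.
    foreach ($xpath->query('//h1/a[@href]') as $a) {
        $headlines[] = [
            'url'   => $a->getAttribute('href'),
            // Collapse the whitespace/newlines that often surround headline text.
            'title' => trim(preg_replace('/\s+/', ' ', $a->textContent)),
        ];
    }
    return $headlines;
}

$sample = '<h1>
    <a href="http://www.theguardian.com/world/2014/mar/12/mh370-malaysia-airlines-search-expands-third-possible-sighting"  class="link-text">Plane search expands after third possible sighting</a>
</h1>';

foreach (extract_headlines($sample) as $h) {
    echo $h['title'] . ' => ' . $h['url'] . "\n";
}
```

If the site marks headlines differently on other pages, only the XPath query needs to change, e.g. to match on class="link-text" instead of the <h1> parent.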

$element->innertext works fine for me.

require_once 'simple_html_dom.php';

$html = str_get_html('<h1>
<a href="http://www.theguardian.com/world/2014/mar/12/mh370-malaysia-airlines-search-expands-third-possible-sighting"  class="link-text">Plane search expands after third possible sighting</a>
</h1>');

foreach($html->find('a') as $element) 
  echo '<b>Href:</b> ' . $element->href . ', <b>Link-Text:</b> ' . $element->innertext.'<br>';
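The overall script described in the question (download several news sites with cURL, then put their headlines on one page) could be wired together roughly like this. It's a sketch under assumptions: fetch_page() and render_headlines() are hypothetical helpers, the cURL options are a minimal set, and the output is escaped with htmlspecialchars() because the headline text and URLs come from third-party pages.

```php
<?php
// Hypothetical helper: download a page with cURL; returns the HTML, or null on failure.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 15,   // seconds; don't hang on a slow site
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: render [source => [[url, title], ...]] as one HTML fragment.
function render_headlines(array $bySource): string
{
    $out = '';
    foreach ($bySource as $source => $items) {
        $out .= '<h2>' . htmlspecialchars($source) . "</h2>\n<ul>\n";
        foreach ($items as [$url, $title]) {
            // Escape both values: they come from someone else's page.
            $out .= '<li><a href="' . htmlspecialchars($url) . '">'
                  . htmlspecialchars($title) . "</a></li>\n";
        }
        $out .= "</ul>\n";
    }
    return $out;
}

echo render_headlines([
    'The Guardian' => [
        ['http://www.theguardian.com/world/2014/mar/12/mh370-malaysia-airlines-search-expands-third-possible-sighting',
         'Plane search expands after third possible sighting'],
    ],
]);
```

In a full script you would call fetch_page() for each site, pass the HTML to a parser (simplehtmldom's str_get_html(), or DOMDocument), and feed the extracted (URL, text) pairs to render_headlines().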

Archived

This topic is now archived and is closed to further replies.
