Extracting the Anchor Text from the RSS...

natasha_thomas · May 29, 2011

Folks,

I have tried all my PHP skills to extract domain name strings from an RSS feed and put each domain name into an array element, but all in vain.

Here is the RSS feed:

http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php

What I want to extract:

The feed shows a list of domain names, each wrapped in an anchor. All I need is to extract these domain names, like "abc.co uk" (note the space between .co and uk, which can be fixed with str_replace()).

Here is my first try (using Simple HTML DOM Parser):
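To make the clean-up step concrete, here is a minimal sketch of it. The helper name normalize_domain() and the sample strings are made up for illustration, and it assumes every anchor text has the "abc.co uk" shape, where the single space stands in for the missing dot.

```php
<?php
// Hypothetical helper (not part of any library): turns anchor text
// such as "abc.co uk" into "abc.co.uk".
function normalize_domain(string $anchorText): string
{
    // Strip leading/trailing whitespace, then replace the inner space with a dot.
    return str_replace(' ', '.', trim($anchorText));
}

// Sample anchor texts; these are assumptions, not real feed data.
$domains = array_map('normalize_domain', ['abc.co uk', ' example.co uk ']);
print_r($domains);
```

Each cleaned string ends up as its own array element, which is the result I am after.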

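require_once('simple_html_dom.php');

$html = file_get_html('http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php');

// Without an index argument, find() returns an array of all matching anchors.
// Passing 0 as the second argument returns only the first element, not a list.
$domains = $html->find('div[class="entry"] a');

foreach ($domains as $dom)
{
    // Anchor text looks like "abc.co uk", so turn the space into a dot.
    echo str_replace(' ', '.', $dom->plaintext);
}

// Free the memory held by the parsed document.
$html->clear();
unset($html);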

Here is another try, with DOM Document:

$scrapeurl = 'http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php';

$keywords = file_get_contents($scrapeurl);

$keywords = json_decode($keywords);

foreach ($keywords->responseData->results as $keyword)
{
    echo str_replace("...", ".", $keyword->title).'<br/>';
}

In both cases a DOM document is created, but it seems the document has all the information except the domain names I want to extract.

Please help me extract the domain names.
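For comparison, here is a rough sketch of what I imagine a working version could look like using PHP's built-in SimpleXML and DOMDocument. It rests on assumptions: the inline $rss string is a made-up stand-in for the real feed, and it presumes the anchors sit as HTML inside each item's description element, which I have not confirmed. The function name extract_anchor_texts() is also hypothetical.

```php
<?php
// Hypothetical sample feed; the real feed's structure may differ.
$rss = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample</title>
    <item>
      <title>Expiring domains</title>
      <description><![CDATA[<div class="entry"><a href="#">abc.co uk</a> <a href="#">xyz.co uk</a></div>]]></description>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: returns the cleaned anchor texts found in item descriptions.
function extract_anchor_texts(string $rssXml): array
{
    $domains = [];

    // LIBXML_NOCDATA merges CDATA sections into plain text nodes.
    $feed = simplexml_load_string($rssXml, 'SimpleXMLElement', LIBXML_NOCDATA);
    if ($feed === false) {
        return $domains; // Not well-formed XML.
    }

    foreach ($feed->channel->item as $item) {
        $fragment = trim((string) $item->description);
        if ($fragment === '') {
            continue; // Nothing to parse in this item.
        }

        // Parse the embedded HTML fragment, silencing warnings about sloppy markup.
        $previous = libxml_use_internal_errors(true);
        $doc = new DOMDocument();
        $doc->loadHTML($fragment);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        // Collect the text of every anchor, turning "abc.co uk" into "abc.co.uk".
        foreach ($doc->getElementsByTagName('a') as $anchor) {
            $domains[] = str_replace(' ', '.', trim($anchor->textContent));
        }
    }

    return $domains;
}

print_r(extract_anchor_texts($rss));
```

If the real feed puts the anchors somewhere other than the description element, the loop over items would need to read from that element instead.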

Cheers
