
Xpath help


Omzy


Basically I generated an XPath query that finds and outputs a <P> tag from a given page.

 

The problem I am having is that this <P> tag contains <BR> tags within its content. For example:

 

<p>
100 New Drive
<br>
New Town
<br>
Manchester
<br>
M1 AAA
</p>

 

 
$address = $xpath->evaluate("/html/body/table/tr[5]/td[4]/p");

echo $address->item(0)->nodeValue;

 

This outputs:

 

100 New DriveNew TownManchesterM1 AAA

 

Ideally I want the content of the <P> tag split into an array at each <BR> tag. I can then put each element of this array into its own field in the database.

 

Anybody got any suggestions on how to do this?


Given your current code, the following should do what you're wanting or at least point you in the general direction.

 

$texts = $xpath->query('text()', $address->item(0));
foreach ($texts as $text) {
$addr[] = trim($text->wholeText);
}

print_r($addr);

 

The code should be pretty self-explanatory but basically it asks for the text nodes belonging to the paragraph and throws them onto the $addr array for later use.  The output, if all goes to plan, should be:

 

Array
(
    [0] => 100 New Drive
    [1] => New Town
    [2] => Manchester
    [3] => M1 AAA
)
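For anyone wanting to try this in isolation, here is a minimal self-contained sketch of the same idea. The sample markup and the simplified `//p` path are assumptions for illustration (the real page uses the longer path from the first post), and the empty-string check is an extra guard against whitespace-only text nodes, not part of the snippet above.

```php
<?php
// Sample markup mirroring the address paragraph from the first post (assumed, not the real page).
$html = '<html><body><p>100 New Drive<br>New Town<br>Manchester<br>M1 AAA</p></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html); // suppress warnings from imperfect HTML

$xpath = new DOMXPath($dom);
$address = $xpath->query('//p'); // simplified path, an assumption for this sample

$addr = [];
// Ask for the text nodes that are direct children of the paragraph.
foreach ($xpath->query('text()', $address->item(0)) as $text) {
    $line = trim($text->wholeText);
    if ($line !== '') { // skip whitespace-only nodes between tags
        $addr[] = $line;
    }
}

print_r($addr);
```

Each element of `$addr` can then be mapped to its own database field.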


salathe,

 

Perhaps you can help me out with my final xpath query -

 

I've created a scrape script which fetches all links on a page:

 

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$links = $xpath->query("//a[@class='listinglink']");

$i = 0;

foreach ($links as $item)
{
    $href = $links->item($i);
    $url = $href->getAttribute('href');
    echo '<a href="'.$url.'">'.$url.'</a><br/>';
    $i++;
}

 

I now need to extend this further: it needs to follow each link and run the XPath query from my original post on the page it leads to. Do you have any idea how I can do this?


Your original post was accessing a paragraph's text; your latest, a series of anchors. Without more details, help will only be guesswork, as the two do not appear to correlate.

 

Give a sample of the HTML that you're accessing and what you want to do with it more precisely.

 

P.S. You're using the foreach loop in a strange way. It could be changed to foreach($links as $href), saving the need for the counter and for the first and last lines within the loop.
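As a rough sketch of that simplified loop: the sample HTML and its URLs below are made up for illustration, and only the `listinglink` class comes from your code. The `$urls` array and the `htmlspecialchars()` calls are additions of mine, not something your script needs in order to work.

```php
<?php
// Hypothetical sample page; only the class name comes from the thread.
$html = '<html><body>'
      . '<a class="listinglink" href="/listing/1">One</a>'
      . '<a class="other" href="/about">About</a>'
      . '<a class="listinglink" href="/listing/2">Two</a>'
      . '</body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html); // suppress warnings from imperfect HTML
$xpath = new DOMXPath($dom);

$urls = [];
// Iterate the node list directly: each $href is already the anchor element,
// so no counter and no item($i) lookup are needed.
foreach ($xpath->query("//a[@class='listinglink']") as $href) {
    $url = $href->getAttribute('href');
    $urls[] = $url; // collected so the links can be processed later
    echo '<a href="' . htmlspecialchars($url) . '">' . htmlspecialchars($url) . '</a><br/>' . "\n";
}
```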


Hi Salathe,

 

Basically the most recent code I posted is meant to grab all links (with a class value of 'listinglink') from a given page.

 

I now want to run the code from my original post on each of those links. So basically it's going to follow each link, find the required <P> tag on that page and output its data underneath the link.

 

So a sample output would be:

 

Link 1

P tag content

 

Link 2

P tag content

 

...and so on.

 

I tried using another curl() within the foreach loop but it doesn't seem to work.

 

P.S. thanks for the helpful tip on the foreach loop!
