
Get the content of an element when the previous element has certain content



The code is from a third-party site which I wish to get certain content from.

I want to search within the "b" class (there is only one in the source code!) for <dt>item7:</dt>, then grab the content of the element that follows it. In the sample below that would be:

7-

Otherwise it should return an empty string.

 

 

$b = '
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/html">
<head>
</head>
<body>
    <div class="b">
        <dl>
            <dt>item1:</dt><dd>1</dd>
            <dt>item2:</dt><dd>2</dd>
            <dt>item3:</dt><dd>3</dd>
            <dt>item4:</dt><dd>4</dd>
            <dt>item5:</dt><dd>5</dd>
            <dt>item6:</dt><dd>6</dd>
            <dt>item7:</dt><dd>7-</dd>
            <dt>item8:</dt><dd>8</dd>
            <dt>item9:</dt>
        </dl>
    </div>
</body>
</html>
';

$b = new SimpleXMLElement($b);

echo $b->dl->dt; // echo the content of <dd> only if the previous <dt> node has the text 'item7' in it.

I have looked at the children() link but it's gone completely over my head.

 

It looks like the only way I can do this is the long way and search the html code for the <dt>item7:</dt> then grab the text within the next element if the text was found.  I was just wondering if there was an easier way of doing this without all the extra code.

There's a shorter method, but whether it's "easier" is debatable.

$b->registerXPathNamespace("h", "http://www.w3.org/1999/html");
$dd = $b->xpath("//h:div[@class='b']/h:dl/h:dd[preceding-sibling::h:dt[position()=1]='item7:']")[0];

That reads as: find any DIV with class='b', go to its DL children, then to their DD children, but keep only the DDs whose nearest preceding DT sibling has the value 'item7:'.

If the class isn't "b" then it won't match, but the XPath query can be adjusted to suit.
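To cover the "otherwise return an empty string" part of the question, the query can be wrapped in a small function. This is only a sketch: `ddAfterDt` and `$sample` are made-up names for illustration, the trimmed-down sample declares a single default namespace (the one registered in the reply above), and the label is assumed to contain no single quotes because it is pasted straight into the XPath string.

```php
<?php
// Hypothetical helper (not from the original posts): returns the text of the
// <dd> that directly follows the <dt> whose text equals $label, or '' if none.
function ddAfterDt(SimpleXMLElement $doc, string $label): string
{
    // Assumption: the document's default namespace matches the one used in the reply.
    $doc->registerXPathNamespace('h', 'http://www.w3.org/1999/html');

    // Assumption: $label has no single quote in it, since it is embedded in the query.
    $query = "//h:div[@class='b']/h:dl/h:dd[preceding-sibling::h:dt[1]='" . $label . "']";
    $matches = $doc->xpath($query);

    // xpath() gives an empty array when nothing matches (or false/null on error).
    return $matches ? trim((string) $matches[0]) : '';
}

// Reduced version of the sample markup, with one xmlns declaration only.
$sample = new SimpleXMLElement('
<html xmlns="http://www.w3.org/1999/html">
<body>
    <div class="b">
        <dl>
            <dt>item7:</dt><dd>7-</dd>
            <dt>item8:</dt><dd>8</dd>
            <dt>item9:</dt>
        </dl>
    </div>
</body>
</html>');

echo ddAfterDt($sample, 'item7:'), "\n"; // the <dd> after item7:
echo ddAfterDt($sample, 'item9:'), "\n"; // item9: has no <dd>, so an empty string
```

The `[1]` on the preceding-sibling axis is shorthand for `[position()=1]`, so it still picks only the nearest DT before each DD.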

> "The code is from a third-party site which I wish to get certain content from."

 

Screen-scraping from other web sites is generally a Bad Idea.

 

"I know Engineers, they love to change things!"

 

You can build a carefully crafted script that works today and then, in a couple of weeks' time and for no apparent reason, it suddenly stops working and you have to drop everything, chase around and rewrite your script to work with their new web page design.

Sure it's "fun" the first couple of times you have to do this, but it gets "old" really quickly. 

 

You should be using a more stable, published API to get any data you require. 

 

Regards,   Phill  W.

 

 

 
