Get the content of an element when the previous element has certain content

jasonc · October 4, 2017

The code is from a third-party site which I wish to get certain content from.

I want to search within the element with class "b" (there is only one in the source code!) for <dt>item7:</dt> and then grab the content of the element that follows it, which here is 7-. Otherwise, return an empty string.

$b = '
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/html">
<head>
</head>
<body>
    <div class="b">
        <dl>
            <dt>item1:</dt><dd>1</dd>
            <dt>item2:</dt><dd>2</dd>
            <dt>item3:</dt><dd>3</dd>
            <dt>item4:</dt><dd>4</dd>
            <dt>item5:</dt><dd>5</dd>
            <dt>item6:</dt><dd>6</dd>
            <dt>item7:</dt><dd>7-</dd>
            <dt>item8:</dt><dd>8</dd>
            <dt>item9:</dt>
        </dl>
    </div>
</body>
</html>
';

$b = new SimpleXMLElement($b);

echo $b->body->div->dl->dt; // Reaches only the first <dt>. Goal: echo the content of the <dd> only if the preceding <dt> has the text 'item7' in it.

cyberRobot · October 4, 2017

Have you tried the children() method? More information can be found here:

http://php.net/manual/en/simplexmlelement.children.php
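As a rough illustration of the children() idea, here is a minimal sketch that walks the children of the <dl> in document order and remembers the element seen just before the current one. The helper name ddAfterDt and the trimmed-down sample markup (no doctype, no namespace) are assumptions made for this example; they are not from the original post.

```php
<?php
// Hypothetical helper: returns the text of the <dd> that directly follows
// the <dt> whose text equals $label, or '' when there is no such pair.
function ddAfterDt(SimpleXMLElement $dl, string $label): string
{
    $previous = null; // the element seen just before the current one
    foreach ($dl->children() as $child) {
        if ($child->getName() === 'dd'
            && $previous !== null
            && $previous->getName() === 'dt'
            && trim((string) $previous) === $label) {
            return trim((string) $child);
        }
        $previous = $child;
    }
    return '';
}

// Trimmed-down sample (assumption: no namespace, to keep the sketch short).
$xml = new SimpleXMLElement(
    '<div class="b"><dl>'
    . '<dt>item6:</dt><dd>6</dd>'
    . '<dt>item7:</dt><dd>7-</dd>'
    . '<dt>item9:</dt>'
    . '</dl></div>'
);

echo ddAfterDt($xml->dl, 'item7:'), "\n";
```

With the full page, the <dl> would have to be reached through its parents first, so the starting point passed to the helper depends on the real document structure.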

jasonc · October 4, 2017

I have looked at the children() link, but it's gone completely over my head.

It looks like the only way I can do this is the long way: search the HTML for <dt>item7:</dt>, then grab the text within the next element if it was found. I was just wondering if there was an easier way of doing this without all the extra code.
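For comparison, the "long way" described here could look roughly like the sketch below. Note that it swaps raw string searching for PHP's DOMDocument HTML parser, which is usually more forgiving of third-party markup than an XML parser. The function name ddTextFromHtml and the shortened sample page are assumptions for illustration only.

```php
<?php
// Hypothetical helper: finds the <dt> with the given label inside div.b and
// returns the text of the next element if it is a <dd>, otherwise ''.
function ddTextFromHtml(string $html, string $label): string
{
    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy markup
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return '';
    }

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query("//div[@class='b']//dt") as $dt) {
        if (trim($dt->textContent) !== $label) {
            continue;
        }
        // Skip whitespace text nodes until the next element is reached.
        $next = $dt->nextSibling;
        while ($next !== null && $next->nodeType !== XML_ELEMENT_NODE) {
            $next = $next->nextSibling;
        }
        return ($next !== null && $next->nodeName === 'dd')
            ? trim($next->textContent)
            : '';
    }
    return '';
}

// Shortened stand-in for the third-party page (assumption).
$html = '<html><body><div class="b"><dl>'
      . '<dt>item7:</dt><dd>7-</dd>'
      . '<dt>item9:</dt>'
      . '</dl></div></body></html>';

echo ddTextFromHtml($html, 'item7:'), "\n";
```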

requinix · October 4, 2017

There's a shorter method, but whether it's "easier" is debatable.

$b->registerXPathNamespace("h", "http://www.w3.org/1999/html");
$matches = $b->xpath("//h:div[@class='b']/h:dl/h:dd[preceding-sibling::h:dt[position()=1]='item7:']");
$dd = $matches ? (string) $matches[0] : ''; // empty string when nothing matches

Find any DIV with class='b', then go to its DL children, then to their DD children, but filter to the ones whose nearest preceding DT sibling has the value 'item7:'.

If the class isn't "b" then it won't match, but the XPath query can be adjusted to suit.
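To try the query in isolation, it can be wrapped in a small function that takes the label as a parameter and falls back to an empty string, which is what the original question asked for. This is only a sketch: the name ddByXPath and the cut-down sample document are assumptions, and the namespace URI has to equal whatever default xmlns the real page declares.

```php
<?php
// Hypothetical helper wrapping the XPath lookup; returns '' when nothing matches.
function ddByXPath(SimpleXMLElement $root, string $label): string
{
    // The prefix "h" is arbitrary; the URI must equal the document's default xmlns.
    $root->registerXPathNamespace('h', 'http://www.w3.org/1999/html');
    // $label is inserted as-is, so it must not contain a single quote.
    // preceding-sibling is a reverse axis, so h:dt[1] is the nearest preceding <dt>.
    $query = "//h:div[@class='b']/h:dl/h:dd[preceding-sibling::h:dt[1]='" . $label . "']";
    $matches = $root->xpath($query);
    return $matches ? trim((string) $matches[0]) : '';
}

// Cut-down sample document (assumption) using the same default namespace.
$root = new SimpleXMLElement(
    '<html xmlns="http://www.w3.org/1999/html"><body><div class="b"><dl>'
    . '<dt>item7:</dt><dd>7-</dd>'
    . '<dt>item8:</dt><dd>8</dd>'
    . '<dt>item9:</dt>'
    . '</dl></div></body></html>'
);

echo ddByXPath($root, 'item7:'), "\n";        // the <dd> that follows item7:
echo '[', ddByXPath($root, 'item9:'), "]\n";  // item9: has no <dd>, so nothing between the brackets
```

Because the test is @class='b', a div with class="b other" would not match; a contains()-style test would be needed for multi-class attributes.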

Phi11W · October 5, 2017

> "The code is from a third-party site which I wish to get certain content from."

Screen-scraping from other web sites is generally a Bad Idea.

"I know Engineers, they love to change things!"

You can build a carefully crafted script that works today and then, in a couple of weeks' time and for no apparent reason, it suddenly stops working, and you have to drop everything, chase around, and rewrite your script to work with their new web page design.

Sure it's "fun" the first couple of times you have to do this, but it gets "old" really quickly.

You should be using a more stable, published API to get any data you require.

Regards, Phill W.
