Working with PHP's DOM functions

Arty Ziff · April 12, 2011

I'm using PHP's DOM functions to extract some information from a block of HTML:

for ($i = 0; $i <= $tot_tblocks-1; $i++) {
    // Load the HTML block...
    $dom = new DOMDocument();
    $dom->loadHTML($tblock[$i]);
    $xpath = new DOMXPath($dom);
    $tags = $xpath->query('//div[@class="desc"]/h2[@class="name"]');
    // Get the part I want...
    foreach ($tags as $tag) {
        $tname[$i] = trim($tag->nodeValue);
        echo $tname[$i]."<br>";
    }
}

Two questions:

1 - There is only ever one item ("name"). Can I access it without the foreach loop? $tname[$i] = trim($tags->nodeValue); doesn't work.

2 - This code extracts the content between tags with certain class names. But I would also like to extract the values of certain attributes of some of those tags, such as the value of the href in an <a> tag. The tag may not have a unique class name, but could I still get an array of all the <a> tags in the source block? I don't know how, and I haven't been successful in deciphering the documentation... Any ideas?

dcro2 · April 12, 2011

I guess I'll take a crack at #1

if($tags->length > 0) {
  $tag = $tags->item(0);
  $tname[$i] = trim($tag->nodeValue);
}

requinix · April 12, 2011

1. DOMXPath::query() returns a DOMNodeList rather than an array, so array functions like current() won't work on it. You can use item(0) to get the first node, like

$tag = $xpath->query(...)->item(0);

2. You sure that's what it does? Looks to me like it gets the text inside every div.desc>h2.name... Or does that stuff happen outside the code you posted?

What does the HTML look like? A solution is (probably) to use more XPath queries, unless you know the hierarchy of the HTML and where the A nodes fall inside.
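
For the href part of question 2, something along these lines might work. It is only a sketch: the sample HTML is made up (the real block was never posted), and $hrefs is just an illustrative variable name. It selects every <a> that has an href attribute, wherever it sits in the block, and reads the value with getAttribute().

```php
<?php
// Hypothetical sample block; the real HTML in the thread was never posted.
$html = '<div class="desc"><h2 class="name">Widget</h2>'
      . '<a href="/items/1">First</a> <a href="/items/2">Second</a>'
      . '<a name="anchor-only">No href</a></div>';

// Collect parser warnings internally instead of printing them,
// since real-world markup is rarely perfectly valid.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// '//a[@href]' matches every <a> element that carries an href attribute,
// regardless of where it sits in the hierarchy.
$hrefs = [];
foreach ($xpath->query('//a[@href]') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

print_r($hrefs);
```

An alternative would be to query the attribute nodes directly with '//a/@href' and read nodeValue from each result.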

Arty Ziff · April 12, 2011

...You sure that's what it does? Looks to me like it gets the text inside every div.desc>h2.name...

Yes, that's exactly what it does (of which there is only one occurrence in the HTML block being parsed...)

What does the HTML look like? A solution is (probably) to use more XPath queries, unless you know the hierarchy of the HTML and where the A nodes fall inside.

The hierarchy is known, but it could change. But the tags (for the most part) have unique class names.
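
Since the hierarchy may change but the class names are stable, one option is to match on the class alone with the descendant axis (//) instead of spelling out the parent/child path. The sketch below assumes invented sample HTML and a hypothetical helper, classTest(), that is not part of PHP or of the original code. It also matches a class token when the attribute lists several classes, which a plain @class="name" comparison would miss.

```php
<?php
// Hypothetical helper (not a PHP built-in): builds an XPath condition that
// matches one class token even if the class attribute holds several classes.
function classTest($class) {
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

// Invented sample markup with extra nesting and extra class names.
$html = '<div class="wrap"><div><div class="desc box">'
      . '<h2 class="name big"> Widget </h2>'
      . '<a class="more" href="/items/1">Details</a></div></div></div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// No parent path is given, so the queries keep working if the nesting changes.
$names = $xpath->query('//h2[' . classTest('name') . ']');
$name = $names->length > 0 ? trim($names->item(0)->nodeValue) : null;

$links = $xpath->query('//a[' . classTest('more') . ']');
$href = $links->length > 0 ? $links->item(0)->getAttribute('href') : null;

echo $name . ' -> ' . $href . "\n";
```

The length check before item(0) mirrors dcro2's answer, so a block without a match yields null rather than an error.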

dcro2 - Works great. Thanks!
