
Working with PHP's DOM functions


Arty Ziff


I'm using PHP's DOM functions to extract some information from a block of HTML:

for ($i = 0; $i < $tot_tblocks; $i++) {
    // Load the HTML blocks...
    $dom = new DOMDocument();
    $dom->loadHTML($tblock[$i]);
    $xpath = new DOMXPath($dom);
    $tags = $xpath->query('//div[@class="desc"]/h2[@class="name"]');
    // Get the part I want...
    foreach ($tags as $tag) {
        $tname[$i] = trim($tag->nodeValue);
        echo $tname[$i] . "<br>";
    }
}

Two questions:

1 - There is only ever one item ("name"). Can I access it without the foreach loop? $tname[$i] = trim($tags->nodeValue); doesn't work.

2 - This code extracts content between tags with certain class names. But I would also like to extract the values of certain attributes of some of those tags, for example the value of the href in an <a> tag. The tag may not have a unique class name, but could I still get an array of all the <a> tags in the source block? I don't know how, and I haven't been successful in deciphering the documentation... Any ideas?


1. DOMXPath::query() always returns a DOMNodeList rather than a single node (and it isn't a plain array, so current() won't work on it). You can use item(0) to get the first one, like

$tag = $xpath->query(...)->item(0);
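
Here is a minimal sketch of reading that single node without a foreach. DOMXPath::query() hands back a DOMNodeList, and item(0) gives the first match or null when nothing matched, so the sketch checks for null before touching nodeValue. The $html string is a made-up stand-in for one entry of $tblock, since the real markup isn't shown in this thread.

```php
<?php
// Hypothetical sample markup standing in for one entry of $tblock.
$html = '<div class="desc"><h2 class="name"> Widget </h2></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// query() returns a DOMNodeList; item(0) is the first match, or null if there is none.
$node = $xpath->query('//div[@class="desc"]/h2[@class="name"]')->item(0);
$name = ($node !== null) ? trim($node->nodeValue) : null;

echo $name, "\n";
```

Inside the original loop, the same idea would replace the foreach with an assignment to $tname[$i].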

 

2. You sure that's what it does? Looks to me like it gets the text inside every div.desc>h2.name... Or does that stuff happen outside the code you posted?

What does the HTML look like? A solution is (probably) to use more XPath queries, unless you know the hierarchy of the HTML and where the A nodes fall inside.
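
As a rough sketch of the "more XPath queries" idea for question 2: one query can select every <a> that has an href, and getAttribute() reads the value from each element. The markup and the div.desc container in the query are assumptions for illustration only; adjust the path to wherever the links actually sit.

```php
<?php
// Hypothetical sample markup; the real block structure is not shown in the thread.
$html = '<div class="desc"><h2 class="name">Widget</h2>'
      . '<a href="/widgets/1">Details</a> <a href="/widgets/1/buy">Buy</a></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Collect the href of every <a> found at any depth inside div.desc.
$hrefs = [];
foreach ($xpath->query('//div[@class="desc"]//a[@href]') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

print_r($hrefs);
```

Using '//a[@href]' on its own would pick up every link in the block, regardless of where it falls in the hierarchy.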

"...You sure that's what it does? Looks to me like it gets the text inside every div.desc>h2.name..."
Yes, that's exactly what it does (of which there is only one occurrence in the HTML block being parsed...)

"What does the HTML look like? A solution is (probably) to use more XPath queries, unless you know the hierarchy of the HTML and where the A nodes fall inside."

The hierarchy is known, but it could change. But the tags (for the most part) have unique class names.
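
If the hierarchy might change but the class names stay unique, one option is a query that ignores the path and matches on the class alone. The sketch below is only an illustration with made-up markup: it finds any element whose class list contains the token "name" at any depth, which also copes with elements that carry more than one class.

```php
<?php
// Hypothetical markup: the h2 sits one level deeper and carries an extra class.
$html = '<div class="desc"><div class="wrap"><h2 class="name big">Widget</h2></div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Match any element whose class attribute contains the whole token "name", at any depth.
$query = '//*[contains(concat(" ", normalize-space(@class), " "), " name ")]';
$node = $xpath->query($query)->item(0);
$name = ($node !== null) ? trim($node->nodeValue) : null;

echo $name, "\n";
```

The trade-off is that a path-free query will also match an unrelated element that happens to reuse the same class, so it relies on the class names really being unique.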

 

Dcr0 - Works great. Thanks!
