Arty Ziff Posted April 12, 2011

I'm using PHP's DOM functions to strip some information out of a block of HTML:

    for ($i = 0; $i <= $tot_tblocks-1; $i++) {
        // Load the HTML blocks...
        $dom = new DOMDocument();
        $dom->loadHTML($tblock[$i]);
        $xpath = new DOMXPath($dom);
        $tags = $xpath->query('//div[@class="desc"]/h2[@class="name"]');
        // Get the part I want...
        foreach ($tags as $tag) {
            $tname[$i] = trim($tag->nodeValue);
            echo $tname[$i]."<br>";
        }
    }

Two questions:

1 - There is only ever one item ("name"). Can I access it without the foreach loop? $tname[$i] = trim($tags->nodeValue); doesn't work.

2 - This code extracts the content between tags with certain class names. But I would also like to extract the values of certain attributes of some of those tags, such as the value of the href in an <a> tag. The tag may not have a unique class name, but could I still get an array of all the <a> tags in the source block? I don't know how, and I haven't been successful in deciphering the documentation...

Any ideas?

dcro2 Posted April 12, 2011

I guess I'll take a crack at #1:

    if ($tags->length > 0) {
        $tag = $tags->item(0);
        $tname[$i] = trim($tag->nodeValue);
    }

requinix Posted April 12, 2011

1. DOMXPath::query() returns a DOMNodeList, not a single node or a plain array, so array functions like current() won't work on it. Use item(0) to get the first one, like $tag = $xpath->query(...)->item(0); (it returns null if nothing matched).

2. You sure that's what it does? Looks to me like it gets the text inside every div.desc>h2.name... Or does that stuff happen outside the code you posted?

What does the HTML look like? A solution is (probably) to use more XPath queries, unless you know the hierarchy of the HTML and where the A nodes fall inside.

Arty Ziff (Author) Posted April 12, 2011

requinix wrote: "You sure that's what it does? Looks to me like it gets the text inside every div.desc>h2.name..."

Yes, that's exactly what it does (and there is only one occurrence of it in the HTML block being parsed).

requinix wrote: "What does the HTML look like? A solution is (probably) to use more XPath queries, unless you know the hierarchy of the HTML and where the A nodes fall inside."

The hierarchy is known, but it could change. The tags (for the most part) have unique class names, though.

dcro2 - Works great. Thanks!
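
Question 2 (attribute values) never got a code answer in the thread, so here is a minimal sketch of the "more XPath queries" idea. The sample markup in $html and the variable names $hrefs and $nameHrefs are made up for illustration, since the real HTML block was never posted. It shows two options: collecting every <a> element and reading href with getAttribute(), or selecting the href attribute nodes directly with an XPath ending in /@href.

```php
<?php
// Hypothetical sample block; the real $tblock[$i] markup was not posted.
$html = '<div class="desc">'
      . '<h2 class="name"><a href="/items/1">First item</a></h2>'
      . '<p>See also <a href="/items/2">a related item</a>.</p>'
      . '</div>';

// Collect libxml warnings internally instead of printing them,
// in case the fragment is not perfectly valid HTML.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Option A: every <a> that has an href, anywhere in the block.
$hrefs = array();
foreach ($xpath->query('//a[@href]') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

// Option B: select the attribute nodes themselves, limited to
// links inside div.desc > h2.name.
$nameHrefs = array();
foreach ($xpath->query('//div[@class="desc"]/h2[@class="name"]//a/@href') as $attr) {
    $nameHrefs[] = $attr->nodeValue;
}

print_r($hrefs);
print_r($nameHrefs);
```

Option A does not depend on class names at all, which may suit a hierarchy that could change. Option B is narrower and leans on the same class names the original query already uses.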