Here's what I'm trying to do. I haven't found any clear tutorials on how to properly navigate a DOMDocument object, at least not in PHP specifically.
I'm building a web scraper. I've had it working for some time now using more traditional methods (a combination of string manipulation and clever regex), but I've been told XPath can be much faster and more reliable for what I need. Sold.
Let's say I'm parsing a forum. This forum wraps each reply in a thread in its own <li></li> element with a class of "message":
<li class="message">
    <!-- Stuff here -->
</li>
<li class="message">
    <!-- Stuff here -->
</li>
So far so good. Each list item contains all the markup for one post, including the user info and the message text, each sitting in its own div:
<li class="message">
    <div class="user info">
        User info here
    </div>
    <div class="message text">
        Message text here
    </div>
</li>
<li class="message">
    <div class="user info">
        User info here
    </div>
    <div class="message text">
        Message text here
    </div>
</li>
Still with me? Good.
With this bit of code I can select each message list item and dump the text of everything inside it:
$items = $xpath->query("//li[starts-with(@class, 'message')]");
for ($i = 0; $i < $items->length; $i++) {
    echo $items->item($i)->nodeValue . "\n";
}
This produces a basic text dump of the entire forum. Close, but not what I need.
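For reference, the snippet above assumes `$xpath` already exists. Here is a minimal, self-contained sketch of that setup, assuming the page HTML is available as a string (the sample markup from above stands in for a fetched page). It also shows why the output is one big dump: `nodeValue` on an element returns the text of all its descendants run together.

```php
<?php
// Sample markup from the post, standing in for a fetched forum page.
$html = <<<HTML
<ul>
  <li class="message">
    <div class="user info">User info here</div>
    <div class="message text">Message text here</div>
  </li>
  <li class="message">
    <div class="user info">User info here</div>
    <div class="message text">Message text here</div>
  </li>
</ul>
HTML;

// Real forum HTML is rarely valid; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Step one: every <li> whose class attribute starts with "message".
$items = $xpath->query("//li[starts-with(@class, 'message')]");

foreach ($items as $item) {
    // nodeValue is the text of ALL descendants (both divs) concatenated,
    // which is why this prints one undifferentiated dump per post.
    echo trim($item->nodeValue), "\n";
}
```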
What I'm trying to do is as follows:
1. Select all the class="message" list items [done]
2. Once those have been selected, run another $xpath->query to select the child nodes that contain the user info and message text
Step one is done; step two is what's confusing me. How can I run a new query based on the output of the first query?
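One possible approach, sketched here under the assumption that the markup matches the sample above: `DOMXPath::query()` accepts an optional context node as its second argument, and relative expressions are then evaluated from that node. The user names, message texts and the `$posts` array below are hypothetical illustration values, not from the real forum.

```php
<?php
// Hypothetical sample data following the structure shown in the post.
$html = <<<HTML
<ul>
  <li class="message">
    <div class="user info">Alice</div>
    <div class="message text">First reply</div>
  </li>
  <li class="message">
    <div class="user info">Bob</div>
    <div class="message text">Second reply</div>
  </li>
</ul>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$posts = []; // hypothetical result structure: one ['user', 'text'] pair per reply
foreach ($xpath->query("//li[starts-with(@class, 'message')]") as $li) {
    // Passing $li as the context node makes the relative path "./div[...]"
    // look only at the direct child divs of this one <li>.
    // The exact @class comparison assumes the attribute is written as in the sample.
    $user = $xpath->query("./div[@class='user info']", $li)->item(0);
    $text = $xpath->query("./div[@class='message text']", $li)->item(0);

    $posts[] = [
        'user' => $user ? trim($user->nodeValue) : null,
        'text' => $text ? trim($text->nodeValue) : null,
    ];
}

foreach ($posts as $post) {
    echo $post['user'], ': ', $post['text'], "\n";
}
```

One thing to watch: an expression that starts with `//` is absolute and searches the whole document even when a context node is passed, so the inner queries should start with `./` (direct children) or `.//` (any descendant) to stay inside the current `<li>`.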
Thanks guys