weep Posted December 3, 2012

Hey guys,

Can't seem to wrap my head around this. This is what I have:

    $husdjur = new DOMDocument();
    @$husdjur->loadHTML("test.html");
    $xpath = new DOMXPath($husdjur);
    $tableRows = $xpath->query('/html/body/table/tbody/tr[1]/td[1]');
    print_r($tableRows);

And this is what I get:

    DOMNodeList Object ( )

Here is a sample of test.html (in this case I am going after the "5166" entry; the file is massive):

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <!-- saved from url=(0077)https://xxxxxxxxxxx.net/api/excel/usagequantities?period=300d&format=html -->
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <style type="text/css">TABLE.responsedata { font-family: Calibri, Arial, monaco, monospace; font-size: 11pt } TABLE.responsedata,TABLE.responsedata TD { border: *snip*</style>
    </head>
    <body>
    <table class="responsedata">
    <thead>
    <tr>
    <th>Ärendenr</th>
    <th>Status</th>
    <th>Ärende skapat datum</th>
    <th>Skapad av</th>
    <th>Ändrad</th>
    <th>Ändrad av</th>
    And so on, 50 something more...
    </tr>
    </thead>
    <tbody>
    <tr>
    <td>5166</td>
    <td>Avslutad</td>
    <td style="mso-number-format:'yyyy-mm-dd hh:mm';">2012-10-08 10:27</td>
    <td>Name1</td>
    <td style="mso-number-format:'yyyy-mm-dd hh:mm';">2012-10-08 10:27</td>
    <td>Name2</td>
    <td>K8 norr städ</td>
    And so on, 50 something more...

Any help much appreciated, cheers!

Link to comment: https://forums.phpfreaks.com/topic/271536-parse-html-with-xpath/

Maq Posted December 3, 2012

First, try closing the

Maq Posted December 3, 2012

After looking at your code I noticed a few things:

1) You should be using the method loadHTMLFile(), not loadHTML(). The former loads HTML from a file; the method you were using treated the string "test.html" as the HTML itself.
2) Turn on error reporting when you are debugging.
3) You should be declaring your namespace, in this case it's xmlns.

Try:

    <?php
    // Report all PHP errors
    error_reporting(E_ALL);

    $husdjur = new DOMDocument();
    // Load the markup from the file rather than from a string
    $husdjur->loadHTMLFile("test.html");

    $xpath = new DOMXPath($husdjur);
    $xpath->registerNamespace("xmlns", "http://www.w3.org/1999/xhtml");

    // First cell of the first body row
    $tableRows = $xpath->query('/html/body/table/tbody/tr[1]/td[1]');
    foreach ($tableRows as $result) {
        echo $result->nodeValue;
        echo "\n";
    }
    ?>
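
A minimal sketch of point 1, assuming a tiny inline stand-in for test.html (the $sample markup and the variable names are made up for illustration, not taken from the real file). It shows loadHTML() treating the string "test.html" as markup, so the query finds nothing, while the same query matches once real markup is parsed. As far as I can tell, the HTML parser does not place elements in the XHTML namespace, which would explain why the unprefixed path matches here even with the xmlns attribute present.

```php
<?php
// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);

$query = '/html/body/table/tbody/tr[1]/td[1]';

// Case 1: loadHTML() parses its argument as markup, so this document
// holds only the text "test.html" and no table at all.
$wrong = new DOMDocument();
$wrong->loadHTML('test.html');
$wrongXpath = new DOMXPath($wrong);
$wrongCount = $wrongXpath->query($query)->length;

// Case 2: real markup (a hypothetical, cut-down stand-in for test.html).
$sample = '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    . '<table class="responsedata"><tbody>'
    . '<tr><td>5166</td><td>Avslutad</td></tr>'
    . '</tbody></table></body></html>';
$right = new DOMDocument();
$right->loadHTML($sample);
$rightXpath = new DOMXPath($right);
$firstCell = $rightXpath->query($query)->item(0)->nodeValue;

echo "file name parsed as markup: $wrongCount match(es)\n";
echo "real markup: first cell is $firstCell\n";
```

With a real file on disk, loadHTMLFile("test.html") would take the place of the second loadHTML() call.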

requinix Posted December 3, 2012

I think you have the wrong indexes on the tr and td too. Starts counting at zero.

Also, your expression isn't doing what you think it's doing. [X] is not an offset, it's a condition. Try this more correct and more powerful version, which goes directly to the cell you want without the guesswork of where it is:

    //table[@class='responsedata']//td[text()='5166']/following-sibling::td[position()=1]

[edit] Also, the doesn't need to be closed. The parser is smart enough to know that it's automatically closed.

Maq Posted December 3, 2012

No, XPath indexing starts at 1. Also, your expression matches on <td>Avslutad</td>.

Weep, if you tell us what exactly you're trying to match on, we can give you the best XPath solution.

Quote:
> Also, the doesn't need to be closed. The parser is smart enough to know that it's automatically closed.

Good to know.

requinix Posted December 3, 2012

On 12/3/2012 at 10:22 PM, Maq said:
> No, XPath indexing starts at 1.

...Hmm. Okay. Wonder what I was thinking of. I even contradicted myself with the position()=1.

On 12/3/2012 at 10:22 PM, Maq said:
> Also, your expression matches on <td>Avslutad</td>.

I misread the question and thought the problem was finding the username. "after the 5166".
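
To make the indexing point concrete, here is a small sketch, assuming a cut-down inline table (the $html string is hypothetical, not the real file). It checks that tr[1]/td[1] selects the first cell, that [1] behaves the same as the condition [position()=1], that position 0 matches nothing, and that the following-sibling expression lands on the cell after "5166".

```php
<?php
// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);

// Hypothetical stand-in for the real table.
$html = '<table class="responsedata"><tbody>'
    . '<tr><td>5166</td><td>Avslutad</td><td>Name1</td></tr>'
    . '<tr><td>5167</td><td>Avslutad</td><td>Name2</td></tr>'
    . '</tbody></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Positions are 1-based: first cell of the first row.
$first = $xpath->query('//table/tbody/tr[1]/td[1]')->item(0)->nodeValue;

// [1] is shorthand for the condition [position()=1].
$same = $xpath->query('//table/tbody/tr[position()=1]/td[position()=1]')
    ->item(0)->nodeValue;

// Find the cell by its text, then step to the next sibling cell.
$next = $xpath->query(
    "//table[@class='responsedata']//td[text()='5166']/following-sibling::td[position()=1]"
)->item(0)->nodeValue;

// There is no position 0, so this selects no rows.
$zero = $xpath->query('//table/tbody/tr[0]')->length;

echo "$first\n$same\n$next\n$zero\n";
```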

weep (Author) Posted December 4, 2012

Sorry for the delay. Sweet, plenty of awesome tips to try. I will poke around for a bit and return with a solution/result/more questions.

On 12/3/2012 at 10:22 PM, Maq said:
> No, XPath indexing starts at 1. Also, your expression matches on <td>Avslutad</td>.
> Weep, if you tell us what exactly you're trying to match on, we can give you the best XPath solution.

I want to grab every cell within every <tr>.

salathe Posted December 4, 2012

On 12/4/2012 at 7:25 AM, weep said:
> I want to grab every cell within every <tr>

Then you likely want to get all of the rows, loop over them and access each row's individual collection of cells. The basic idea is something like:

    $tableRows = $xpath->query('/html/body/table/tbody/tr');
    foreach ($tableRows as $row) {
        // Relative query: only the cells of the current row
        $cells = $xpath->query('td', $row);
        foreach ($cells as $cell) {
            echo $cell->getNodePath();
            echo ' has value ';
            var_export($cell->nodeValue);
            echo "<br>\n";
        }
    }
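
Building on the row loop above, a sketch that collects the cells into a nested array instead of echoing them, assuming a tiny inline table in place of test.html (the $html string and the $rows variable are made up for illustration). The second argument to query() is the context node, so the relative path 'td' only looks at the current row's own cells.

```php
<?php
// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);

// Hypothetical stand-in for the real table.
$html = '<table><tbody>'
    . '<tr><td>5166</td><td>Avslutad</td></tr>'
    . '<tr><td>5167</td><td>Name1</td></tr>'
    . '</tbody></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('//table/tbody/tr') as $row) {
    $cells = [];
    // 'td' is relative: it is evaluated against $row, not the whole document.
    foreach ($xpath->query('td', $row) as $cell) {
        $cells[] = trim($cell->nodeValue);
    }
    $rows[] = $cells;
}

print_r($rows);
```

An absolute inner path such as '//td' would search the whole document again on every iteration, regardless of $row, which is why the relative form is used here.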

weep (Author) Posted December 4, 2012

Thank you for all your help guys! Solution for this thread:

    // Report all PHP errors
    error_reporting(E_ALL);

    $husdjur = new DOMDocument();
    $husdjur->loadHTMLFile("test.html");

    $xpath = new DOMXPath($husdjur);
    $xpath->registerNamespace("xmlns", "http://www.w3.org/1999/xhtml");

    // Every body row, then every cell within that row
    $tableRows = $xpath->query('/html/body/table/tbody/tr');
    foreach ($tableRows as $row) {
        $cells = $xpath->query('td', $row);
        foreach ($cells as $cell) {
            echo $cell->getNodePath();
            echo ' has value ';
            var_export($cell->nodeValue);
            echo "<br>\n";
        }
    }