weep Posted December 3, 2012

Hey guys,

Can't seem to wrap my head around this. This is what I have:

    $husdjur = new DOMDocument();
    @$husdjur->loadHTML("test.html");
    $xpath = new DOMXPath($husdjur);
    $tableRows = $xpath->query('/html/body/table/tbody/tr[1]/td[1]');
    print_r($tableRows);

And this is what I get:

    DOMNodeList Object ( )

Here is a sample of test.html (in this case I am going after the "5166" entry; the file is massive):

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <!-- saved from url=(0077)https://xxxxxxxxxxx.net/api/excel/usagequantities?period=300d&format=html -->
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <style type="text/css">TABLE.responsedata { font-family: Calibri, Arial, monaco, monospace; font-size: 11pt } TABLE.responsedata,TABLE.responsedata TD { border: *snip*</style>
    </head>
    <body>
    <table class="responsedata">
    <thead>
    <tr>
    <th>Ärendenr</th>
    <th>Status</th>
    <th>Ärende skapat datum</th>
    <th>Skapad av</th>
    <th>Ändrad</th>
    <th>Ändrad av</th>
    And so on, 50 something more...
    </tr>
    </thead>
    <tbody>
    <tr>
    <td>5166</td>
    <td>Avslutad</td>
    <td style="mso-number-format:'yyyy-mm-dd hh:mm';">2012-10-08 10:27</td>
    <td>Name1</td>
    <td style="mso-number-format:'yyyy-mm-dd hh:mm';">2012-10-08 10:27</td>
    <td>Name2</td>
    <td>K8 norr städ</td>
    And so on, 50 something more...

Any help much appreciated, cheers!

https://forums.phpfreaks.com/topic/271536-parse-html-with-xpath/
Maq Posted December 3, 2012

First, try closing the <meta> tag.
Maq Posted December 3, 2012 (edited)

After looking at your code I noticed a few things:

1) You should be using the method loadHTMLFile(), not loadHTML(). The former loads HTML from a file; the method you were using treated the string "test.html" as the HTML itself.
2) Turn on error reporting when you are debugging.
3) The document declares the XHTML namespace (xmlns), so you can register it on the DOMXPath object. Note that the query below does not use the prefix: a document loaded with loadHTMLFile() is matched by plain element names.

Try:

    <?php
    // Report and display all PHP errors while debugging
    error_reporting(E_ALL);
    ini_set('display_errors', '1');

    $husdjur = new DOMDocument();
    $husdjur->loadHTMLFile("test.html");
    $xpath = new DOMXPath($husdjur);
    $xpath->registerNamespace("xmlns", "http://www.w3.org/1999/xhtml");
    $tableRows = $xpath->query('/html/body/table/tbody/tr[1]/td[1]');

    foreach ($tableRows as $result) {
        echo $result->nodeValue;
        echo "\n";
    }
    ?>

Edited December 3, 2012 by Maq
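To make point 1 concrete, here is a minimal sketch of why the original call returned an empty list. It assumes a trimmed, hypothetical stand-in for test.html held in a string (`$markup`), and it uses `libxml_use_internal_errors()` in place of the `@` operator so parse warnings are collected rather than hidden.

```php
<?php
// Collect libxml parse warnings instead of silencing them with @.
libxml_use_internal_errors(true);

// Hypothetical stand-in for test.html, trimmed to a single data row.
$markup = <<<HTML
<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head>
<body>
<table class="responsedata">
<thead><tr><th>Id</th><th>Status</th></tr></thead>
<tbody><tr><td>5166</td><td>Avslutad</td></tr></tbody>
</table>
</body>
</html>
HTML;

$query = '/html/body/table/tbody/tr[1]/td[1]';

// loadHTML() parses its argument as markup, so the file name itself
// becomes the text of the document and there is no table to find.
$wrong = new DOMDocument();
$wrong->loadHTML('test.html');
$wrongXpath = new DOMXPath($wrong);
$wrongHits = $wrongXpath->query($query);

// Given real markup (or loadHTMLFile() with a real path), the same
// query finds the first cell of the first body row.
$right = new DOMDocument();
$right->loadHTML($markup);
$rightXpath = new DOMXPath($right);
$rightHits = $rightXpath->query($query);

echo 'Matches for the literal string: ', $wrongHits->length, "\n";
echo 'Matches for the real markup: ', $rightHits->length, "\n";
echo 'First cell: ', $rightHits->item(0)->nodeValue, "\n";
```

With a file on disk, `loadHTMLFile('test.html')` takes the place of the second `loadHTML()` call; the query stays the same.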
requinix Posted December 3, 2012 (edited)

I think you have the wrong indexes on the tr and td too. Starts counting at zero.

Also, your expression isn't doing what you think it's doing. [X] is not an offset, it's a condition. Try this more correct and more powerful version, which goes directly to the cell you want without the guesswork of where it is:

    //table[@class='responsedata']//td[text()='5166']/following-sibling::td[position()=1]

[edit] Also, the <meta> doesn't need to be closed. The parser is smart enough to know that it's automatically closed.

Edited December 3, 2012 by requinix
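As a rough illustration of what that expression selects, the sketch below runs it against a hypothetical one-row table built inline (not the real test.html). It also shows that the numeric predicate `[1]` is shorthand for the condition `[position()=1]`.

```php
<?php
libxml_use_internal_errors(true);

// Hypothetical one-row fragment; loadHTML() wraps it in <html><body>.
$doc = new DOMDocument();
$doc->loadHTML(
    '<table class="responsedata"><tbody>'
    . '<tr><td>5166</td><td>Avslutad</td><td>Name1</td></tr>'
    . '</tbody></table>'
);
$xpath = new DOMXPath($doc);

// Find the cell whose text is 5166, then step to its following siblings.
$base = "//table[@class='responsedata']//td[text()='5166']/following-sibling::td";

// [position()=1] keeps only the nearest following sibling ...
$long = $xpath->query($base . '[position()=1]');
// ... and [1] is the abbreviated form of the same condition.
$short = $xpath->query($base . '[1]');

echo $long->item(0)->nodeValue, "\n";
echo $short->item(0)->nodeValue, "\n";
```

Both queries land on the cell after "5166", not on "5166" itself, which is the point raised in the next reply.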
Maq Posted December 3, 2012 (edited)

No, XPath indexing starts at 1. Also, your expression matches on <td>Avslutad</td>.

Weep, if you tell us what exactly you're trying to match on, we can give you the best XPath solution.

requinix wrote: "Also, the <meta> doesn't need to be closed. The parser is smart enough to know that it's automatically closed."

Good to know.

Edited December 3, 2012 by Maq
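The 1-based indexing is easy to check. This is a small sketch against a hypothetical one-row table, not the real file; `td[0]` is a valid expression, but it asks for position 0, which no node ever has.

```php
<?php
libxml_use_internal_errors(true);

// Hypothetical one-row fragment for illustration.
$doc = new DOMDocument();
$doc->loadHTML(
    '<table><tbody>'
    . '<tr><td>5166</td><td>Avslutad</td><td>Name1</td></tr>'
    . '</tbody></table>'
);
$xpath = new DOMXPath($doc);

// Positions start at 1, so td[1] is the first cell.
$first = $xpath->query('//tbody/tr[1]/td[1]');
// td[0] means position()=0, which never matches anything.
$zero = $xpath->query('//tbody/tr[1]/td[0]');
// last() gives the position of the final cell in the row.
$last = $xpath->query('//tbody/tr[1]/td[last()]');

echo 'td[1]: ', $first->item(0)->nodeValue, "\n";
echo 'td[0] matches: ', $zero->length, "\n";
echo 'td[last()]: ', $last->item(0)->nodeValue, "\n";
```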
requinix Posted December 3, 2012 (edited)

Maq wrote: "No, XPath indexing starts at 1."

...Hmm. Okay. Wonder what I was thinking of. I even contradicted myself with the position()=1.

Maq wrote: "Also, your expression matches on <td>Avslutad</td>."

I misread the question and thought the problem was finding the username. "after the 5166".

Edited December 3, 2012 by requinix
weep Posted December 4, 2012 (Author)

Sorry for the delay. Sweet, plenty of awesome tips to try. I will poke around for a bit and return with a solution/result/more questions.

Maq wrote: "No, XPath indexing starts at 1. Also, your expression matches on <td>Avslutad</td>. Weep, if you tell us what exactly you're trying to match on, we can give you the best XPath solution."

I want to grab every cell within every <tr>, see picture: [image not included]
salathe Posted December 4, 2012

weep wrote: "I want to grab every cell within every <tr>"

Then you likely want to get all of the rows, loop over them and access each row's individual collection of cells. The basic idea is something like:

    $tableRows = $xpath->query('/html/body/table/tbody/tr');
    foreach ($tableRows as $row) {
        $cells = $xpath->query('td', $row);
        foreach ($cells as $cell) {
            echo $cell->getNodePath();
            echo ' has value ';
            var_export($cell->nodeValue);
            echo "<br>\n";
        }
    }
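If the goal is to work with the values rather than print them, the same row-then-cell loop can fill a nested array. This is only a sketch under assumptions: the two-row table is hypothetical (the second row, "5167" and "Open", is made up for illustration), and `$rows` is a name introduced here.

```php
<?php
libxml_use_internal_errors(true);

// Hypothetical two-row table; the second row is invented sample data.
$doc = new DOMDocument();
$doc->loadHTML(
    '<table class="responsedata"><tbody>'
    . '<tr><td>5166</td><td>Avslutad</td></tr>'
    . '<tr><td>5167</td><td>Open</td></tr>'
    . '</tbody></table>'
);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('//table/tbody/tr') as $row) {
    $cells = [];
    // 'td' is a relative path, evaluated against $row as the context
    // node, so it only sees the cells of the current row.
    foreach ($xpath->query('td', $row) as $cell) {
        $cells[] = trim($cell->nodeValue);
    }
    $rows[] = $cells;
}

print_r($rows);
```

Passing `$row` as the second argument to `query()` is what keeps each inner list limited to one row; an absolute path such as `//td` would ignore the context node and return every cell in the document.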
weep Posted December 4, 2012 (Author)

Thank you for all your help guys! Solution for this thread:

    // Report all PHP errors
    error_reporting(E_ALL);

    $husdjur = new DOMDocument();
    $husdjur->loadHTMLFile("test.html");
    $xpath = new DOMXPath($husdjur);
    $xpath->registerNamespace("xmlns", "http://www.w3.org/1999/xhtml");
    $tableRows = $xpath->query('/html/body/table/tbody/tr');

    foreach ($tableRows as $row) {
        $cells = $xpath->query('td', $row);
        foreach ($cells as $cell) {
            echo $cell->getNodePath();
            echo ' has value ';
            var_export($cell->nodeValue);
            echo "<br>\n";
        }
    }
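A note on the registerNamespace() line in the solution: the queries never use the registered prefix, and they still work. The sketch below, on a hypothetical XHTML string, suggests why. Loaded through the HTML parser, the elements carry no namespace and plain names match; loaded as XML with loadXML(), the same elements sit in the XHTML namespace and only a prefixed query finds them. The prefix `x` is an arbitrary choice made here.

```php
<?php
libxml_use_internal_errors(true);

// Hypothetical well-formed XHTML snippet, usable by both parsers.
$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    . '<body><table><tbody><tr><td>5166</td></tr></tbody></table></body></html>';

// HTML parser: the xmlns declaration is treated as an ordinary
// attribute, so unprefixed names match.
$asHtml = new DOMDocument();
$asHtml->loadHTML($xhtml);
$htmlXpath = new DOMXPath($asHtml);
$htmlPlain = $htmlXpath->query('/html/body/table/tbody/tr[1]/td[1]');

// XML parser: every element is in the XHTML namespace, so the
// unprefixed query finds nothing and the prefixed one is required.
$asXml = new DOMDocument();
$asXml->loadXML($xhtml);
$xmlXpath = new DOMXPath($asXml);
$xmlXpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$xmlPlain = $xmlXpath->query('/html/body/table/tbody/tr[1]/td[1]');
$xmlPrefixed = $xmlXpath->query('/x:html/x:body/x:table/x:tbody/x:tr[1]/x:td[1]');

echo 'HTML parser, plain query: ', $htmlPlain->length, "\n";
echo 'XML parser, plain query: ', $xmlPlain->length, "\n";
echo 'XML parser, prefixed query: ', $xmlPrefixed->length, "\n";
```

So with loadHTMLFile() the registration is harmless but optional; it only matters if the file is later loaded as XML.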