
Extract data from html table


Mix1988


Hi, how can I extract table data from HTML?

I tried the simpleHTMLdom parser with no luck. Then I found this easier method, but it isn't working for me:

 

<?php
$data = file_get_contents('demo.htm');

$dom = new domDocument;

@$dom->loadHTML($data);
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('table');

$rows = $tables->item(1)->getElementsByTagName('tr');

foreach ($rows as $row) {
    $cols = $row->getElementsByTagName('td');
    //echo $cols[2];
    print_r($cols);
}

?>

 

I get DOMNodeList Object ( [length] => 0 ), as if there's nothing there.

The HTML table looks like this:
<table cellpadding="0px" cellspacing="0px" style="table-layout:fixed" ;="">
<tbody><tr>
<td width="20" style="min-width:20px;max-width:20px;"></td>
<td width="100" style="min-width:100px;max-width:100px;"></td>
<td width="150" style="min-width:150px;max-width:150px;"></td>
<td width="400" style="min-width:400px;max-width:400px;"></td>
<td width="200" style="min-width:200px;max-width:200px;"></td>
</tr>
<tr>
<td rowspan="5"></td>
<td rowspan="5" valign="top" ,="" style="padding-left:3px;padding-top:4px;padding-bottom:4px;border-top-style:solid;border-width:1px;border-color=#4F4F4F;"><b>Aeg</b><br>15.12.2010</td>
<td rowspan="5" valign="top" ,="" style="padding-left:3px;padding-top:4px;padding-bottom:4px;border-top-style:solid;border-width:1px;border-color=#4F4F4F;"><b>Koht</b><br>Harjumaa</td>
<td valign="top" ,="" style="padding-left:3px;padding-top:4px;padding-bottom:4px;border-top-style:solid;border-width:1px;border-color=#4F4F4F;"><b>Sõiduk:</b> BMW 525TDS, 1997</td>
<td valign="top" ,="" style="padding-left:3px;padding-top:4px;padding-bottom:4px;border-top-style:solid;border-width:1px;border-color=#4F4F4F;"><b>Vastutuse ulatus:</b> 0%</td>
</tr>
<tr>
<td colspan="2" valign="top" ,="" style="padding-left:3px;padding-bottom:4px;"><b>Makstud sõidukikahju hüvitis:</b> kuni 500 eurot</td>
</tr>
<tr>
<td valign="top" ,="" style="padding-left:3px;padding-top:4px;padding-bottom:4px;border-top-style:solid;border-width:1px;border-color=#4F4F4F;"><b>Sõiduk:</b> OPEL ASTRA STATION WAGON, 2006</td>
<td valign="top" ,="" style="padding-left:3px;padding-top:4px;padding-bottom:4px;border-top-style:solid;border-width:1px;border-color=#4F4F4F;"><b>Vastutuse ulatus:</b> 100%</td>
</tr>
<tr>
<td colspan="2" valign="top" ,="" style="padding-left:3px;padding-bottom:4px;"><b>Makstud sõidukikahju hüvitis:</b> sõidukikahju ei hüvitatud</td>
</tr>
<tr>
<td colspan="2" valign="top" ,="" style="padding-left:3px;padding-top:4px;padding-bottom:4px;border-top-style:solid;border-width:1px;border-color=#4F4F4F;"><b>Käsitlev kindlustusandja:</b> QBE Insurance (Europe) Limited Eesti filiaal</td>
</tr>

<tr>
<td rowspan="5"></td>
<td rowspan="5" valign="top" ,="" style="padding-left:3px;padding-top:4px;padding-bottom:4px;border-top-style:solid;border-width:1px;border-color=#4F4F4F;background-color:#F5F5D1;"><b>Aeg</b><br>28.08.2010</td>
<td rowspan="5" valign="top" ,="" style="padding-left:3px;padding-top:4px;padding-bottom:4px;border-top-style:solid;border-width:1px;border-color=#4F4F4F;background-color:#F5F5D1;"><b>Koht</b><br>Tartu, Tartumaa</td>
<td valign="top" ,="" style="padding-left:3px;padding-top:4px;padding-bottom:4px;border-top-style:solid;border-width:1px;border-color=#4F4F4F;background-color:#F5F5D1;"><b>Sõiduk:</b> AUDI A4, 1996</td>
<td valign="top" ,="" style="padding-left:3px;padding-top:4px;padding-bottom:4px;border-top-style:solid;border-width:1px;border-color=#4F4F4F;background-color:#F5F5D1;"><b>Vastutuse ulatus:</b> 0%</td>
</tr>
<tr>
<td colspan="2" valign="top" ,="" style="padding-left:3px;padding-bottom:4px;background-color:#F5F5D1;"><b>Makstud sõidukikahju hüvitis:</b> 500 kuni 2000 eurot</td>
</tr>
<tr>
<td valign="top" ,="" style="padding-left:3px;padding-top:4px;padding-bottom:4px;border-top-style:solid;border-width:1px;border-color=#4F4F4F;background-color:#F5F5D1;"><b>Sõiduk:</b> BMW 525TDS, 1997</td>
<td valign="top" ,="" style="padding-left:3px;padding-top:4px;padding-bottom:4px;border-top-style:solid;border-width:1px;border-color=#4F4F4F;background-color:#F5F5D1;"><b>Vastutuse ulatus:</b> 100%</td>
</tr>
<tr>
<td colspan="2" valign="top" ,="" style="padding-left:3px;padding-bottom:4px;background-color:#F5F5D1;"><b>Makstud sõidukikahju hüvitis:</b> sõidukikahju ei hüvitatud</td>
</tr>
<tr>
<td colspan="2" valign="top" ,="" style="padding-left:3px;padding-top:4px;padding-bottom:4px;border-top-style:solid;border-width:1px;border-color=#4F4F4F;background-color:#F5F5D1;"><b>Käsitlev kindlustusandja:</b> If P&C Insurance AS</td>
</tr>

<tr>
<td></td>
<td colspan="4" valign="top" ,="" style="padding-left:3px;padding-top:4px;padding-bottom:4px;border-top-style:solid;border-width:1px;border-color=#4F4F4F;"> </td>
</tr>
</tbody></table>

 

 

What is the best way to parse this table?

 


It seems DOMDocument doesn't like the way the table is formed. To be honest, I don't think anything I have ever fed it was acceptable without tweaking.

 

Here is your first error:

 

DOMDocument cannot read the file correctly. It throws errors on every line that has ;="" in the attributes, because ; is not a valid attribute name. (The ,="" attributes in the table cells have the same problem.)

 

Even if you fix that, it still does not work and shows an empty node list. You can use

echo '<pre>' . print_r($dom,true) . '</pre>'; 

to see what the object contains.
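
To see the parser errors mentioned above as data rather than hiding them with @, libxml's internal error buffer can be used. This is only a minimal sketch: htmlParseMessages() is a hypothetical helper name, the sample string is a shortened stand-in for the posted table, and which messages appear (if any) depends on the libxml2 version PHP was built with.

```php
<?php
// Hypothetical helper (not from the thread): parse $html with DOMDocument
// and return libxml's parser messages as plain strings.
function htmlParseMessages(string $html): array
{
    // Buffer parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Empty the buffer and restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// Shortened stand-in for the posted markup, including the odd attributes.
$html = '<table style="table-layout:fixed" ;=""><tr><td valign="top" ,="">x</td></tr></table>';
print_r(htmlParseMessages($html));
```

Each entry gives the line number libxml reported, which makes it easier to find the attributes that need cleaning up.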


OK, I see, so this isn't a good method after all for getting data from an HTML table. What should I use? As far as googling goes, everybody seems to like the SimpleHtmlDom parser, but I had zero success with it...


I built my own DOM class a while ago: http://tomsfreelance.com/DOMe/DOMe.phps

 

I tested it on your table and it worked. I can't guarantee it will always work. (Requires valid HTML.) Breaking it down will be up to you though.

 

Here's an example:

 

<?php
    require_once("DOMe.php");
    
    $dom = new DOMe("div");
    $dom->importHTML(file_get_contents("file.html"));
    
    echo $dom->generate();
    
    echo "<pre>" . print_r($dom, true) . "</pre>";

You can use simpleHtmlDom.  The syntax would be:

 

 
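<?php
include('path/to/simple_html_dom.php');

// Parse the file into a simple_html_dom object.
$dom = file_get_html('demo.htm');

// First <table> in the document (index 0).
$table = $dom->find('table', 0);

// The table's first child is <tbody>; its children are the <tr> rows.
$rows = $table->children(0)->children();

foreach ($rows as $row) {
    // Each child of a row is a <td> cell.
    foreach ($row->children() as $column) {
        // Skip empty spacer cells.
        if (!empty($column->innertext)) {
            echo $column->innertext . '<br />' . PHP_EOL;
        }
    }
}

?>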
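
For comparison, the same kind of row/cell loop can be sketched with PHP's built-in DOM extension. Treat this as an illustration under assumptions, not a confirmed fix for demo.htm: tableToRows() is a hypothetical helper, and the sample markup is a shortened stand-in for the posted table. Two details are worth noting: item() is zero-based, so the first table is item(0), and textContent returns plain text with the tags removed, so text on either side of a <br> is joined without a separator.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): returns the text of every
// non-empty <td>, grouped by <tr>, for the table at position $tableIndex.
function tableToRows(string $html, int $tableIndex = 0): array
{
    // Buffer parser warnings instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // item() is zero-based and returns null when the index does not exist.
    $table = $dom->getElementsByTagName('table')->item($tableIndex);
    if ($table === null) {
        return [];
    }

    $rows = [];
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $text = trim($td->textContent);
            if ($text !== '') {
                $cells[] = $text; // keep only cells that contain text
            }
        }
        if ($cells !== []) {
            $rows[] = $cells; // skip rows made only of spacer cells
        }
    }

    return $rows;
}

// Shortened stand-in for the posted table.
$html = '<table><tbody>'
      . '<tr><td width="20"></td><td width="100"></td></tr>'
      . '<tr><td></td><td><b>Vastutuse ulatus:</b> 0%</td></tr>'
      . '</tbody></table>';
print_r(tableToRows($html));
```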

I added a function "getElementsByTagName" so you can extract data more easily. Here is how you might do it:

 

<?php
    require_once("DOMe.php");
    
    $dom = new DOMe("div");
    $dom->importHTML(file_get_contents("file.html"));
    
    echo $dom->generate();
    
    $rows = $dom->getElementsByTagName("tr");
    
    $data = array();
    foreach ($rows as $row) {
        $cells = $row->getElementsByTagName("td");
        $cellData = array();
        foreach ($cells as $cell) {
            $cellData[] = $cell->generate();
        }
        $data[] = $cellData;
    }
    
    echo "<pre>" . print_r($data, true) . "</pre>";

 

Output / example is at http://tomsfreelance.com/DOMe/DOM_Import.php

Make sure you get the updated code at http://tomsfreelance.com/DOMe/DOMe.phps
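
The remaining step of breaking each cell down could look roughly like the sketch below. It assumes that each entry in $data is the cell's HTML as a string in the form <td ...><b>Label:</b> value</td>, as in the posted table; what generate() actually returns is not shown in the thread, so that is an assumption. splitCell() is a hypothetical helper, not part of DOMe.

```php
<?php
// Hypothetical helper (not part of DOMe): split one cell's HTML into a
// [label, value] pair. Assumes the "<b>Label:</b> value" layout of the
// posted table; returns null for cells without a <b> label.
function splitCell(string $cellHtml): ?array
{
    if (!preg_match('~<b>(.*?)</b>(.*)~is', $cellHtml, $m)) {
        return null; // empty spacer cell or unexpected layout
    }

    // Drop any remaining tags, surrounding whitespace and the trailing colon.
    $label = rtrim(trim(strip_tags($m[1])), ':');
    $value = trim(strip_tags($m[2]));

    return [$label, $value];
}

$cell = '<td valign="top"><b>Vastutuse ulatus:</b> 100%</td>';
print_r(splitCell($cell));
```

Applied to every entry of each $cellData array, this gives label/value pairs that are easier to store or display than raw cell markup.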

This is really awesome and suits me best, thank you very much!!!
