
Parse HTML table with PHP - AND strip out the Microsoft stuff


techmonkey78


Hi,

 

I'm designing a website for an online radio station. Once a week, a DJ hosts his own top 40 show, with HIS top 40 listed on his own site.

 

At the moment, the only solution I have for getting the top 40 onto my client's site is an iframe (not good, I know!).

However, what's worse is that this top 40 list has horrendous colours that completely break the theme of my client's site and, worse still, it is Excel-generated HTML! The author of this list is not a client of mine, so I can't do anything to persuade him to change. I've attached a copy of the offending file so you can see what I'm working with; I changed the extension from html to txt for the forum post.

Just for clarification, the Excel-generated HTML file is not hosted on my server (although I could set up a cron job to fetch it if required).
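
For the cron idea, a minimal caching sketch might look like the following. It assumes the remote file can be fetched as a string; the function name `cachedFetch`, the cache path and the stand-in fetcher are all hypothetical and only illustrate the approach.

```php
<?php
/**
 * Return the cached copy if it is fresh enough, otherwise call $fetch
 * and store the result. $fetch is any callable returning the HTML
 * string, or false on failure.
 */
function cachedFetch(string $cacheFile, int $maxAgeSeconds, callable $fetch)
{
    clearstatcache(true, $cacheFile);
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
        return file_get_contents($cacheFile);
    }

    $html = $fetch();
    if ($html === false) {
        // Fetch failed: fall back to a stale copy if one exists.
        return is_file($cacheFile) ? file_get_contents($cacheFile) : false;
    }

    file_put_contents($cacheFile, $html, LOCK_EX);
    return $html;
}

// Demo with a stand-in fetcher (assumption: real use would wrap a
// request for the DJ's page, whose URL is not given here).
$cacheFile = sys_get_temp_dir() . '/top40_cache_' . uniqid() . '.htm';
$calls = 0;
$fetch = function () use (&$calls) {
    $calls++;
    return '<table><tr><td>1</td></tr></table>';
};

$first  = cachedFetch($cacheFile, 3600, $fetch); // fetches and writes the cache
$second = cachedFetch($cacheFile, 3600, $fetch); // served from the cache
unlink($cacheFile);
```

Run from cron or on page load, this would keep a local copy so the page does not depend on the other site being up for every request.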

 

I've done some digging and found a useful bit of code (posted below).

 

However, it displays the table 4 or 5 times.

In addition to this, is there a way to manipulate the width of the columns with this code?

 

<?php 
$oldSetting = libxml_use_internal_errors( true ); 
libxml_clear_errors(); 

$html = new DOMDocument(); 
$html->loadHtmlFile('Chart%20Table2.htm'); 

$xpath = new DOMXPath( $html ); 
$elements = $xpath->query( "//table" ); 

foreach ( $elements as $item ) {
  $newDom = new DOMDocument;
  $newDom->appendChild($newDom->importNode($item,true));

  $xpath = new DOMXPath( $newDom ); 

  foreach ($item->attributes as $attribute) { 

    for ($node = $item->firstChild; $node !== NULL; 
         $node = $node->nextSibling) {
      if (($attribute->nodeName =='valign') && ($attribute->nodeValue=='top'))
      {
        print($node->nodeValue); 
      }
      else
      {
        print("<br>".$node->nodeValue);
      }
    }
    print("<br>");
  } 
}

libxml_clear_errors(); 
libxml_use_internal_errors( $oldSetting );

?>
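
A likely reason for the repeated output is that the outer loop runs once per attribute on each table element, and every pass prints all the child nodes again; `//table` may also match nested tables. A minimal sketch that visits each row only once is shown here. The sample markup and the function name `extractRows` are assumptions for illustration and are not taken from the attached file.

```php
<?php
// Hypothetical sample standing in for the Excel export.
$sample = '<table border=0 cellpadding=0 cellspacing=0 width=300>'
        . '<tr><td class=xl28>1</td><td class=xl29>Artist A</td><td>Song A</td></tr>'
        . '<tr><td class=xl28>2</td><td class=xl29>Artist B</td><td>Song B</td></tr>'
        . '</table>';

/**
 * Return each table row as an array of trimmed cell texts.
 */
function extractRows(string $html): array
{
    // Silence libxml warnings about the non-standard markup while parsing.
    $old = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($old);

    $xpath = new DOMXPath($doc);
    $rows  = [];

    // not(.//tr) keeps only innermost rows, so rows of an outer layout
    // table that merely wrap another table are skipped.
    foreach ($xpath->query('//tr[not(.//tr)]') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td|./th', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

$rows = extractRows($sample);
foreach ($rows as $cells) {
    echo implode(' | ', $cells), "<br>\n";
}
```

Iterating over rows and cells directly, rather than over the table's attributes, means each row is printed exactly once however many attributes Excel put on the table tag.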

 

Basically, I just want the table.

I'm not bothered about the first column, which contains some pictures, as I think that's a separate table.

 

Ideally, I'd like to apply my own formatting to the table too (padding etc.) if that is possible, e.g. from line 897 (<td class=xl28 x:num>1</td>) down to line 1433.

 

The trouble is, there are a lot of custom widths and styles in between which I don't want.

I'd like all the tr's and td's without all the associated guff from the original Excel export.
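
One way this could be approached is to rebuild the table from the cell text alone, so none of the Excel attributes carry over. The sketch below assumes the export parses with `DOMDocument::loadHTML`; the sample markup, the function name `cleanTable` and the `top40` class are hypothetical.

```php
<?php
// Hypothetical sample; the class names mimic the xl28-style classes in the export.
$sample = '<table x:str border=0 style="width:437pt">'
        . '<col width=64 style="mso-width-source:userset">'
        . '<tr height=17 style="height:12.75pt"><td class=xl28 x:num>1</td><td class=xl30>Song A</td></tr>'
        . '<tr height=17><td class=xl28 x:num>2</td><td class=xl30>Song B</td></tr>'
        . '</table>';

/**
 * Build a fresh table containing only tr/td elements and text,
 * tagged with a single CSS class for custom styling.
 */
function cleanTable(string $html, string $cssClass = 'top40'): string
{
    $old = libxml_use_internal_errors(true);
    $src = new DOMDocument();
    $src->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($old);

    $out   = new DOMDocument('1.0', 'UTF-8');
    $table = $out->createElement('table');
    $table->setAttribute('class', $cssClass);
    $out->appendChild($table);

    $xpath = new DOMXPath($src);
    foreach ($xpath->query('//tr[not(.//tr)]') as $tr) {
        $row = $out->createElement('tr');
        foreach ($xpath->query('./td|./th', $tr) as $cell) {
            // Copy text only: widths, classes and inline styles are dropped.
            $td = $out->createElement('td');
            $td->appendChild($out->createTextNode(trim($cell->textContent)));
            $row->appendChild($td);
        }
        if ($row->hasChildNodes()) {
            $table->appendChild($row);
        }
    }
    return $out->saveHTML($table);
}

$clean = cleanTable($sample);
echo $clean;
```

Because the output carries only one class, padding and column widths could then be set from the site's own stylesheet, for example with rules targeting `.top40 td`.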

 

Geez, I wish there was an easy way to get this guy just to export as CSV, but unfortunately that's not an option!

 

Many thanks in advance!

 

[attachment deleted by admin]

