I was trying out the PHP Tidy extension/classes, so I fed it a couple of HTML sources with some missing tags, and it worked well: it replaced the missing tags. But in one example with two missing closing tags, a missing `</title>` and a missing `</b>`, it totally messed up the missing `</title>`. Instead of closing the title before `<body>`, it placed `</title>` after `</body>`, so the whole body ended up nested inside `<title>`. Here is the source HTML I fed in:
```html
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>
<body>
<b>hello
</body>
</html>
```
Here is the result from PHP Tidy:
```html
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//w3c//dtd xhtml 1.0 strict//en"
"http://www.w3.org/tr/xhtml1/dtd/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>
<body>
<b>hello</b>
</body>
</title>
</head>
</html>
```
Here is the PHP code I used:
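```php
<?php
// vars.php defines my own helper open_html_file_for_reading()
include "vars.php";
$html = open_html_file_for_reading("val.html");

// Tidy options
$config = array(
    'indent'     => true,   // pretty-print nested elements
    'output-xml' => true,   // write the result as well-formed XML
    'input-xml'  => true,   // parse the input with Tidy's XML parser
    'wrap'       => '1000'  // wrap long lines at column 1000
);

// Parse, repair and print the document
$tidy = new tidy();
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();
echo tidy_get_output($tidy);
?>
```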
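To see what Tidy itself reports, here is a self-contained sketch of the same script. It assumes the tidy extension is loaded, inlines the sample document above with a nowdoc instead of reading it through my `vars.php` helper, and also prints `$tidy->errorBuffer`, the property where Tidy keeps the warnings and errors it raised while parsing. The config is unchanged, so it should reproduce the output above.

```php
<?php
// Sketch: same Tidy settings as my script, but the input is inlined here
// instead of being read via open_html_file_for_reading() from vars.php.
// Requires the tidy extension.

$html = <<<'HTML'
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>
<body>
<b>hello
</body>
</html>
HTML;

$config = array(
    'indent'     => true,
    'output-xml' => true,
    'input-xml'  => true,
    'wrap'       => 1000,
);

$tidy = new tidy();
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();

// The repaired markup.
echo tidy_get_output($tidy), "\n";

// Warnings and errors Tidy collected while parsing this input.
echo $tidy->errorBuffer, "\n";
```

Reading the warnings next to the output makes it easier to tell whether Tidy noticed the missing `</title>` at all or simply closed it at the end of the file.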
Any help would be great. It might be something to do with mixing standards (HTML vs. XHTML; Strict, Transitional and Frameset), or maybe `<b>` is deprecated and Tidy is getting confused. But as I said, so far it only seems to go wrong when the `</title>` is missing. Any ideas?
Or maybe my config options just need to be tweaked?

```php
$config = array(
    'indent'     => true,
    'output-xml' => true,
    'input-xml'  => true,
    'wrap'       => '1000');
```
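One tweak worth testing, as a sketch only: according to the Tidy options reference, `input-xml` makes Tidy use its XML parser rather than its error-correcting HTML parser. A generic XML parser knows nothing about HTML content models (for example, that `<title>` cannot contain `<body>`), which might explain why the missing `</title>` was only added at the very end. The function below (the name `repair_xhtml` is my own, hypothetical) drops `input-xml` and asks for XHTML output with `output-xhtml` instead. It assumes the tidy extension is loaded, and I have not verified the exact markup it produces.

```php
<?php
// Sketch (assumption): let Tidy's HTML parser, which knows HTML nesting
// rules, read the input, and only request XHTML on output.
// repair_xhtml() is a hypothetical helper name.
function repair_xhtml(string $html): string
{
    $config = array(
        'indent'       => true,
        'output-xhtml' => true, // emit XHTML instead of plain HTML
        'wrap'         => 1000,
        // 'input-xml' is deliberately left out: it selects Tidy's XML
        // parser, which applies no HTML-specific nesting rules.
    );

    $tidy = new tidy();
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();
    return tidy_get_output($tidy);
}

// Usage with the broken document from above:
// echo repair_xhtml(file_get_contents('val.html'));
```

If `<body>` then comes out after a closed `<title>` and `</head>`, the XML input mode was the culprit rather than the DOCTYPE or the `<b>` tag.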