GeoffreyBernardo Posted October 13, 2010

I want to remove empty paragraphs from an HTML document using simple_html_dom.php. I know how to do it using the DOMDocument class, but because the HTML files I work with are prepared in MS Word, the DOMDocument's loadHTMLFile() function gives this exception: "Namespaces are not defined".

This is the code I use with the DOMDocument object for HTML files not prepared in MS Word:

<?php
/* Using the DOMDocument class */

/* Create a new DOMDocument object. */
$html = new DOMDocument("1.0", "UTF-8");

/* Load HTML code from an HTML file into the DOMDocument. */
$html->loadHTMLFile("HTML File With Empty Paragraphs.html");

/* Assign all the <p> elements into the $pars DOMNodeList object. */
$pars = $html->getElementsByTagName("p");

echo "The initial number of paragraphs is " . $pars->length . ".<br />";

/* The trim() function is used to remove leading and trailing spaces as well as
 * newline characters. */
for ($i = 0; $i < $pars->length; $i++){
    if (trim($pars->item($i)->textContent) == ""){
        $pars->item($i)->parentNode->removeChild($pars->item($i));
        $i--;
    }
}

echo "The final number of paragraphs is " . $pars->length . ".<br />";

// Write the HTML code back into an HTML file.
$html->saveHTMLFile("HTML File WithOut Empty Paragraphs.html");
?>

This is the code I use with the simple_html_dom.php module for HTML files prepared in MS Word:

<?php
/* Using simple_html_dom.php */
include("simple_html_dom.php");

$html = file_get_html("HTML File With Empty Paragraphs.html");
$pars = $html->find("p");

for ($i = 0; $i < count($pars); $i++) {
    if (trim($pars[$i]->plaintext) == "") {
        unset($pars[$i]);
        $i--;
    }
}

$html->save("HTML File without Empty Paragraphs.html");
?>

It is almost the same, except that the $pars variable is a DOMNodeList when using DOMDocument and an array when using simple_html_dom.php. But this code does not work.
It first runs for two minutes and then reports these errors: "Undefined offset: 1" and "Trying to get property of non-object" for this line: "if (trim($pars[$i]->plaintext) == "") {".

Does anyone know how I can fix this? Thank you.

I also asked on Stack Overflow.
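
A note on why the second loop misbehaves, followed by one possible workaround. In PHP, unset() on an array element leaves a gap instead of renumbering the array, so after unset($pars[1]) the loop steps back, returns to index 1, finds nothing there (hence "Undefined offset: 1"), and repeats until the script times out. The unset() also only drops the entry from the $pars array; it does not touch the document itself. The sketch below sidesteps both issues by staying with DOMDocument and telling libxml to collect parse warnings internally rather than emit them. It assumes the "Namespaces are not defined" message is such a warning (Word's <o:p> tags commonly trigger one), which the post does not confirm. The function name removeEmptyParagraphs and the sample markup are made up for illustration.

```php
<?php
/* Hypothetical helper, not taken from the post above. */
function removeEmptyParagraphs(string $htmlSource): string
{
    /* Collect libxml parse warnings (for example about Word's <o:p> tags)
     * internally instead of emitting them; remember the previous setting. */
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($htmlSource);

    /* Discard the collected warnings and restore the previous setting. */
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    /* getElementsByTagName() returns a live list that shrinks as nodes are
     * removed. Copying it into a plain array first means no index juggling. */
    $paragraphs = iterator_to_array($doc->getElementsByTagName("p"));

    foreach ($paragraphs as $p) {
        /* trim() strips leading and trailing spaces, tabs and newlines. */
        if (trim($p->textContent) === "" && $p->parentNode !== null) {
            $p->parentNode->removeChild($p);
        }
    }

    return $doc->saveHTML();
}

/* Made-up sample input with two empty paragraphs, one in Word style. */
$sample = "<html><body><p>First</p><p> </p><p><o:p></o:p></p><p>Last</p></body></html>";
$clean = removeEmptyParagraphs($sample);

echo "Paragraphs left: " . substr_count($clean, "<p>") . "\n";
```

For files on disk, the same idea should carry over by calling loadHTMLFile() and saveHTMLFile() in place of loadHTML() and saveHTML(). Note that trim() does not strip non-breaking spaces, which Word often puts into otherwise empty paragraphs, so those may need separate handling.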