
How many of you have had to clean up messy MS Word files to turn them into valid W3C-compliant pages, or just to make them fit a site's overall design?

I looked for a good HTML cleaner and didn't find a good free one.

 

In the meantime, I developed my own HTML Cleaner class in PHP, because I needed to clean up tons of Word-generated code at the time.

 

I've combined the powerful HTML Tidy library with my own regular expression-based cleaning algorithms. I wanted a simple way to strip all unnecessary tags and styles while keeping the output W3C standards-compliant.

 

Syntax checking is done only when using Tidy.

Note that this tool is designed to strip useless tags and attributes, bringing the markup back to HTML basics and optimizing the code. It is not designed to sanitize input (as HTMLPurifier does).

 

Without the tidy PHP extension, the class can:

- remove styles, attributes

- strip useless tags

- fill empty table cells with non-breaking spaces

- optimize code (merge inline tags, strip empty inline tags, trim excess new lines)

- drop empty paragraphs

- compress (trim space and new-line breaks).
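
To give an idea of what regex-only steps like the ones above can look like, here is a minimal sketch. It is not the class's actual code: the function name `sketch_clean` and the exact patterns are assumptions made for illustration, and regex-based HTML handling like this is only reasonable for cleanup, not for untrusted input.

```php
<?php
// Hypothetical helper (not part of HTMLCleaner): a few regex-only cleanup steps.
function sketch_clean(string $html): string
{
    // Remove style and class attributes (double- or single-quoted values).
    $html = preg_replace('/\s+(?:style|class)\s*=\s*(?:"[^"]*"|\'[^\']*\')/i', '', $html);

    // Strip empty inline tags such as <span></span> or <b> </b>.
    $html = preg_replace('/<(span|b|i|u|em|strong|font)\b[^>]*>\s*<\/\1>/i', '', $html);

    // Drop empty paragraphs, including ones that hold only &nbsp;.
    $html = preg_replace('/<p\b[^>]*>(?:\s|&nbsp;)*<\/p>/i', '', $html);

    // Fill empty table cells with a non-breaking space.
    $html = preg_replace('/<td\b([^>]*)>\s*<\/td>/i', '<td$1>&nbsp;</td>', $html);

    // Compress: remove whitespace between tags, then trim the ends.
    // Note this also removes a space between adjacent inline tags.
    $html = preg_replace('/>\s+</', '><', $html);

    return trim($html);
}

$word = '<p class="MsoNormal" style="margin:0">Hello <span style="color:red"></span>world</p> '
      . '<p>&nbsp;</p> <table><tr><td></td></tr></table>';
echo sketch_clean($word), "\n";
```

The order matters here: attributes go first so that the later patterns see bare tags, and compression goes last so it can close the gaps left by removed elements.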

 

In conjunction with Tidy, the class can apply all Tidy actions (clean up, fix errors, convert to XHTML, etc.) and then optionally perform all of its own actions (remove styles, compress, etc.).
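
For readers who have not used the tidy extension, the Tidy pass could be sketched roughly like this. Again, this is an assumption about how such a step might be wired up, not the class's own code: `sketch_tidy` is a made-up name and the option set is just one plausible choice from the standard HTML Tidy configuration options.

```php
<?php
// Hypothetical helper (not part of HTMLCleaner): run HTML Tidy when available.
function sketch_tidy(string $html): string
{
    // Without the tidy extension, return the input untouched so the
    // caller can fall back to regex-only cleaning.
    if (!extension_loaded('tidy')) {
        return $html;
    }

    $config = [
        'output-xhtml'   => true, // convert to XHTML
        'clean'          => true, // replace presentational markup
        'word-2000'      => true, // strip Word 2000 specific markup
        'show-body-only' => true, // return a fragment, not a full document
        'wrap'           => 0,    // do not re-wrap lines
    ];

    $fixed = tidy_repair_string($html, $config, 'utf8');

    // tidy_repair_string() returns false on failure; keep the original then.
    return $fixed === false ? $html : $fixed;
}

echo sketch_tidy('<p>Hello <b>world'), "\n";
```

The output of this function would then be handed to the regex-based steps if further stripping or compression is wanted.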

 

Currently one cleaning method is implemented: a tag whitelist combined with an attribute blacklist.
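
As a rough illustration of the whitelist/blacklist idea (only tags on the list survive, and only attributes on the list are removed), here is a minimal sketch. The function name, parameters and patterns are hypothetical and not taken from the class; it leans on PHP's built-in `strip_tags()` for the tag side.

```php
<?php
// Hypothetical helper (not part of HTMLCleaner): tag whitelist + attribute blacklist.
function sketch_filter(string $html, array $allowedTags, array $bannedAttrs): string
{
    // Tag whitelist: strip_tags() drops every tag not listed, keeping the text inside.
    $allowed = $allowedTags ? '<' . implode('><', $allowedTags) . '>' : '';
    $html = strip_tags($html, $allowed);

    if (!$bannedAttrs) {
        return $html;
    }

    // Attribute blacklist: remove listed attributes, whatever their quoting style.
    $names = implode('|', array_map(function ($name) {
        return preg_quote($name, '/');
    }, $bannedAttrs));

    return preg_replace(
        '/\s+(?:' . $names . ')\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i',
        '',
        $html
    );
}

$in = '<div class="Section1"><p style="margin:0" align="left">Text <font face="Arial">here</font></p></div>';
echo sketch_filter($in, ['p', 'b', 'i'], ['style', 'class']), "\n";
```

With this split, unknown tags are removed by default while unknown attributes are kept by default, which is why `align` survives in the example and `div` and `font` do not.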

 

More info: http://luci.criosweb.ro/blog/2007/08/04/html-cleaner/

Download latest version: http://luci.criosweb.ro/scripts/HTMLCleaner.rar

Demo (no tidy support): http://luci.criosweb.ro/scripts/HTMLCleaner/

 

PS to moderators: Today my account got deleted. My message from yesterday is gone. Could you tell me why?


PS to moderators: Today my account got deleted. My message from yesterday is gone. Could you tell me why?

The forum got hacked and it was restored from a backup tape. Unfortunately, this process caused us to lose a number of new posts and registered users.

 

Ken
