How to clear all html tags except some from html file using regex?


dpacmittal

I have retrieved some HTML using cURL and successfully extracted everything between the form tags I need.

It contains table tags (<tr>, <td>, <th>) and many others. I only want to keep the <input>, <select>, and <textarea> tags.

What regex should I use to strip all the unneeded tags?

Example, removing all tr tags:

$content = preg_replace('~</?tr[^>]*>~i','',$content);

Note: that does not remove the content between the tags.
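
A quick way to see the effect is to run the pattern against a small sample. The markup below is made up for illustration; only the preg_replace call comes from the post above.

```php
<?php
// Made-up sample standing in for the fetched form contents.
$content = '<table><tr class="row"><td>Name</td></tr></table>';

// Strip opening and closing tr tags; whatever sat between them is left in place.
$content = preg_replace('~</?tr[^>]*>~i', '', $content);

echo $content; // <table><td>Name</td></table>
```

The td element and its text survive because the pattern only matches the tags themselves.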

Thanks, I'll try it. Can't we just remove all tags except a few? I know some regex, but nothing that complex.

I know we can add ^ which means "NOT".

Can't we do something like that?

Have you tried strip_tags? It takes the allowable tags as its second parameter.
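
A minimal sketch of the strip_tags route, assuming the fetched markup is in $content. The sample string and the $fields variable are made up for illustration.

```php
<?php
// Made-up sample standing in for the fetched form contents.
$content = '<table><tr><td>Name</td><td><input type="text" name="n"></td></tr></table>';

// Second argument: the tags to keep, written one after another.
$fields = strip_tags($content, '<input><select><textarea>');

echo $fields; // Name<input type="text" name="n">
```

Like the regex, this keeps the text between removed tags. If the options inside a select are needed, <option> would have to be added to the allowed list as well.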

Anyway, if you want to do it with regex, try this one (I've modified Crayon Violent's regexp by adding a negative lookahead assertion):

 

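// The optional slash sits inside the lookahead so closing tags such as </select> are kept as well.
$content = preg_replace('/<(?!\/?(?:input|textarea|select)\b)[^>]*>/i', '', $content);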
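
To try the lookahead idea on a sample, it can be wrapped in a small function. This is only a sketch: keep_only_tags and its $keep parameter are hypothetical names, and the sample markup is made up. The pattern puts the optional slash inside the lookahead, so closing tags of the kept elements are not stripped, and adds \b and the i flag so that tag names are matched whole and case-insensitively.

```php
<?php
// Hypothetical helper (not from the thread): strips every tag except those listed in $keep.
function keep_only_tags(string $html, array $keep): string
{
    // Quote each tag name so it is safe to embed in the pattern.
    $names = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '/');
    }, $keep));

    // Match any tag whose name (with or without a leading slash) is not in the list.
    $pattern = '/<(?!\/?(?:' . $names . ')\b)[^>]*>/i';

    return preg_replace($pattern, '', $html);
}

$html = '<tr><td>Colour</td><td><select name="c"><option>Red</option></select></td></tr>';

echo keep_only_tags($html, ['input', 'select', 'textarea']);
// Colour<select name="c">Red</select>
```

Note that the option tags are stripped here too; adding 'option' to the list would keep them.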

 

It can have problems in some cases, for example HTML code inside HTML comments (I happened to try it against a page that had some).
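
One possible workaround, assuming the comments are not needed in the output, is to delete them before stripping any tags. Otherwise a tag pattern built on [^>]* stops at the first > inside the comment and can leave a stray --> behind. The sample string below is made up for illustration.

```php
<?php
// Made-up sample: a commented-out field followed by a live one.
$content = '<!-- <input name="old"> --><input name="new">';

// Remove comments first. The s flag lets . span newlines,
// and the lazy .*? stops at the first closing -->.
$content = preg_replace('/<!--.*?-->/s', '', $content);

echo $content; // <input name="new">
```

After this step the tag-stripping pattern only sees markup that is actually live on the page.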

 

 

 
