groupbrand Posted May 18, 2010

Hi,

I am trying to remove specific tag attributes from my HTML whilst leaving some intact. Really I need a regular expression that matches a list of 'accepted' tag attributes (e.g. href) and removes all others (e.g. style, class etc.), as in the following example:

<a href="http://mysite.com" target="_blank" class="myclass">

would become

<a href="http://mysite.com" target="_blank">

However, I won't just be matching <a> tags but all tags, including <p>, <ul>, <li> etc.

Hope someone can help! Cheers.

foxsoup Posted May 20, 2010

Hi there. Give this a shot:

    // Some sample data to work with
    $data = '<p class="someclass">This is a <a href="http://www.google.com" target="_blank" style="color:#cc0000">hyperlink</a> in a paragraph!</p>';

    // Array of attributes to keep (lowercase), all others will be removed
    $keep = array('href', 'target');

    // Get an array of all the double-quoted attributes and their values in the data string.
    // [^"]* also matches empty values such as class="" without running into the next attribute.
    preg_match_all('/[a-z-]+="[^"]*"/i', $data, $attributes);

    // Loop through the attribute pairs, match them against the keep array and remove
    // them from $data if they don't exist in the array
    foreach ($attributes[0] as $attribute) {
        // Everything before the first '=' is the attribute name; lowercase it so
        // that HREF and href are treated the same
        $attributeName = strtolower(stristr($attribute, '=', true));
        if (!in_array($attributeName, $keep)) {
            $data = str_replace(' ' . $attribute, '', $data);
        }
    }

That should output $data as something like:

<p>This is a <a href="http://www.google.com" target="_blank">hyperlink</a> in a paragraph!</p>

Hope this helps!
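
A regex over the raw string can also match attribute-like text outside tags (for example `a="b"` in body text) and does not handle single-quoted or unquoted values. As an alternative, here is a minimal sketch that parses the markup with PHP's built-in `DOMDocument` and strips attributes node by node. The function name `stripAttributes` and the wrapper `<div>` are illustrative choices, not something from the thread. The sketch assumes the input is an HTML fragment and does no special encoding handling, so non-ASCII input may need extra care.

```php
<?php
// Hypothetical helper (not from the thread): removes every attribute whose
// name is not in $keep, for all elements in an HTML fragment.
function stripAttributes(string $html, array $keep): string
{
    $keep = array_map('strtolower', $keep);

    // Collect parser warnings (e.g. for unknown tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment in a <div> so its contents can be located again
    // after libxml adds <html> and <body> around it.
    $doc->loadHTML('<div>' . $html . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first <div> in document order.
    $root = $doc->getElementsByTagName('div')->item(0);

    foreach ($root->getElementsByTagName('*') as $element) {
        // Gather names first; removing while iterating the live
        // attribute map could skip entries.
        $remove = array();
        foreach ($element->attributes as $attr) {
            if (!in_array(strtolower($attr->nodeName), $keep, true)) {
                $remove[] = $attr->nodeName;
            }
        }
        foreach ($remove as $name) {
            $element->removeAttribute($name);
        }
    }

    // Serialize only the wrapper's children, dropping the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling `stripAttributes($data, array('href', 'target'))` on the sample string above should leave only the `href` and `target` attributes in place. Because the parser re-serializes the markup, whitespace and quoting in the result may differ slightly from the input.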