Help!php Posted March 16, 2012

I have a paragraph with tags that need to be stripped off. The paragraph looks like this:

<div id="ctl00_placeholderMain_pnlInTheBox" class="tabitem">
<p>
HP LaserJet 9050 printer<br/>
Power cord<br/>
Parallel cable<br/>
HP LaserJet Q8543X Smart print cartridge<br/>
Printer documentation<br/>
Printer software CD<br/>
Control panel overlay<br/>
Face-up output bin<br/>
Two 500-sheet input tray<br/>
100 Sheet Multipurpose Tray<br/>
HP JetDirect Fast</p>
</div>

I want it to look like this:

HP LaserJet 9050 printer
Power cord
Parallel cable
HP LaserJet Q8543X Smart print cartridge
Printer documentation
Printer software CD
Control panel overlay
Face-up output bin
Two 500-sheet input tray
100 Sheet Multipurpose Tray
HP JetDirect Fast

How would I go about doing this? Currently I use:

$inbox = $html->find( "#ctl00_placeholderMain_pnlInTheBox" );
if ( isset( $inbox[ 0 ] ) ) {
    $box = ( $inbox[0] );
    $box = strpos($box, ';') !== FALSE ? substr( $box, strpos( $box, ";" ) + 1 ) : $box;
} else {
    $box = "0";
}
floridaflatlander Posted March 16, 2012

http://php.net/manual/en/function.strip-tags.php

$input = strip_tags($input);

or

echo strip_tags($input);

or, to allow a tag (in your case <br>):

$input = strip_tags($input, '<br>'); // '<br>' also keeps <br/>
$input = strip_tags($input, '<h1>');

I use <i> and <b>, but I don't know how safe that is.
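A quick sketch of what strip_tags() returns with and without its second argument. The sample string is my own trimmed-down copy of the markup from the question, not the full page:

```php
<?php
// Trimmed-down sample modeled on the markup in the question (assumed input).
$input = '<div class="tabitem"><p>Power cord<br/>Parallel cable</p></div>';

// No second argument: every tag is removed, including the <br/>.
$plain = strip_tags($input);

// Allowed tags go in as a quoted string. The PHP manual says to list the
// non-self-closing form: '<br>' keeps both <br> and <br/>.
$withBreaks = strip_tags($input, '<br>');

echo $plain, PHP_EOL;      // Power cordParallel cable
echo $withBreaks, PHP_EOL; // Power cord<br/>Parallel cable
```

Note that with no allowed tags the items run together, because removing <br/> leaves nothing between them.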
scootstah Posted March 16, 2012

Using strip_tags() with the second parameter opens you up to XSS attacks. So if you need to keep certain HTML elements, use something else, like HTML Purifier.
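If all you need is bare <i> and <b> with no attributes, one lightweight alternative is to escape everything with htmlspecialchars() and then re-enable only the exact, attribute-free tags you allow. This is a different technique from HTML Purifier and is only a sketch; the helper name allowBareTags is hypothetical:

```php
<?php
// Hypothetical helper: escape all HTML, then restore only bare opening and
// closing tags from a developer-supplied whitelist (never from user input).
function allowBareTags(string $text, array $tags): string
{
    // Every <, >, & and quote becomes an entity, so nothing is markup yet.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    foreach ($tags as $tag) {
        // Only the exact forms <tag> and </tag> are turned back into markup;
        // a tag carrying attributes (e.g. onmouseover) stays escaped as text.
        $safe = str_ireplace(
            ["&lt;{$tag}&gt;", "&lt;/{$tag}&gt;"],
            ["<{$tag}>", "</{$tag}>"],
            $safe
        );
    }
    return $safe;
}

$out = allowBareTags('<i>hi</i> <i onmouseover="evil()">x</i>', ['i', 'b']);
echo $out, PHP_EOL;
// <i>hi</i> &lt;i onmouseover=&quot;evil()&quot;&gt;x</i>
```

It does not balance tags, so an unclosed <i> can still italicize the rest of the page; it only guarantees that no attributes get through.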
Psycho Posted March 16, 2012

His output doesn't show any HTML, not even BR tags. It looks like the BR tags should be replaced with line breaks. So I would use preg_replace() to change any BR tags to line breaks, then use strip_tags() to remove any and all remaining tags.

Note: I'd use preg_replace() instead of str_replace() to cover all the variations of BR tags.
Psycho Posted March 16, 2012

Based upon the OP's description and example input/output data, this should work:

function removeHTML($inputStr)
{
    $outputStr = preg_replace('#<br>|<br/>#i', "\n", $inputStr);
    $outputStr = strip_tags($outputStr);
    return $outputStr;
}
scootstah Posted March 16, 2012

A little more robust pattern for Psycho's function:

$outputStr = preg_replace('#<br[\s\/]*>#i', "\n", $inputStr);

Psycho wrote: "His output doesn't show any HTML, not even BR tags. It looks like BR tags should be replaced with a line break."

I'm not sure I agree with that assessment, because you would only see the BR tags if he posted the source. In any case, if he wanted BR tags instead of newlines, he could just run nl2br() on the result of your function. That way, the line breaks come back as <br /> tags but all other HTML is removed.
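A small comparison of the two patterns on common BR spellings (the test strings are my own). Psycho's alternation only matches <br> and <br/> exactly, so <br /> slips through; the character class [\s\/]* accepts any mix of spaces and slashes before the closing bracket. Neither pattern matches a BR with attributes, such as <br class="x">:

```php
<?php
$variants = ['<br>', '<br/>', '<br />', '<BR>'];
$original = [];
$robust = [];

foreach ($variants as $br) {
    $input = "a{$br}b";
    // Psycho's pattern: exactly <br> or <br/>, case-insensitive.
    $original[$br] = preg_replace('#<br>|<br/>#i', "\n", $input);
    // scootstah's pattern: <br, then any spaces/slashes, then >.
    $robust[$br] = preg_replace('#<br[\s\/]*>#i', "\n", $input);
}

var_dump($original, $robust);
```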
Psycho Posted March 16, 2012

scootstah wrote: "I'm not sure I agree with that assessment, because you would only see the BR tags if he posted the source."

Well, his input data explicitly showed the BR (and other) tags, and his required output explicitly excluded all of them. So I took that to mean he wanted the HTML markup removed and logical line breaks in place of the HTML line-break tags. Of course, he did say "I want it to look like", which could be construed to mean only the displayed output. But in that case the original content was fine to begin with. In any case, the fact that we interpreted the requirement differently means the request was not clear.
Help!php Posted March 16, 2012 (author)

Thank you so much for everyone's answers. By the way, I'm a she. ... I apologize for my confusion. Without strip_tags(), it shows:

<div id="ctl00_placeholderMain_pnlInTheBox" class="tabitem">
<p>
HP LaserJet 9050 printer<br/>
Power cord<br/>
Parallel cable<br/>
HP LaserJet Q8543X Smart print cartridge<br/>
Printer documentation<br/>
Printer software CD<br/>
Control panel overlay<br/>
Face-up output bin<br/>
Two 500-sheet input tray<br/>
100 Sheet Multipurpose Tray<br/>
HP JetDirect Fast</p>
</div>

All I wanted to do was get rid of

<div id="ctl00_placeholderMain_pnlInTheBox" class="tabitem"> <p>

and

</p> </div>

That's all.
scootstah Posted March 16, 2012

Then something like this should do that (modifying Psycho's function):

function removeHTML($inputStr)
{
    $outputStr = preg_replace('#<br[\s\/]*>#i', "\n", $inputStr);
    $outputStr = strip_tags($outputStr);
    $outputStr = nl2br($outputStr);
    return $outputStr;
}

This will:
1. Convert all variations of <br> to newline characters,
2. Remove all HTML tags, and
3. Insert a <br /> before each newline character.

So you remove all the other HTML while preserving the line breaks.
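Running that function on a one-line copy of the markup from the question (spacing assumed to match what was posted) shows the result. One caveat that comes from nl2br() itself rather than from the thread: it converts every newline, so if the real page source has a line break right after each <br/>, you get doubled breaks. Consuming trailing whitespace in the pattern, e.g. '#<br[\s\/]*>\s*#i', is one possible fix (my own suggestion). The trim() call below only cleans up whitespace at the ends:

```php
<?php
// scootstah's function from the thread, unchanged.
function removeHTML($inputStr)
{
    $outputStr = preg_replace('#<br[\s\/]*>#i', "\n", $inputStr);
    $outputStr = strip_tags($outputStr);
    $outputStr = nl2br($outputStr);
    return $outputStr;
}

// Shortened sample of the markup from the question (assumed spacing).
$box = '<div id="ctl00_placeholderMain_pnlInTheBox" class="tabitem"> <p> '
     . 'HP LaserJet 9050 printer<br/> Power cord<br/> Parallel cable</p> </div>';

// Leading/trailing spaces left behind by the removed tags are trimmed.
$clean = trim(removeHTML($box));
echo $clean, PHP_EOL;
```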
floridaflatlander Posted March 16, 2012

scootstah wrote: "Using strip_tags with the second parameter opens up XSS attacks." ...

I can see how strip_tags($input, '<a>') could be used for an XSS attack, but how could strip_tags($input, '<i>') be?
scootstah Posted March 16, 2012

From the PHP manual: "This function does not modify any attributes on the tags that you allow using allowable_tags, including the style and onmouseover attributes that a mischievous user may abuse when posting text that will be shown to other users."

It doesn't change the attributes of the tags that it keeps. So something like <i onmouseover="leetHaxorsFunction();"> will still exist.
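A minimal demonstration of that point, reusing the made-up handler name from the post: allowing <i> keeps the whole opening tag verbatim, event handler included, while the disallowed <b> is stripped:

```php
<?php
$input = '<i onmouseover="leetHaxorsFunction();">hover me</i> <b>bold</b>';

// Only <i> is allowed: <b> is removed, but <i> keeps every attribute.
$out = strip_tags($input, '<i>');
echo $out, PHP_EOL;
// <i onmouseover="leetHaxorsFunction();">hover me</i> bold
```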
floridaflatlander Posted March 16, 2012

Thanks.