qwikaddotcom Posted May 24, 2013
Hi! Let's say somebody posts an ad that has HTML tables in it, and the tables have extra closing </table> tags. Example:
<table border="0">
<tr>
</td> Some text.... </td>
<td> Some text.... </td>
</tr>
</table>
</table>
</table>
Is there a way to remove the extra closing tags (just the extra </table> tags) from any HTML table? Thank you for any input.
ginerjm Posted May 24, 2013
I think the best way is to get the poster to stop doing it! That said, I think you have to parse the text of the page yourself, counting start tags and end tags and skipping the end tags once they outnumber the starts.
qwikaddotcom (Author) Posted May 24, 2013
How would you accomplish something like that with, let's say, preg_match or a regex?
Jessica Posted May 24, 2013
You wouldn't; you would use a DOM parser. I like SimpleDOM.
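For illustration, a DOM-parser approach might look like the following. This is only a minimal sketch: it swaps in PHP's built-in DOMDocument rather than SimpleDOM, and the function name stripStrayTagsWithDom is hypothetical. It relies on the parser ignoring end tags that have nothing to close, so the re-serialized markup should keep only the </table> that matches an opening tag. The parser may also normalize other parts of the markup along the way.

```php
<?php
// Hypothetical helper: round-trip an HTML fragment through DOMDocument.
// Assumes the dom extension is available and the input is plain ASCII/Latin-1;
// other encodings need extra handling that is not shown here.
function stripStrayTagsWithDom(string $fragment): string
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment so there is always a <body> to read back from.
    $doc->loadHTML('<html><body>' . $fragment . '</body></html>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Serialize only the children of <body>, i.e. the repaired fragment.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}
```

Calling stripStrayTagsWithDom() on the ad text before saving it would be one place to apply this.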
qwikaddotcom (Author) Posted May 24, 2013
I am not familiar with SimpleDOM. I found something about it here: http://simplehtmldom.sourceforge.net/ but it will take me a lot of time to figure out something like removing extra </table> tags with it. Can you suggest a solution? The idea is that if there's an unwanted closing table tag (whether one or many), it should be stripped. Any help will be appreciated.
Jessica Posted May 24, 2013
Yes, I got the idea. It's up to you to write the code. If you want a quick solution done for you, either in regex or with a DOM parser, you'll need to post in Freelancing. Otherwise, pick which way you want to go and make an effort, and we can help then.
kicken Posted May 24, 2013
You could try Tidy; it may be able to clean it up for you.
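As a rough idea of what the Tidy route could look like, here is a minimal sketch. It assumes PHP's tidy extension is installed; the wrapper name repairWithTidy is hypothetical, and the exact markup Tidy produces depends on its configuration, so the result is worth inspecting before relying on it.

```php
<?php
// Hypothetical wrapper around the tidy extension's tidy_repair_string().
// Returns null when the extension is missing or the repair fails.
function repairWithTidy(string $fragment): ?string
{
    if (!function_exists('tidy_repair_string')) {
        return null; // tidy extension not available
    }

    $config = [
        // Return only the body content rather than a full HTML document.
        'show-body-only' => true,
    ];

    $result = tidy_repair_string($fragment, $config, 'utf8');
    return $result === false ? null : $result;
}
```

Because Tidy repairs the whole fragment, it may change more than just the stray </table> tags.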
jazzman1 Posted May 24, 2013
Just treat the table content as a simple string, trim the content, and remove only the duplicates that you want.
jazzman1 Posted May 24, 2013 (edited)
If your problem is only "</table>", you can remove all whitespace between "><" tags, then remove every "</table>" and add just one back. Take a look at this; it's not a very elegant solution because I don't have much time, but it should work.
<?php
$str = '<table border="0"> <tr> </td> Some text.... </td> <td> Some text.... </td> </tr> </table> </table> </table> ';
// Remove whitespace between adjacent tags
$tbl = preg_replace('~>(\s+)?<~', '><', $str);
// Split on the closing tag; array_unique() removes duplicate pieces,
// such as the empty strings left between consecutive closing tags
$html = implode('', array_unique(explode('</table>', $tbl)));
// Append a single closing tag
echo $html.'</table>';
Results:
<table border="0"><tr></td>Some text....</td><td>Some text....</td></tr></table>
Edited May 24, 2013 by jazzman1
qwikaddotcom (Author) Posted May 24, 2013
It's a great recommendation, but can you show me how it can be done with preg_match, a regex, or SimpleDOM? All it has to do is clean out all the extra </table> tags (closing table tags). Everything else can stay. Thank you.
qwikaddotcom (Author) Posted May 24, 2013
I guess you posted it just a second before I did. LOL.
qwikaddotcom (Author) Posted May 24, 2013
But this will work for just one particular example. I need something more universal: a preg_match or something similar that will strip all extra </table> tags from all kinds of HTML table structures...
jazzman1 Posted May 24, 2013
For a more universal solution, there is a "Freelance Section" on the forum, or... I highly recommend that you start learning regex; I am a big fan of it: http://www.regular-expressions.info/
qwikaddotcom (Author) Posted May 26, 2013
I know I am probably annoying all of you with my posts about the same thing, but I need something different. Actually, what I need (as I have figured out now) is something much simpler than what I thought I needed:
If there is already a closing </table> tag at the end of a table, any other closing </table> tags after it should be stripped. In other words, it doesn't matter what happens inside the table; what matters is that once the table is closed and there's no new opening <table> tag, any extra closing </table> tags must be stripped. I think this is doable, AND it will eliminate 95% of the wrong tables (from what I've seen happening in the posts).
The difference between what I need now and what I needed before is that there will be no need for parsing. The only "parsing" involved will go like this: "OK, the table has a closing tag and there are no opening tags, but there are extra closing tags... strip them!"
Can you suggest how this can be done with preg_match or something similar? I'd really appreciate it. preg_match, str_replace or preg_replace work best for me, because I can apply them directly to the markdown. Thank you a lot!
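The rule described above (keep a </table> only while a table is still open, drop it otherwise) can be sketched with preg_replace_callback and a depth counter. This is only one possible reading of the requirement; the function name stripExtraTableClosers is hypothetical, and the pattern assumes table tags are written as ordinary <table ...> and </table> tags.

```php
<?php
// Hypothetical helper: remove </table> tags that have no open <table> to close.
function stripExtraTableClosers(string $html): string
{
    $depth = 0; // number of <table> elements currently open

    $result = preg_replace_callback(
        '~<(/?)table\b[^>]*>~i', // any opening or closing table tag
        function (array $m) use (&$depth): string {
            if ($m[1] === '') {
                // Opening tag: one more table is open.
                $depth++;
                return $m[0];
            }
            if ($depth > 0) {
                // Closing tag that matches an open table: keep it.
                $depth--;
                return $m[0];
            }
            // Closing tag with nothing open: strip it.
            return '';
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}
```

Because the counter goes up again at every new <table>, later tables in the same text, and nested tables, keep their own closing tags. Whitespace that surrounded a stripped tag is left in place.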
jazzman1 Posted May 26, 2013
What is the problem with using my script above?
Jessica Posted May 26, 2013
That is the exact same question you already posted.
qwikaddotcom (Author) Posted May 26, 2013
The difference is (at least I thought there was a difference) that before, as I thought was mentioned, it needed to be parsed, whereas now it can probably be accomplished with a one-liner. For example (although it's a different solution):
$text = preg_replace('/<([^<>]+)>/e', '"<" .str_replace("&quot;", \'"\', "$1").">"', $text);
Or do I still have to use the solution offered by jazzman1, where I have to use the table itself within the script?
qwikaddotcom (Author) Posted May 26, 2013 (edited)
Also, without the table in the script, it will strip all unwanted </table> tags, not just a fixed number of them.
Edited May 26, 2013 by qwikaddotcom
qwikaddotcom (Author) Posted May 26, 2013
I guess I've taken the easiest way out for now. I've tried different lines and ended up with this one. It does what I want... for now:
$text = preg_replace( '/(s*<\/table\s*\/?>\s*)+/', "</table>", $text);
Thanks everyone for your input!
jazzman1 Posted May 26, 2013
No, it's not correct. You should make the "\s" optional!
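One way to read this correction is that the leading s* in the pattern was meant to be \s*, that is, optional whitespace rather than optional letters "s". Under that assumption, a one-liner that collapses a run of consecutive closing tags into a single one could be sketched as follows; the function name collapseRepeatedTableClosers is hypothetical.

```php
<?php
// Hypothetical helper: collapse "</table> </table> </table>" into one "</table>".
function collapseRepeatedTableClosers(string $text): string
{
    // (?:\s*</table\s*>)+ matches one or more closing table tags,
    // each optionally preceded by whitespace.
    $result = preg_replace('~(?:\s*</table\s*>)+~i', '</table>', $text);

    // preg_replace() returns null on a regex error; fall back to the input.
    return $result ?? $text;
}
```

This only handles closing tags that directly follow one another. A stray </table> separated from the real one by other content is left alone, and two nested tables closed back to back (with the optional </td> and </tr> omitted) would be merged, so it is a shortcut rather than a general fix.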
Monkuar Posted May 26, 2013
Use htmlspecialchars to stop that crap. Don't EVER allow users to use HTML.
jazzman1 Posted May 26, 2013 (edited)
The htmlspecialchars function just converts the predefined special HTML characters to their entities, doesn't it? Can you give us an example of what you mean?
Edited May 26, 2013 by jazzman1
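As an illustration of the htmlspecialchars suggestion, the sketch below escapes the submitted text so that any tags are displayed literally instead of being rendered. The function name escapeAdText is hypothetical, and this assumes ads are not supposed to contain any working HTML at all.

```php
<?php
// Hypothetical helper: neutralize all HTML in user-submitted ad text.
function escapeAdText(string $text): string
{
    // ENT_QUOTES also converts single quotes; UTF-8 is the assumed input encoding.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo escapeAdText('<table border="0"></table></table>');
// prints: &lt;table border=&quot;0&quot;&gt;&lt;/table&gt;&lt;/table&gt;
```

With this approach the extra </table> tags can no longer break the page layout, but users also lose the ability to post real tables.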