
Stripping unwanted (extra) </table> tags


qwikaddotcom


Hi!

Let's say somebody posts an ad that contains HTML tables, and the tables have extra closing </table> tags in them. Example:

<table border="0">
<tr>
</td>
Some text....
</td>
<td>
Some text....
</td>
</tr>
</table>
</table>
</table>

Is there a way to remove the extra closing tags (just the extra </table> tags) from any HTML table?

Thank you for any input.


I think the best way is to get the poster to stop doing it!

That said, I think you have to parse the text of the page yourself: count the start tags and end tags, and skip the end tags whenever they outnumber the start tags.
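A minimal sketch of that counting idea in PHP, assuming the ad is available as one string. `strip_surplus_table_closers()` is a hypothetical helper name, not a library function; it keeps a running count of open tables and skips any `</table>` that arrives when the count is already zero.

```php
<?php
// Hypothetical helper (not part of any library): walks the markup tag by tag,
// keeps a running count of open tables, and drops every </table> that arrives
// while no table is open.
function strip_surplus_table_closers(string $html): string
{
    // Split around <table ...> and </table> tags. With PREG_SPLIT_DELIM_CAPTURE
    // the pieces alternate: text at even indexes, captured table tags at odd ones.
    $parts = preg_split('~(</?table\b[^>]*>)~i', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    $depth = 0;   // number of tables currently open
    $out = '';

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $out .= $part;            // ordinary text or other tags: keep as-is
        } elseif ($part[1] !== '/') {
            $depth++;                 // opening <table ...>
            $out .= $part;
        } elseif ($depth > 0) {
            $depth--;                 // </table> that closes an open table
            $out .= $part;
        }
        // else: a </table> with no open table, i.e. surplus, so it is skipped
    }

    return $out;
}

$ad = "<table border=\"0\"><tr><td>Some text....</td></tr></table>\n</table>\n</table>\n";
echo strip_surplus_table_closers($ad);
```

Because it tracks depth rather than just totals, a nested table keeps both of its closing tags; only closers beyond the number of open tables are dropped.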


How would you accomplish something like that with, let's say, preg_match or a regex?


You wouldn't; you would use a DOM parser. I like SimpleDOM.

I am not familiar with SimpleDOM. I found something about it here: http://simplehtmldom.sourceforge.net/ but it will take me a lot of time to figure out how to remove extra </table> tags with it. Can you suggest a solution? The idea is that any unwanted closing table tags (whether one or many) should be stripped. Any help will be appreciated.


Yes, I got the idea. It's up to you to write the code.

If you want a quick solution done for you, either with a regex or a DOM parser, you'll need to post in Freelancing. Otherwise, pick which way you want to go and make an effort, and we can help then.
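For the DOM-parser route, a minimal sketch using PHP's bundled DOM extension (`DOMDocument`, which needs ext-dom) instead of SimpleDOM. It relies on libxml's HTML parser ignoring an end tag that has no matching open element, so parsing the ad and serialising it again leaves the surplus `</table>` tags out. `clean_with_dom()` is a hypothetical helper name, and UTF-8 input is assumed.

```php
<?php
// Sketch using PHP's bundled DOM extension (ext-dom) instead of SimpleDOM.
// libxml's HTML parser ignores an end tag that has no matching open element,
// so parsing the ad and serialising it again leaves out the surplus </table> tags.
// clean_with_dom() is a hypothetical helper name; UTF-8 input is assumed.
function clean_with_dom(string $fragment): string
{
    $doc = new DOMDocument();

    // Collect libxml's "Unexpected end tag" warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The leading <meta> tells the parser the fragment is UTF-8; it ends up in <head>.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The parser wraps the fragment in <html><body>; serialise only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $out = '';
    foreach ($body->childNodes as $node) {
        $out .= $doc->saveHTML($node);
    }
    return $out;
}

echo clean_with_dom('<table border="0"><tr><td>Some text....</td></tr></table></table></table>');
```

A parser also repairs other malformed markup on the way through (it may move or drop content it considers misplaced), so it is worth comparing the output on a few real ads before relying on it.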


If your problem is only "</table>", you can remove all whitespace between adjacent tags ("> <" becomes "><"), then remove all the "</table>" tags and add back just one.

Take a look at this. It's not a very elegant solution, since I don't have much time, but it should work.

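<?php

// Sample ad markup: one table followed by two surplus </table> tags.
$str = '<table border="0">
<tr>

</td>
 
Some text....
 
</td>
<td>
 
Some text....
 
</td>

</tr>  

</table>

</table>

</table>
';

// 1. Remove whitespace between adjacent tags, so the repeated </table> tags end up side by side.
$tbl = preg_replace('~>\s*<~', '><', $str);

// 2. Split on </table>; the surplus tags leave empty (duplicate) pieces, which array_unique() drops.
//    Note: array_unique() would also drop any other repeated piece, so this only suits one table per string.
$html = implode('', array_unique(explode('</table>', $tbl)));

// 3. Add exactly one closing tag back.
echo $html . '</table>';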

Results:

<table border="0"><tr></td>

Some text....

</td><td>

Some text....

</td></tr>
</table>

Edited by jazzman1

Just treat the table content as a simple string, trim the content, and remove only the duplicates that you want.

It's a great recommendation, but can you show me how it can be done with preg_match, a regex or SimpleDOM? All it has to do is remove the extra </table> (closing table) tags. Everything else can stay. Thank you.


For a more universal solution there is the "Freelance Section" of the forum, or... I highly recommend that you start learning regex. I am a big fan of it ;)

http://www.regular-expressions.info/

I know I am probably annoying all of you with my posts about the same thing, but I need something different. Actually, what I need (as I have now figured out) is much simpler than what I thought I needed:

If there is already a closing </table> tag at the end of a table, any other closing </table> tags after that last closing tag should be stripped.

In other words, it doesn't matter what happens inside the table. What matters is that once the table is closed and no new opening <table> tag follows, any extra closing </table> tags must be stripped. I think this is doable, and it would fix 95% of the broken tables (from what I've seen in the posts).

The difference between what I need now and what I needed before is that no real parsing is required. The only "parsing" involved would go like this: "OK, the table has a closing tag and there are no new opening tags, but there are extra closing tags... strip them!"

Can you suggest how this can be done with preg_match or something similar? I'd really appreciate it. preg_match, str_replace or preg_replace work best for me, because I can apply them directly to the markdown.

Thanks a lot!
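Since the request is for a single preg_replace-style call, here is a minimal sketch that covers the common case in the sample ad: surplus `</table>` tags that follow the real one with only whitespace in between. It assumes the surplus tags are adjacent like that; a stray closer separated from the real one by other content is not touched by this pattern.

```php
<?php
// Sketch of a single preg_replace call for the case described above:
// a run of </table> tags separated only by whitespace is collapsed into one.
// In valid HTML one </table> is never immediately followed by another
// (a nested table has to sit inside a <td> or <th>), so such a run is always surplus.
// Limitation: a stray </table> separated from the real one by other content,
// e.g. "</table><p>x</p></table>", is left alone.
$text = "<table border=\"0\"><tr><td>Some text....</td></tr>\n</table>\n\n</table>\n</table>\n";

// ~i matches </TABLE> too; (?:\s*</table>)+ swallows every following closer.
$text = preg_replace('~</table>(?:\s*</table>)+~i', '</table>', $text);

echo $text;
```

The design choice here is to rely on the HTML content model rather than on counting: because back-to-back closers can never be legitimate, the pattern can collapse them without looking at the rest of the table.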

That is the exact same question you already posted.

The difference (at least I thought there was one) is that before, as I understood it, the text needed to be parsed, whereas now it can probably be done with a one-liner. For example (although it solves a different problem):

 $text = preg_replace('/<([^<>]+)>/e', '"<" .str_replace(""", \'"\', "$1").">"', $text);

Or do I still have to use the solution offered by jazzman1, where the table itself has to be placed inside the script?
