Replacing html tags using DOMDocument


CodeCaster


Hi

 

I'm trying to replace one html tag with another (taking into account attributes, nesting, ...) using PHP DOMDocument.

I currently have the following code:

 

function get_inner_html($element) 
{ 
    $innerHTML = ""; 
    $children = $element->childNodes; 
    foreach ($children as $child) 
    { 
        $tmp_dom = new DOMDocument(); 
        $tmp_dom->appendChild($tmp_dom->importNode($child, true)); 
        $innerHTML .= trim($tmp_dom->saveHTML()); 
    } 
    return $innerHTML; 
} 

function replace_tag($content = '')
{
//Initialize dom and xpath
$dom = new DOMDocument();
$dom->formatOutput = TRUE;
$dom->loadHTML($content);

$xpath = new DOMXpath($dom);

//Look for spans with the right attribute
$elements = $xpath->query('//span[@data-widget="abc"]');

//Iterate through them in reverse order (because each replacement changes
//the contents of the list of elements)
$i = $elements->length - 1; 
while ($i > -1)
{
	$element = $elements->item($i);
	$inner = get_inner_html($element);

	//Create a new document for the replacement element
	$newdom = new DOMDocument();
	$newdom->formatOutput = true;

	$newdom->loadXML("<div class='abc'>" . $inner . "</div>");

	//ERROR HERE!! :
	$element->parentNode->replaceChild($newdom, $element); 
	$i--; 
}

return $dom->saveHTML();
}

 

It gives me an error on the line indicated by the "ERROR HERE" comment:

 

Fatal error: Uncaught exception 'DOMException' with message 'Wrong Document Error'

 

Seems likely that the $newdom element is not the right one to feed to the replaceChild method.  Can someone give me a clue on how to fix this?
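
For context, "Wrong Document Error" is what DOM raises when a node owned by one document is inserted into another. Here $newdom is itself a separate DOMDocument, so it cannot be passed to replaceChild directly. A minimal sketch of one possible fix is to copy the replacement's root element into the target document with importNode first. The sample markup is made up for illustration. Note also that loadXML only accepts well-formed XML, which the output of saveHTML is not guaranteed to be.

```php
<?php
// Target document containing the span that should be replaced (sample markup).
$dom = new DOMDocument();
$dom->loadHTML("<div><span data-widget='abc'><b>some html code</b></span></div>");

$xpath = new DOMXPath($dom);
$element = $xpath->query('//span[@data-widget="abc"]')->item(0);

// Replacement built in a separate document, as in the code above.
$newdom = new DOMDocument();
$newdom->loadXML("<div class='abc'><b>some html code</b></div>");

// importNode() returns a deep copy that is owned by $dom, so it can be
// inserted into $dom without triggering the wrong-document exception.
$imported = $dom->importNode($newdom->documentElement, true);
$element->parentNode->replaceChild($imported, $element);

$result = $dom->saveHTML();
echo $result;
```

The important detail is that the imported copy, not $newdom or its original root element, is what gets handed to replaceChild.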

 

Thanks!


Do you want to replace the contents of a div with something else? Similar to the .html() function in jQuery?

But still keep the container tags (the parent): <div>?

 

No, I want to change the container tags as well.  So the following:

 

<span>
<span>
	<span data-widget='abc'>
		<b>some html code</b>
		<p>no way of telling how much nested html is in here</p>
	</span>
</span>
</span>

 

Will become:

 

<span>
<span>
	<div class='abc'>
		<b>some html code</b>
		<p>no way of telling how much nested html is in here</p>
	</div>
</span>
</span>

 

I have no idea where the original span will be, so I cannot use child element traversing, as it might be nested n levels deep in any type of parent tag.


Ok, perhaps I should explain the entire situation.

 

I'm working on a wiki system that uses an "alternative" way of formatting text.  Normal wiki systems use something like:

 

'''This text is bold'''

 

While I would like to use something like

 

[bold: this text is bold]

 

and then do a preg_replace to turn this into

 

<span class='important'>this text is bold</span>

 

However, that gets me into trouble when I start nesting different tags which will result in html tags that have different names.  For example:

 

[list-item: This is a list item with [bold: bold text nested in it]]

 

Using preg_replace may cause issues here:

 

$content = preg_replace("@\[bold:(.*?)\]@si", "<span class='important'>$1</span>", $content);
$content = preg_replace("@\[list\-item:(.*?)\]@si", "<li>$1</li>", $content);

 

In this case, we will be ok:

 

<li>This is a list item with <span class='important'>bold text nested in it</span></li>

 

However, because preg_replace looks for the first match of the closing bracket, we get in trouble if we swap the order in which tags are replaced:

 

$content = preg_replace("@\[list\-item:(.*?)\]@si", "<li>$1</li>", $content);
$content = preg_replace("@\[bold:(.*?)\]@si", "<span class='important'>$1</span>", $content);

//results in:
<li>This is a list item with <span class='important'>bold text nested in it</li></span>

 

Notice the closing tags being swapped.

There is no way of knowing how tags will be nested, so running the preg_replace calls in the right order doesn't seem like a solution.  In fact, the only situation where this is not a problem is when all tags have the same name (and therefore, more importantly, the same closing tag):

 

<span data-tag='li'>This is a list item with <span class='important'>bold text nested in it</span></span>

 

So the idea is to preg_replace everything to spans and then use DOMDocument to replace spans with the real html tags they should be (data-tag='li' becomes <li>, data-tag='ul' becomes <ul>, and so on).

 

So as you can see, preg_replace is not an option to replace the tags, and I have no way of knowing where the tags will be, which tags will have been used, how deep they are nested, with which children, ...

 

If you see a better way to overcome this issue, that's ok with me :)
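
One alternative that avoids the ordering problem altogether is a recursive pattern that only matches balanced brackets. This is a rough sketch and not something tested inside the wiki system. The function name wiki_to_html and the WIKI_TAGS map are made up for illustration.

```php
<?php
// Hypothetical map from wiki tag names to [opening HTML, closing HTML].
const WIKI_TAGS = [
    'bold'      => ["<span class='important'>", '</span>'],
    'list-item' => ['<li>', '</li>'],
];

function wiki_to_html(string $text): string
{
    // (?R) recurses into the whole pattern, so a match ends at the bracket
    // that balances its opening bracket, not at the first "]" found.
    $pattern = '/\[([a-z-]+):\s*((?:[^\[\]]++|(?R))*)\]/i';

    return preg_replace_callback($pattern, function (array $m): string {
        $name = strtolower($m[1]);
        if (!isset(WIKI_TAGS[$name])) {
            return $m[0]; // unknown tag name: leave the text untouched
        }
        [$open, $close] = WIKI_TAGS[$name];
        // Convert any tags nested inside the content before wrapping it.
        return $open . wiki_to_html($m[2]) . $close;
    }, $text);
}

echo wiki_to_html('[list-item: This is a list item with [bold: bold text nested in it]]');
```

Because each tag is converted together with its own balanced content, the order in which tag names are listed no longer matters. Unbalanced brackets in the input are simply left as they are.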


Ok mate, I thought I'd give it a try, and after 5 hours I succeeded. It wasn't a waste of time at all: I learned a lot about SimpleXML and DOMDocument.

 

The SimpleXML solution failed because it had no way of injecting nodes at an arbitrary place in the DOM tree.

DOMDocument on the other hand gives you much more control. With a custom function it works great.

 

The big problem is: how do you know whether the user wants to (albeit rarely) enter a literal bracket ] or is ending a formatting string?

 

function transformNode($oldNode, $newName, $destroyAttributes) {
   $newNode = $oldNode->ownerDocument->createElement($newName);

   // Copy the attributes over unless the caller wants them dropped
   if (!$destroyAttributes) {
      foreach ($oldNode->attributes as $attr)
         $newNode->setAttribute($attr->nodeName, $attr->nodeValue);
   }

   // Deep-copy the children so nested markup is preserved
   foreach ($oldNode->childNodes as $child)
      $newNode->appendChild($child->cloneNode(true));

   $oldNode->parentNode->replaceChild($newNode, $oldNode);
}


$html = '[list-item: This is a list item with [list-item: a nested list item] and a [bold: bold text nested in it]]';

$html = str_ireplace('[list-item: ', '<span data-tag="li">', $html);
$html = str_ireplace('[bold: ', '<span class="important">', $html);

$html = str_replace(']', '</span>', $html);


$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXpath($dom);
$spans = $xpath->query('//span[@data-tag]');

// Work backwards so nested spans are converted before their ancestors
for ($i = $spans->length - 1; $i >= 0; $i--) {
   $span = $spans->item($i);
   transformNode($span, $span->getAttribute('data-tag'), true);
}

echo $dom->saveHTML();
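
On the open question of literal brackets: one option is an escape convention. The sketch below assumes a literal bracket is typed as \] (that convention is an assumption, not something the wiki syntax already defines) and would replace the plain str_replace(']', ...) step. The helper name close_tags is made up.

```php
<?php
// Assumed convention: "\]" means a literal bracket, a bare "]" closes a tag.
function close_tags(string $text): string
{
    // Close a span only at brackets that are NOT preceded by a backslash.
    $text = preg_replace('/(?<!\\\\)\]/', '</span>', $text);

    // Then turn the escaped form into a plain bracket.
    return str_replace('\\]', ']', $text);
}

echo close_tags('<span class="important">array[0\] is bold]');
```

This does not cover a backslash that is itself meant literally right before a closing bracket; that case would need a second escape rule.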
