
How to stop DOMDocument destroying embeds?


robertandrews


I am using PHP's DOMDocument within WordPress to carry out two DOM operations on the post content, by filtering the_content:

  • Wrap a certain element pattern in a div.
  • Add a class to a particular element.

But I have a problem with that:

DOMDocument is also destroying embedded WordPress content like tweets. For example, an embedded tweet is not rendered as an <iframe> as it should be; it is just rendered as a <p>. (FYI, in WordPress, authors can paste in the URL of an oEmbed object such as a tweet, YouTube video or SoundCloud track. These are stored in the post database as plain URLs, but when the_content is output they are rendered as <iframe>s or similar.)

I'm new to DOMDocument. Is there any way I can stop it from destroying embedded elements?

My two pieces of code are below. They each use DOMDocument similarly.


/*
Wrap element in another element.
Contributed by @XzKto, https://stackoverflow.com/a/8428323/1375163
*/
function wrap_element($dom, $wrapped_element, $new_element) {
    // Create the new wrapper element
    $wrapper = $dom->createElement($new_element);
    // Clone the wrapper (shallow copy of the element just created)
    $wrapper_clone = $wrapper->cloneNode();
    // Put the wrapper where the wrapped element currently sits
    $wrapped_element->parentNode->replaceChild($wrapper_clone,$wrapped_element);
    // Move the wrapped element inside the wrapper
    $wrapper_clone->appendChild($wrapped_element);
}

/*
Blockquote
*/
add_filter( 'the_content', 'bootstrap_blockquote' );
function bootstrap_blockquote( $content ) {
    // Load DOM of post content
    
    // $content = mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8');
    // $dom = new DOMDocument('1.0', 'utf-8');
    // libxml_use_internal_errors(true);
    // $dom->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $dom = new DOMDocument;
    libxml_use_internal_errors(true);
    $dom->loadHTML($content);
    libxml_clear_errors();

    foreach($dom->getElementsByTagName('blockquote') as $blockquote){
        // Set the blockquote class (note: setAttribute replaces any existing class value)
        // Class addition contributed by @Gillu13, https://stackoverflow.com/a/63088684/1375163
        $class_to_add = 'blockquote';
        $blockquote->setAttribute('class', $class_to_add);
        // Wrap blockquote in <figure>
        wrap_element($dom, $blockquote, 'figure');
    }

    $content = $dom->saveHTML();
    return $content;
}

..


/*
Original contributed by @jack,
https://stackoverflow.com/a/10683463/1375163
*/
add_filter( 'the_content', 'segment_post' );
function segment_post( $content ) {

    // Load post content into a DOM (Document Object Model) document
    $dom = new DOMDocument;
    libxml_use_internal_errors(true);
    $dom->loadHTML($content);
    libxml_clear_errors();

    // Initialise variables
    $segments = array();
    $card = null;


    foreach ($dom->getElementsByTagName('h3') as $h3) {

        // 1. First, collect all nodes
        $card_nodes = array($h3);

        // iterate until another h3 or no more siblings
        for ($next = $h3->nextSibling; $next && $next->nodeName != 'h3'; $next = $next->nextSibling) {
            $card_nodes[] = $next;
        }

        // 2. Create <section> placeholder, with class attributes
        $card = $dom->createElement('section');
        $card->setAttribute('class', 'card p-4 mb-3');

        // replace the h3 with the new card
        $h3->parentNode->replaceChild($card, $h3);
        // and move all nodes into the newly created card
        foreach ($card_nodes as $node) {
            $card->appendChild($node);
        }
        // keep title of the original h3
        $segments[] = $h3->nodeValue;

        /*
        // 3. Also wrap with <section>
        // Initialise the new wrapper
        $wrapper = $dom->createElement('section');
        //Clone our created element
        $wrapper_clone = $wrapper->cloneNode();
        //Replace image with this wrapper div
        $card->parentNode->replaceChild($wrapper_clone,$card);
        //Append the element to wrapper div
        $wrapper_clone->appendChild($card);
        */

    }

    //  make sure we have segments (card is the last inserted card in the dom)
    /*
    if ($segments && $card) {
        $ul = $dom->createElement('ul');
        foreach ($segments as $title) {
            $li = $dom->createElement('li');

            $a = $dom->createElement('a', $title);
            $a->setAttribute('href', '#');

            $li->appendChild($a);
            $ul->appendChild($li);
        }

        // add as sibling of last card added
        $card->parentNode->appendChild($ul);
    }
    */

    // TODO: examine https://stackoverflow.com/questions/10703057/wrap-all-html-tags-between-h3-tag-sets-with-domdocument-in-php

    $content = $dom->saveHTML();
    return $content;
}

 


I'm not sure how WP handles the oEmbeds any more, but I'd say try firing your hook later. The third parameter to add_filter is a priority - set it to like 100 or something high. It's possible that somehow what you're doing is interfering with however WP handles oEmbeds in the content.
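For reference, a minimal sketch of what a later priority looks like. The callback name `my_dom_filter` is hypothetical and stands in for either filter function from the first post; the `function_exists` guard is only there so the snippet can also be run outside WordPress.

```php
<?php
// Hypothetical callback: a stand-in for bootstrap_blockquote() or segment_post().
function my_dom_filter( $content ) {
    // The DOMDocument work would go here; this sketch returns the content unchanged.
    return $content;
}

// The third argument to add_filter() is the priority. Lower numbers run first
// and the default is 10, so 100 runs after most other the_content callbacks.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'the_content', 'my_dom_filter', 100 );
}
```

A higher priority only changes the order in which callbacks run; it will not help if the callback itself rewrites the embed markup.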


On 3/30/2022 at 12:59 PM, maxxd said:

I'm not sure how WP handles the oEmbeds any more, but I'd say try firing your hook later. The third parameter to add_filter is a priority - set it to like 100 or something high. It's possible that somehow what you're doing is interfering with however WP handles oEmbeds in the content.

I tried 100 (it was already 50) - no difference.

For clarity, I have separated out these pieces of DOMDocument-dependent code, packaging them up as plugins released on GitHub...

  • WP Bootstrapify: "Optimise WordPress post HTML elements on-the-fly for enhanced Bootstrap requirements."
  • WP Post Segmenter: "Turn post H3 segments into <sections> and Bootstrap cards for impactful reader presentation."

Things that are true to say...

  • Re: the embedded frame getting eliminated... I think the fault was in the code of WP Bootstrapify. This aims to wrap <blockquote> and add new classes to it. As embedded tweets include a fallback <blockquote>, they were getting wrapped in my new element, <figure>, and having additional classes added, as though they were any other <blockquote>. If I include an override to ignore items with class .tweet-embed, the Twitter frames stay as-is.
  • Re: the stray special characters in the output... I suspect DOMDocument. If I disable both plugins, the posts appear without those awful special characters (an eye-opener as to how many are in my underlying posts?). If I test by leaving only one of them active, e.g. WP Post Segmenter...
    • The formatting problem returns...
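The override described in the first point might look something like the sketch below. The helper name `add_class_unless_embed` and the `EMBED_CLASSES` list are assumptions for illustration, not code from either plugin. `twitter-tweet` is the class Twitter's own embed markup uses on its fallback <blockquote>, and `tweet-embed` is the class mentioned above. The helper also appends to the existing class list instead of replacing it, since `setAttribute('class', ...)` overwrites whatever was there.

```php
<?php
// Assumption: classes that mark a <blockquote> as a third-party embed fallback.
const EMBED_CLASSES = array( 'twitter-tweet', 'tweet-embed' );

// Hypothetical helper: add $class to $el unless $el looks like an embed.
// Returns true if the element was modified, false if it was skipped.
function add_class_unless_embed( DOMElement $el, string $class ): bool {
    // Split the current class attribute into a list (empty attribute gives an empty list).
    $existing = preg_split( '/\s+/', trim( $el->getAttribute( 'class' ) ), -1, PREG_SPLIT_NO_EMPTY );

    if ( array_intersect( $existing, EMBED_CLASSES ) ) {
        return false; // leave embed markup untouched
    }
    if ( ! in_array( $class, $existing, true ) ) {
        $existing[] = $class; // append rather than overwrite
    }
    $el->setAttribute( 'class', implode( ' ', $existing ) );
    return true;
}

// Small demonstration on two blockquotes.
$dom = new DOMDocument();
libxml_use_internal_errors( true );
$dom->loadHTML( '<blockquote class="twitter-tweet"><p>tweet</p></blockquote><blockquote class="pull"><p>quote</p></blockquote>' );
libxml_clear_errors();

// Copy the live node list into an array before modifying the document.
$quotes = iterator_to_array( $dom->getElementsByTagName( 'blockquote' ) );
foreach ( $quotes as $quote ) {
    add_class_unless_embed( $quote, 'blockquote' );
}
```

In the blockquote filter, the call to `wrap_element()` would then only run when the helper returns true, so skipped embeds are neither re-classed nor wrapped in <figure>.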

This leads me to believe the issue must be with DOMDocument or my particular DOMDocument options, how it might be interfering with the formatting.

Anyone have any ideas?

Currently using utf8_decode($content) - I wonder if that's a problem.
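One way to test whether encoding is the culprit is to tell the parser explicitly that the input is UTF-8, rather than converting the string first. The sketch below is one common approach, not the only one: it wraps the fragment in a full document with a charset declaration, then saves only the children of <body> so the added wrapper never reaches the page. The function names `load_fragment` and `save_fragment` are hypothetical. Note also that utf8_decode() converts UTF-8 to ISO-8859-1, which cannot represent characters such as curly quotes, and the function is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: parse an HTML fragment as UTF-8.
function load_fragment( string $html ): DOMDocument {
    $dom      = new DOMDocument();
    $previous = libxml_use_internal_errors( true ); // silence parser warnings
    // The meta tag tells libxml the input is UTF-8; without a declared
    // charset, loadHTML() may treat the bytes as ISO-8859-1.
    $dom->loadHTML(
        '<!DOCTYPE html><html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>' . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous ); // restore the previous setting
    return $dom;
}

// Hypothetical helper: serialise only what is inside <body>.
function save_fragment( DOMDocument $dom ): string {
    $out  = '';
    $body = $dom->getElementsByTagName( 'body' )->item( 0 );
    foreach ( $body->childNodes as $child ) {
        $out .= $dom->saveHTML( $child );
    }
    return $out;
}

// Round trip a fragment containing non-ASCII characters.
$result = save_fragment( load_fragment( '<p>Café “quoted” text</p>' ) );
```

Saving child nodes individually also avoids returning the doctype, <html> and <body> wrapper that a bare `$dom->saveHTML()` would emit into the post content.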

