
[SOLVED] strip out CDATA Tag


e11rof


I am sure somebody will tell me that I am in the wrong forum. Anyway, I wonder if somebody can help me.

 

I have an RSS feed generated from a MySQL table, and I want to display the same feed as an HTML page. The feed content has some embedded HTML tags and is therefore wrapped in a CDATA section (as shown below). I have read the MySQL record and tried to remove the CDATA start and end tags, but failed. Any ideas? The string is shown below.

 

<![CDATA[<p>We would like to announce that our speaker for the October dinner will be Mr Jones. Mr Jones is an outward bound leader and a teacher.</p>

<p>Would anybody not coming please inform Steve.</p>]]>

 

How can I strip the CDATA start tag and the end tag?


If I understand correctly, you want <![CDATA[ and ]]> gone, but leave the rest in place? If so, perhaps something along the lines of:

 

$str = <<<EOF
<![CDATA[<p>We would like to announce that our speaker for the October dinner will be Mr Jones. Mr Jones is an outward bound leader and a teacher.</p>
<p>Would anybody not coming please inform Steve.</p>]]>
EOF;

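// Unwrap each CDATA section. The lazy (.+?) stops at the first ]]>,
// and the s modifier lets . match the newline inside the string.
$str = preg_replace('#<!\[CDATA\[(.+?)\]\]>#s', '$1', $str);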
echo $str;

 

Hopefully I understand the end desired goal here. If not, my apologies.

 

@cags: be careful with the use of .*, as you *might* run into trouble if there are multiple instances of <![CDATA[ ... ]]> in the source string. A greedy .* or .+ matches as much as it can, so it runs from the first <![CDATA[ all the way to the last ]]>. You can read up about why stuff like .* and .+ are 'generally' bad ideas here (post #11 and #14). If there is only one chunk of CDATA, then it's all good, but if not, you might end up wiping out more than you bargained for.
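To make the greedy vs. lazy point concrete, here is a small sketch. The two-section input string is made up for illustration and is not from the original feed.

```php
<?php
// Made-up input with TWO CDATA sections (not the original poster's data).
$feed = '<![CDATA[<p>First</p>]]><hr><![CDATA[<p>Second</p>]]>';

// Greedy: (.*) runs to the LAST ]]>, so the whole string is treated as one
// section and the inner "]]><hr><![CDATA[" is left behind as text.
$greedy = preg_replace('#<!\[CDATA\[(.*)\]\]>#s', '$1', $feed);

// Lazy: (.*?) stops at the FIRST ]]>, so each section is unwrapped on its own.
$lazy = preg_replace('#<!\[CDATA\[(.*?)\]\]>#s', '$1', $feed);

echo $greedy, "\n";
echo $lazy, "\n";
```

With a single CDATA section both patterns give the same result, which is why the difference is easy to miss while testing.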


Thanks for the info nrg_alpha, I'll try to keep that in mind. I only started learning Regular Expressions at the start of the week, and it took me a while to work out that I needed the s at the end because the darn string had a newline char in it.

 

I must admit I did pretty much assume that there would only be one instance of <![CDATA, and nearly suggested simply using substr to grab everything but the start and end tag. With that in mind I think I got the regex to do what I wanted it to do, which is a minor miracle in itself. :)
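For what it's worth, the substr idea could look roughly like the sketch below. It assumes the string holds exactly one CDATA section that wraps the whole value; unwrapCdata is a hypothetical helper name, not something from this thread.

```php
<?php
// Hypothetical helper: remove one leading <![CDATA[ and one trailing ]]>.
// Assumes the whole (trimmed) string is a single CDATA section; anything
// else is returned trimmed but otherwise unchanged.
function unwrapCdata($str)
{
    $open  = '<![CDATA[';
    $close = ']]>';
    $str   = trim($str);

    $hasOpen  = substr($str, 0, strlen($open)) === $open;
    $hasClose = substr($str, -strlen($close)) === $close;

    if ($hasOpen && $hasClose) {
        // Skip the opening marker and drop the closing marker.
        return (string) substr($str, strlen($open), -strlen($close));
    }
    return $str;
}

echo unwrapCdata('<![CDATA[<p>Would anybody not coming please inform Steve.</p>]]>'), "\n";
```

This avoids regular expressions entirely, but unlike the preg_replace version it does nothing useful when the string contains several CDATA sections.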


I know this topic is marked as SOLVED already, and that manually playing with the XML will get the job done. However, when working with XML documents, it would be advisable to use a proper XML parser (there are a number of different approaches in PHP).  Using one would make this CDATA problem a non-issue since the parsers will properly handle that type of XML node.
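As one possible illustration of the parser route, here is a minimal sketch using SimpleXML. The feed layout (rss, channel, item, description) is an assumption based on ordinary RSS 2.0, the sample XML is invented, and itemDescriptions is a hypothetical helper name.

```php
<?php
// Invented sample feed; the element names assume a plain RSS 2.0 layout.
$xml = <<<XML
<rss version="2.0"><channel>
<item>
<title>October dinner</title>
<description><![CDATA[<p>Our speaker for the October dinner will be Mr Jones.</p>]]></description>
</item>
</channel></rss>
XML;

// Hypothetical helper: return the description of every item as plain strings.
function itemDescriptions($xml)
{
    // LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
    $feed = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
    if ($feed === false) {
        return array(); // not well-formed XML
    }

    $out = array();
    foreach ($feed->channel->item as $item) {
        // The string cast yields the content without the CDATA markers.
        $out[] = (string) $item->description;
    }
    return $out;
}

foreach (itemDescriptions($xml) as $html) {
    echo $html, "\n";
}
```

This only applies if you have the full XML document to hand. If the database column stores the CDATA-wrapped fragment on its own, as in the original question, the string-based approaches above are the more direct fit.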

