[SOLVED] strip out CDATA Tag

e11rof · October 2, 2009

I am sure somebody will tell me that I am in the wrong forum. Anyway, I wonder if somebody can help me.

I have an RSS feed fed from a MySQL table. I want to display the same feed as an HTML page. The feed has some embedded tags and hence has a CDATA tag (as shown). I have read the MySQL record and have tried to strip out the CDATA start and end tags, but failed. Any ideas? The string is shown below.

<![CDATA[<p>We would like to announce that our speaker for the October dinner will be Mr Jones. Mr Jones is an outward bound leader and a teacher.</p>

<p>Would anybody not coming please inform Steve.</p>]]>

How can I strip the CDATA start tag and the end tag?

cags · October 2, 2009

Assuming I understood your intention correctly, you could do it with a regular expression like so...

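// The s modifier lets . match the newline inside the CDATA block.
if (preg_match('/^<!\[CDATA\[(.*)\]\]>$/s', $src, $out)) {
    $output = $out[1]; // text between <![CDATA[ and ]]>
}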

nrg_alpha · October 2, 2009

If I understand correctly, you want <![CDATA[ and ]]> gone, but leave the rest in place? If so, perhaps something along the lines of:

$str = <<<EOF
<![CDATA[<p>We would like to announce that our speaker for the October dinner will be Mr Jones. Mr Jones is an outward bound leader and a teacher.</p>
<p>Would anybody not coming please inform Steve.</p>]]>
EOF;

$str = preg_replace('#<!\[CDATA\[(.+?)\]\]>#s', '$1', $str);
echo $str;

Hopefully I have understood the desired end goal here. If not, my apologies.

@cags.. be careful with the use of .*, as you *might* run into trouble if there are multiple instances of <![CDATA[ ... ]]> in the source code / string. You can read up about why stuff like .* and .+ are 'generally' bad ideas here (post #11 and #14). If there is only one chunk of CDATA, then it's all good.. but if not, you might end up wiping out more than you bargained for.
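To illustrate the greedy versus lazy point, here is a minimal sketch. The two-section input string is made up for the demonstration and is not from the original feed.

```php
<?php
// Hypothetical input with two CDATA sections back to back.
$src = '<![CDATA[<p>one</p>]]><![CDATA[<p>two</p>]]>';

// Greedy: .* runs on to the LAST ]]>, so the inner markers end up in the capture.
preg_match('/<!\[CDATA\[(.*)\]\]>/s', $src, $greedy);

// Lazy: .+? stops at the FIRST ]]>, so each section is unwrapped on its own.
$lazy = preg_replace('#<!\[CDATA\[(.+?)\]\]>#s', '$1', $src);

echo $greedy[1], "\n"; // <p>one</p>]]><![CDATA[<p>two</p>
echo $lazy, "\n";      // <p>one</p><p>two</p>
```

With a single CDATA section both patterns give the same result, which is why the difference is easy to miss.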

cags · October 2, 2009

Thanks for the info nrg_alpha, I'll try to keep that in mind. I only started learning Regular Expressions at the start of the week, and it took me a while to work out I needed the s at the end as the darn string had a newline char in it.

I must admit I did pretty much assume that there would only be one instance of <![CDATA, and nearly suggested simply using substr to grab everything but the start and end tag. With that in mind I think I got the regex to do what I wanted it to do, which is a minor miracle in itself.
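For reference, the substr idea might look something like the sketch below. It assumes exactly one CDATA section wrapping the whole string; the sample value and the variable names are only placeholders.

```php
<?php
// Placeholder input: a single CDATA section around the whole value.
$src = "<![CDATA[<p>Hello</p>\n<p>World</p>]]>";

$start = '<![CDATA[';
$end   = ']]>';

$trimmed = trim($src); // ignore stray whitespace around the markers

// Only cut when the string really starts and ends with the markers.
if (strpos($trimmed, $start) === 0 && substr($trimmed, -strlen($end)) === $end) {
    $output = substr($trimmed, strlen($start), -strlen($end));
} else {
    $output = $src; // leave anything unexpected untouched
}

echo $output, "\n";
```

No regex is involved, so there is no greedy or lazy matching to worry about, but it will not help if the string holds more than one CDATA section.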

e11rof · October 3, 2009

Thanks for that. I did not have the $1 in the preg_match statement.

salathe · October 3, 2009

I know this topic is marked as SOLVED already, and that manually playing with the XML will get the job done. However, when working with XML documents, it would be advisable to use a proper XML parser (there are a number of different approaches in PHP). Using one would make this CDATA problem a non-issue since the parsers will properly handle that type of XML node.
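As a rough sketch of the parser route, here is one way it could look with SimpleXML, which is only one of the options available in PHP. The feed fragment and its element names (item, title, description) are assumptions for illustration, not the actual feed from this thread.

```php
<?php
// Hypothetical feed fragment; element names are assumed for the example.
$xml = <<<XML
<item>
  <title>October dinner</title>
  <description><![CDATA[<p>Our speaker will be Mr Jones.</p>]]></description>
</item>
XML;

$item = simplexml_load_string($xml);
if ($item === false) {
    exit("Could not parse the XML\n");
}

// Casting the element to string gives its text content without the CDATA markers.
$html = (string) $item->description;

echo $html, "\n"; // <p>Our speaker will be Mr Jones.</p>
```

This only applies when the whole XML document is being read; if the database column already holds just the CDATA-wrapped string, the string-based answers above are the closer fit.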
