shanksta13 Posted August 16, 2009

I've written a piece of code to pull an RSS feed into Twitter. I'm doing this because I want the feed to update almost instantly, and most of the ready-made tools for this only refresh every 30 minutes or so.

I'm having a slight issue, though. When my PHP script runs, it pulls the description from the RSS feed in like this:

    <![CDATA[ <p>I'll be posting live updates and analysis during all games this year through twitter.</p> <p> <img src="http://www.utterli.com/imgs/no-avatar-60.gif" alt="" align="left" hspace="6" /> by: FLGatorStop<br /> when: 4 min. ago<br /> </p>

I need to do two things: first, get rid of the CDATA part; then, find some way to cut the description off after the first </p> tag. I already have a code snippet that makes the front half of the CDATA tag fall off and makes the HTML disappear, so all I really need is some code to cut the string off after the </p> tag. Any help would be greatly appreciated. Thanks!

Link to comment https://forums.phpfreaks.com/topic/170485-strip-html-for-rss-feed/
kratsg Posted August 16, 2009

How about some fancy regex?

    $matches = null;
    $pattern = '/<p>(.*)(<\/p>)?/';
    preg_match($pattern, $string, $matches);
    $text = $matches[1];

To explain this really quickly: $matches[1] will return whatever is matched by the greedy operator (.*), which I made not-so-greedy with the (<\/p>)? at the end (meaning it will grab the text in between the two tags).
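One caveat worth checking with the pattern above: (.*) is greedy and the trailing (<\/p>)? group is optional, so the capture can run past the first </p> to the end of the line. A minimal sketch of the difference, assuming a lazy quantifier (.*?) and the s modifier (so that . also matches newlines) are acceptable here. The sample string and the variable names $greedy and $lazy are made up for illustration.

```php
<?php
// Hypothetical sample input, modelled on the feed description in the question.
$string = "<![CDATA[ <p>I'll be posting live updates.</p> <p> by: FLGatorStop<br /> </p> ]]>";

// Pattern from the post: (.*) keeps going as far as it can on the line.
preg_match('/<p>(.*)(<\/p>)?/', $string, $greedy);

// Lazy variant: (.*?) stops at the first closing </p>;
// the s modifier lets . match newlines in multi-line descriptions.
preg_match('/<p>(.*?)<\/p>/s', $string, $lazy);

echo $greedy[1], "\n"; // still contains </p> and everything after it
echo $lazy[1], "\n";   // only the text of the first paragraph
```

With the lazy variant, the closing tag is required rather than optional, so a description with no complete <p>...</p> pair simply does not match.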
shanksta13 Posted August 16, 2009 Author Share Posted August 16, 2009 How about some fancy Regex? $matches = null; $pattern = '/<p>(.*)(<\/p>)?/'; preg_match($pattern,$string,$matches); $text = $matches[1]; To explain this really quickly.. $matches[1] will return whatever is matched by the greedy operator (.*) which I made.. not-so-greedy with the (<\/p>)? at the end (meaning it will grab the text in between the two tags). That regex looks like it would work properly, now I just have to figure out how to put it into the script I already have properly. If you could help with that it would be great. Here is the code snippet for the current HTML stripping (note that this does not work properly): // Strip HTML tags and other bullshit from DESCRIPTION if ($this->stripHTML && $result['items'][$i]['description']) $result['items'][$i]['description'] = strip_tags($this->unhtmlentities(strip_tags($result['items'][$i]['description']))); And here is the unhtmlentities () function: // ------------------------------------------------------------------- // Replace HTML entities &something; by real characters // ------------------------------------------------------------------- function unhtmlentities ($string) { // Get HTML entities table $trans_tbl = get_html_translation_table (HTML_ENTITIES, ENT_QUOTES); // Flip keys<==>values $trans_tbl = array_flip ($trans_tbl); // Add support for ' entity (missing in HTML_ENTITIES) $trans_tbl += array(''' => "'"); // Replace entities by values return strtr ($string, $trans_tbl); Now, I'd like to enter that regex you gave me so that it strips what's inside of the <description> tags on the RSS feed down to the text that is in between the first set of <p> tags. Link to comment https://forums.phpfreaks.com/topic/170485-strip-html-for-rss-feed/#findComment-899501 Share on other sites More sharing options...
kratsg Posted August 17, 2009

What do you mean it doesn't work? It doesn't change from &lt; to <? Really, just add it as a function:

    function getText($string){
        $matches = null;
        $pattern = '/<p>(.*)(<\/p>)?/';
        preg_match($pattern, $string, $matches);
        return $matches[1];
    }

This way, once you find where in the code you are holding the string that contains the HTML output to filter, just wrap the function around that variable.
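Since the stripping code in this thread calls its helpers through $this, the function would probably need to live inside the feed class as a method. A rough sketch of that arrangement, under a few assumptions: the class name FeedCleaner and the method cleanDescription() are hypothetical stand-ins (only stripHTML and getText come from the posts), the pattern is a lazy variant rather than the one posted, and falling back to the untouched input when nothing matches is just one possible choice.

```php
<?php
// Hypothetical stand-in for the feed class in the thread; only the
// stripHTML flag and the getText name come from the posts.
class FeedCleaner
{
    public $stripHTML = true;

    // Return the contents of the first <p>...</p> pair.
    public function getText($string)
    {
        if (preg_match('/<p>(.*?)<\/p>/s', $string, $matches)) {
            return $matches[1];
        }
        // Assumption: fall back to the untouched input when nothing matches,
        // which also avoids reading an undefined $matches[1].
        return $string;
    }

    // Hypothetical wrapper mirroring the if-statement from the thread.
    public function cleanDescription($description)
    {
        if ($this->stripHTML && $description) {
            return strip_tags($this->getText($description));
        }
        return $description;
    }
}

$cleaner = new FeedCleaner();
echo $cleaner->cleanDescription('<p>Live <b>updates</b> tonight.</p> <p>by: someone</p>'), "\n";
```

The guard around preg_match() is the main design point: the posted function returns $matches[1] unconditionally, which raises a notice whenever the description contains no <p> tag.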
shanksta13 Posted August 17, 2009 Author

Okay, I added that function and called the strip function like this:

    if ($this->stripHTML && $result['items'][$i]['description'])
        $result['items'][$i]['description'] = strip_tags($this->getText(strip_tags($result['items'][$i]['description'])));

However, the feed is still coming out like this:

    <![CDATA[ <p>@<a class="at_lnk" href="/UtterliTeam">UtterliTeam</a> is there any way to add a title and a message from the same text message? I can do the title when sending to [email protected] and the message from [email protected]. Any way to do both in one shot?</p> <p> <img src="http://www.utterli.com/imgs/no-avatar-60.gif" alt="" align="left" hspace="6" /> by: FLGatorStop<br /> when: 7 hours ago<br /> </p> ]]>

So clearly, it's pulling properly from the description tags. But I need a function that will strip out the <![CDATA[... and the rest, leaving just the part inside the first <p> </p> tags. For some reason, I can't seem to figure this out, even after playing around with a whole load of different configurations.

Maybe I could attach the two files I'm using to push the feed to Twitter, so someone could take a look? That might help explain things a bit better.
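The two steps asked for here (drop the CDATA wrapper, then cut after the first </p>) can also be done with plain string functions instead of a regex. A minimal sketch, assuming the description always arrives as a single string shaped like the output above. The sample text and the variable names $raw, $unwrapped, $end and $first are illustrative only.

```php
<?php
// Hypothetical raw description, shaped like the output quoted above.
$raw = "<![CDATA[ <p>Any way to do both in one shot?</p> <p> by: FLGatorStop<br /> </p> ]]>";

// Remove the CDATA opener and closer, then trim the leftover whitespace.
$unwrapped = trim(str_replace(array('<![CDATA[', ']]>'), '', $raw));

// Cut the string off just after the first </p>, if there is one.
$end = strpos($unwrapped, '</p>');
$first = ($end === false)
    ? $unwrapped
    : substr($unwrapped, 0, $end + strlen('</p>'));

// Finally drop the remaining tags to get plain text for the tweet.
echo strip_tags($first), "\n";
```

The strict comparison with false matters because strpos() can legitimately return 0, which a loose comparison would treat as "not found".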
kratsg Posted August 19, 2009

What does the result look like after the strip_tags function? Perhaps interchanging the two functions (getText and strip_tags) will make it work.
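The order of the two calls plausibly matters because strip_tags() removes the very <p> markers the pattern looks for. A small sketch of both orders, assuming a lazy variant of the pattern. The helper firstParagraph() and the sample description are hypothetical, and the CDATA wrapper is assumed to have been removed already.

```php
<?php
// Hypothetical helper: contents of the first <p>...</p>, or '' when nothing matches.
function firstParagraph($html)
{
    return preg_match('/<p>(.*?)<\/p>/s', $html, $m) ? $m[1] : '';
}

// Hypothetical description, modelled on the feed item quoted in the thread.
$description = '<p>@<a class="at_lnk" href="/UtterliTeam">UtterliTeam</a> any way to do both?</p>'
             . ' <p>by: FLGatorStop<br /></p>';

// Original order: strip_tags() runs first and removes the <p> markers,
// so the pattern has nothing left to match.
$tagsFirst = firstParagraph(strip_tags($description));

// Swapped order: isolate the paragraph first, then strip the inner <a> tag.
$paragraphFirst = strip_tags(firstParagraph($description));

var_dump($tagsFirst);      // empty string
var_dump($paragraphFirst); // text of the first paragraph only
```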
Archived
This topic is now archived and is closed to further replies.