Maverickb7 Posted April 25, 2007

Alright, this should be pretty easy, right? Well... I'm trying it, so of course something has to go wrong. I have a string that contains various types of HTML, like images and links, and I want to remove all HTML from it. I've tried using strip_tags(), but the result isn't clean. The string holds:

<b>Ubisoft reveals it's long-speculated Clancy franchise under construction at the Shanghai studio.</b><br /> Ubisoft has revealed the long-speculated latest entry in the Tom Clancy series, EndWar, a strategy game set on the backdrop of World War III.<br /><br /><a href="http://www.computerandvideogames.com/article.php?id=162648?cid=OTC-RSS&attr=CVG-News-RSS"><img src="http://medialib.computerandvideogames.com/screens/screenshot_177640_thumb93.jpg"></a> <br /><br /><a href="http://www.computerandvideogames.com/article.php?id=162648?cid=OTC-RSS&attr=CVG-News-RSS">Click here to read the full article</a>

and when I use strip_tags(addslashes($mystring)) it gives me:

Riots earthquakes and pollution strike in new screens. We've got a shed-load of new screens from EA's DS outing in its classic sim series Sim City. alib.computerandvideogames.com/screens/screenshot_177546_thumb93.jpg"> m/screens/screenshot_177551_thumb93.jpg"> humb93.jpg"> p://www.computerandvideogames.com/article.php?id=162585?cid=OTC-RSS&attr=CVG-News-RSS"> Click here to read the full article

What am I doing wrong?

Link to comment https://forums.phpfreaks.com/topic/48572-striping-html-from-a-sting/
benjaminbeazy Posted April 25, 2007

Put the original string in code tags, please.
benjaminbeazy Posted April 25, 2007

Here's what I did... note the slashes I added in front of single quotes:

    $string = '<b>Ubisoft reveals it\'s long-speculated Clancy franchise under construction at the Shanghai studio.</b><br />Ubisoft has revealed the long-speculated latest entry in the Tom Clancy series, EndWar, a strategy game set on the backdrop of World War III.<br /><br /><a href="http://www.computerandvideogames.com/article.php?id=162648?cid=OTC-RSS&attr=CVG-News-RSS"><img src="http://medialib.computerandvideogames.com/screens/screenshot_177640_thumb93.jpg"></a> <br /><br /><a href="http://www.computerandvideogames.com/article.php?id=162648?cid=OTC-RSS&attr=CVG-News-RSS">Click here to read the full article</a>';
    $new_string = strip_tags($string);
    echo $new_string;

If the content will change dynamically, try this... you may have to fumble around with escaping the quotes in the replace:

    $string = str_replace("'", "\'", $string);
    $new_string = strip_tags($string);
    echo $new_string;
Maverickb7 Posted April 25, 2007 Author

Isn't that what addslashes() does? It adds slashes in front of all the single and double quotes? And to answer your question: yes, the content is going to be dynamic, and none of the input within the string is going to be under my control. So I have to clean it up myself after it's been sent to me.
Maverickb7 Posted April 25, 2007 Author

Here is the code I've come up with so far. Basically, what I'm trying to do is take an RSS feed, grab the items, check if they're in the database, and if not, add them. But during that process I want to strip out all HTML, including links, images, text styles, etc.

    <?php
    $connection = mysql_connect("localhost", "DBuser", "$DBpass");
    mysql_select_db("$DB", $connection);

    $counter = 0;
    $type = 0;
    $tag = "";
    $itemInfo = array();
    $channelInfo = array();

    function opening_element($xmlParser, $name, $attribute){
        global $tag, $type;
        $tag = $name;
        if($name == "CHANNEL"){
            $type = 1;
        } else if($name == "ITEM"){
            $type = 2;
        }
    }//end opening element

    function closing_element($xmlParser, $name){
        global $tag, $type, $counter;
        $tag = "";
        if($name == "ITEM"){
            $type = 0;
            $counter++;
        } else if($name == "CHANNEL"){
            $type = 0;
        }
    }//end closing_element

    function c_data($xmlParser, $data){
        global $tag, $type, $channelInfo, $itemInfo, $counter;
        $data = strip_tags($data);
        $data = addslashes($data);
        if($tag == "TITLE" || $tag == "DESCRIPTION" || $tag == "LINK"){
            if($type == 1){
                $channelInfo[strtolower($tag)] = $data;
            }//end checking channel
            else if($type == 2){
                $itemInfo[$counter][strtolower($tag)] .= $data;
            }//end checking for item
        }//end checking tag
    }//end cdata funct

    $xmlParser = xml_parser_create();
    xml_parser_set_option($xmlParser, XML_OPTION_CASE_FOLDING, TRUE);
    xml_parser_set_option($xmlParser, XML_OPTION_SKIP_WHITE, TRUE);
    xml_set_element_handler($xmlParser, "opening_element", "closing_element");
    xml_set_character_data_handler($xmlParser, "c_data");

    $ch = curl_init();
    $timeout = 5; // set to zero for no timeout
    curl_setopt ($ch, CURLOPT_URL, $_GET['rss']);
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $fp = curl_exec($ch);
    curl_close($ch);

    $fp = split(",", $fp);
    foreach($fp as $line){
        if(!xml_parse($xmlParser, $line)){
            die("Could not parse file.");
        }
    }

    foreach($itemInfo as $items){
        $query = mysql_query("SELECT * FROM articlefeed WHERE title = '".htmlentities($items['title'], ENT_QUOTES)."'") or die(mysql_error());
        $num = mysql_num_rows($query);
        if($num > 0){
            echo $items['title']." already exists!<br />";
        } else {
            if (mysql_query("INSERT INTO articlefeed VALUES('', '".$items['title']."', '".htmlentities($items['description'], ENT_QUOTES)."', '".htmlentities($items['link'],ENT_QUOTES)."')") or die(mysql_error())){
                echo $items['title']." was added!<br />";
            }
        }
    }
    ?>
Maverickb7 Posted April 25, 2007 Author

Why does HTML still get through strip_tags(), and how can I make the removal of all HTML more reliable? I really need help, guys. =(
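One thing that may be worth checking here, as an assumption rather than anything confirmed in the thread: a character data handler can receive the text of one element in several pieces, and running strip_tags() on each piece separately can leave tag fragments behind. The sketch below uses made-up chunks (`$chunks` is hypothetical sample data, not the real feed) to compare stripping per piece with stripping once after joining the pieces.

```php
<?php
// Hypothetical pieces of one description, split in the middle of a tag.
$chunks = ['Read <a href="http://exam', 'ple.com/x">more</a> here'];

// Stripping each piece on its own: neither piece contains a complete tag.
$perChunk = '';
foreach ($chunks as $chunk) {
    $perChunk .= strip_tags($chunk);
}

// Joining the pieces first, then stripping once.
$buffered = strip_tags(implode('', $chunks));

echo "per chunk: ", $perChunk, "\n";
echo "buffered:  ", $buffered, "\n";
```

If this turns out to be the cause, one option would be to collect the raw text in the handler and call strip_tags() only once the closing element is reached.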
kalivos Posted April 25, 2007

I don't know if this matters... but instead of strip_tags(addslashes($mystring)), try addslashes(strip_tags($mystring)). Because nested calls are evaluated from the inside out, your version adds the slashes first and only then strips the tags. strip_tags() may or may not recognize humb93.jpg\"> as a valid ending. Just a hunch.

-Kalivos
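A minimal sketch of the suggested order, strip first and escape second. The input string is a made-up example, not the feed data from the thread, and whether the order actually explains the leftover fragments is still just the hunch above.

```php
<?php
// Hypothetical input, not the feed data from the thread.
$input = "<b>It's a test</b>";

$clean   = strip_tags($input);   // tags removed first: It's a test
$escaped = addslashes($clean);   // quotes escaped afterwards: It\'s a test

echo $escaped, "\n";
```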
Maverickb7 Posted April 25, 2007 Author

I've tried that previously and it didn't help any. It still displays partial pieces of the HTML within the string. I was playing a little with the cURL area of the code and noticed that when I replace split() with strip_tags(), the output is completely clean. The only problem is that the script does not function if I change that.

One strange thing I wanted to ask about is the split() function. I used to use this code to open CSV files that had data separated by commas. I tried to remove it from this code, but then the foreach right after it doesn't work. =s How can I get around using split() without killing my code?
kalivos Posted April 25, 2007

split() separates by regex. Try changing it out for explode(",", $fp);
Maverickb7 Posted April 25, 2007 Author

But why would I need to use that comma? It's an RSS/XML feed. Should I still use that?
kalivos Posted April 25, 2007

Your code uses a comma, unless I'm looking at the wrong line and you have more than one split():

    $fp = split(",", $fp);
steelmanronald06 Posted April 25, 2007

To get rid of HTML you have to use the htmlentities() function.
kalivos Posted April 25, 2007

That doesn't strip HTML though; it only changes each character to its entity counterpart so it won't be parsed.

    $str = "A 'quote' is <u>underlined</u>";
    echo htmlentities($str);

Outputs: A 'quote' is &lt;u&gt;underlined&lt;/u&gt;
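To make the difference concrete, here is a small side-by-side sketch. The ENT_COMPAT flag and the 'UTF-8' charset are passed explicitly as an assumption so that single quotes are left alone regardless of PHP version; the variable names are only illustrative.

```php
<?php
$str = "A 'quote' is <u>underlined</u>";

// htmlentities() keeps the markup but encodes it so a browser shows it as text.
$escaped = htmlentities($str, ENT_COMPAT, 'UTF-8');

// strip_tags() removes the markup and keeps only the text between the tags.
$stripped = strip_tags($str);

echo $escaped, "\n";   // A 'quote' is &lt;u&gt;underlined&lt;/u&gt;
echo $stripped, "\n";  // A 'quote' is underlined
```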
Maverickb7 Posted April 25, 2007 Author

Yeah, I know. I was originally using the cURL code to read a file that has its data divided by commas. I haven't found a way to remove the split() without killing the code. It's reading an RSS feed, so I don't see why the split() would be needed. I'm new to PHP and still learning, so perhaps I'm missing something.
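For what it's worth, the foreach most likely breaks without split() because curl_exec() returns a single string, not an array. A possible way around it, sketched below under assumptions, is to drop both split() and the loop and hand the whole body to xml_parse() in one call. The inline `$feed` string is made-up sample data standing in for the cURL result, and the handlers only collect titles to keep the example short.

```php
<?php
// Inline stand-in for the body curl_exec() would return (made-up sample data).
$feed = '<rss><channel><title>Demo feed, with a comma</title>'
      . '<item><title>First item</title></item></channel></rss>';

$titles  = [];     // collected <title> texts
$inTitle = false;  // true while the parser is inside a <title> element

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, true);

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$titles, &$inTitle) {
        if ($name === 'TITLE') {
            $inTitle  = true;
            $titles[] = '';   // start a new, empty title
        }
    },
    function ($p, $name) use (&$inTitle) {
        if ($name === 'TITLE') {
            $inTitle = false;
        }
    }
);

xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$titles, &$inTitle) {
        if ($inTitle) {
            $titles[count($titles) - 1] .= $data;  // text can arrive in pieces
        }
    }
);

// The third argument (true) marks this as the final, and here only, chunk.
if (!xml_parse($parser, $feed, true)) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}

print_r($titles);
```

Parsing the body in one call also keeps any commas in the text, which splitting on "," would throw away.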
Maverickb7 Posted April 25, 2007 Author

Like I said before, I replaced split() with strip_tags() on $fp and echoed the results, and it was clean text: all the HTML was removed and everything looked great. But doing that killed the code. I think it removed the XML blocks too, not sure. Either way, it doesn't want to work like that. =( I've been working on this for hours and hours and can't seem to figure it out. ANY help is appreciated.
Maverickb7 Posted April 25, 2007 Author

Wow... I finally figured out what it was. It turns out some of the feeds had already encoded the HTML as entities, the way htmlentities() would, so I used html_entity_decode() to decode all of those first, then removed all the HTML tags. Thanks for all your help!
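The fix described above can be sketched roughly like this. The helper name `clean_feed_text()`, the sample string, and the whitespace cleanup at the end are illustrative assumptions, not the poster's actual code.

```php
<?php
// Hypothetical helper: decode entity-encoded markup first, then strip the tags.
function clean_feed_text(string $raw): string
{
    // Turn &lt;b&gt; back into <b> so strip_tags() can recognise it as a tag.
    $decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

    // Remove the tags, keeping only the text between them.
    $plain = strip_tags($decoded);

    // Collapse leftover runs of whitespace (an extra step, not from the thread).
    return trim(preg_replace('/\s+/', ' ', $plain));
}

// Made-up sample of a description whose HTML arrives entity-encoded.
$encoded = '&lt;b&gt;Ubisoft reveals EndWar.&lt;/b&gt;&lt;br /&gt; '
         . '&lt;a href=&quot;http://example.com/a&quot;&gt;Click here to read the full article&lt;/a&gt;';

echo clean_feed_text($encoded), "\n";
// Ubisoft reveals EndWar. Click here to read the full article
```

Calling strip_tags() on `$encoded` directly would leave it unchanged, since the encoded string contains no literal `<` or `>` for it to match.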
Archived
This topic is now archived and is closed to further replies.