gevans Posted December 22, 2008

Hey guys, I'm currently using this function:

    function input_rte($input, $title = "", $edit = FALSE){
        // Undo magic quotes (if enabled) before escaping for the database
        if(get_magic_quotes_gpc())
            $input = stripslashes($input);
        $input = mysql_real_escape_string($input);
        $input = str_replace($this->old_string,'',$input);
        if(strpos($input,'<p>') !== FALSE && strpos($input,'</p>') !== FALSE)
            $input = str_replace($this->ie_fix1,$this->ie_fix2,$input);
        // Strip any trailing line breaks, whichever way they are written
        while(substr($input,-4) == '<br>' || substr($input,-5) == '<br/>' || substr($input,-6) == '<br />'){
            if(substr($input,-4) == '<br>')
                $input = substr($input,0,-4);
            elseif(substr($input,-5) == '<br/>')
                $input = substr($input,0,-5);
            elseif(substr($input,-6) == '<br />')
                $input = substr($input,0,-6);
        }
        $input = str_replace('<br><br><br><br>','<br><br>',$input);
        $input = str_replace('<br>','<br />',$input);
        // Give images an alt attribute if they do not have one yet
        if(strpos($input, '<img') !== FALSE && strpos($input, '<img alt') === FALSE)
            $input = str_replace("<img","<img alt=\"$title - content image\"",$input);
        if(strpos($input, '.jpg"><img') !== FALSE)
            $input = str_replace(".jpg\"><img",".jpg\"><img class=\"second\"",$input);
        $input = trim($input);
        $input = urlencode($input);
        $input = ($input == "") ? NULL : $input;
        return $input;
    }

to sort some POST data from a rich text editor. The next addition to the function needs to search for a link to a PDF. The HTML would look like this:

    <a href="http://www.mydomain.com/dir/thefile.pdf">Read More</a>

What I'm trying to do (and can't manage at the moment) is find that bit of code and check whether the link has a .pdf extension. If it does, the HTML needs to be replaced with the following:

    <div class="pdf">
    <a href="the link in here" target="_blank" title="link text in here"><img class="left" src="images/pdf_download.png" alt="Download PDF" width="64" height="74" /></a>
    <span class="title">title in here</span>
    <span class="info">download pdf</span>
    <a href="the link in here" target="_blank" title="link text in here" class="link">DOWNLOAD</a>
    </div><div class="pdf-bot2"></div>

I'm not entirely sure of the best way to do this. I was going to use a regular expression and preg_replace. Anyone got any ideas what else I could do?

Link to comment: https://forums.phpfreaks.com/topic/138025-string-search-and-replace/
MadTechie Posted December 22, 2008

Do you mean something like this?

    <?php
    $html = 'this is some stuff <a href="http://www.mydomain.com/dir/thefile.pdf">Read More</a> for update dating the <a href="http://youdomain.com/another.pdf">Other Stuff</a>html';
    $html = preg_replace('%<a href=(["\'])(.*?\.pdf)\1>(.*)</a>%sim',
        "<div class=\"pdf\">\r\n<a href=\"the link in here\" target=\"_blank\" title=\"link text in here\"><img class=\"left\" src=\"images/pdf_download.png\" alt=\"Download PDF\" width=\"64\" height=\"74\" /></a>\r\n<span class=\"title\">title in here</span>\r\n<span class=\"info\">download pdf</span>\r\n<a href=\"\2\" target=\"_blank\" title=\"\3\" class=\"link\">DOWNLOAD</a>\r\n</div><div class=\"pdf-bot2\"></div>",
        $html
    );
    echo $html;
    ?>
gevans Posted December 23, 2008

That looks perfect. I haven't had a chance to test it yet, but I'll get you an update in an hour or so. Cheers
gevans Posted December 23, 2008

It's nearly there; it outputs the following:

    <div class="pdf">
    <a href="the link in here" target="_blank" title="link text in here"><img class="left" src="images/pdf_download.png" alt="Download PDF" width="64" height="74" /></a>
    <span class="title">title in here</span>
    <span class="info">download pdf</span>
    <a href="" target="_blank" title="" class="link">DOWNLOAD</a>
    </div><div class="pdf-bot2"></div>

The first a href needs to be a real link (not just the placeholder text), and the link and title come out as strange characters in place of the text captured by the regular expression.
MadTechie Posted December 23, 2008

Here's a quick update:

    <?php
    $html = 'this is some stuff <a href="http://www.mydomain.com/dir/thefile.pdf">Read More</a> for update dating the <a href="http://youdomain.com/another.pdf">Other Stuff</a>html';
    $html = preg_replace('%<a href=(["\'])(.*?\.pdf)\1>(.*)</a>%sim',
        "<div class=\"pdf\">\r\n<a href=\"\2\" target=\"_blank\" title=\"\3\"><img class=\"left\" src=\"images/pdf_download.png\" alt=\"Download PDF\" width=\"64\" height=\"74\" /></a>\r\n<span class=\"title\">\3</span>\r\n<span class=\"info\">download pdf</span>\r\n<a href=\"\2\" target=\"_blank\" title=\"\3\" class=\"link\">DOWNLOAD</a>\r\n</div><div class=\"pdf-bot2\"></div>",
        $html
    );
    echo $html;
    ?>

What input are you using, and what do you expect to see?
gevans Posted December 23, 2008

I've got it sorted: the numbers that refer to the captured groups need a double backslash in the replacement string, so \\2 rather than \2. Thanks for all your help.
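A minimal sketch of why the double backslash helps, assuming the replacement stays in a double-quoted string as in the posts above. There PHP reads "\2" as an octal escape (a single byte with value 2), so preg_replace never sees a backreference. The shortened pattern and replacement markup below are illustrative only, not the full markup from the thread.

```php
<?php
// In a double-quoted PHP string, "\2" is an octal escape: one byte with value 2.
$octal = "\2";
// Doubling the backslash (or using single quotes) keeps a literal backslash and a 2.
$escaped = "\\2";
$single  = '\2';

var_dump(strlen($octal), ord($octal));
var_dump($escaped === $single);

$html    = '<a href="http://www.mydomain.com/dir/thefile.pdf">Read More</a>';
$pattern = '%<a href=(["\'])(.*?\.pdf)\1>(.*?)</a>%si';

// Replacement built with "\2" and "\3": control bytes end up in the output.
$broken = preg_replace($pattern, "<a href=\"\2\" title=\"\3\">DOWNLOAD</a>", $html);
// Replacement built with "\\2" and "\\3": preg_replace receives real backreferences.
$fixed  = preg_replace($pattern, "<a href=\"\\2\" title=\"\\3\">DOWNLOAD</a>", $html);

echo $fixed, "\n";
```

Writing the replacement in single quotes, or using the $2 form that preg_replace also accepts, avoids the problem in the same way.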
effigy Posted December 23, 2008

    <pre>
    <?php
    $html = 'this is some stuff <a href="http://www.mydomain.com/dir/thefile.pdf">Read More</a> for updating the <a href="http://youdomain.com/another.pdf">Other Stuff</a>html';
    $replace = <<<REPLACE
    <div class="pdf">
    <a href="$2" target="_blank" title="$3">
    <img class="left" src="images/pdf_download.png" alt="Download PDF" width="64" height="74" />
    </a>
    <span class="title">$3</span>
    <span class="info">download pdf</span>
    <a href="$2" target="_blank" title="$3" class="link">DOWNLOAD</a>
    </div>
    <div class="pdf-bot2"></div>
    REPLACE;
    $html = preg_replace(
        '%<a href=([\'"])?((?(1).+?|[^\s>]+)\.pdf)(?(1)\1)>(.*?)</a>%si',
        $replace,
        $html
    );
    echo htmlspecialchars($html);
    ?>
    </pre>
gevans Posted December 23, 2008

I tried it with more complex input and it didn't work as expected.

Input HTML:

    <img alt="Pupil Launch - content image" src="http://thinking.uk.com/projects/ebp_cms/images/uploads/small/ks3_04.jpg">
    <img src="http://thinking.uk.com/projects/ebp_cms/images/uploads/small/cbcv_03.jpg">
    <div class="clearfix"></div>
    <br />
    <strong><span style="text-decoration: underline;">What the Challenge is all about</span></strong>
    <br /><br />
    The focus is on the website but the challenge involves a wide range of skills - research, presentation, innovative design, graphics, and project management skills are equally important. The Challenge can be used effectively to bring a work related learning element to many aspects of the curriculum including English (presentations); ICT; and Business Studies. It will also help to develop students' 'enterprise skills'.
    <br /><br />
    The <a href="http://www.portsmouthebp.co.uk">Education Business Partnership</a>
    <br /><br />
    <a href="http://thinking.uk.com/projects/ebp_cms/uploads/pdfs/autumn_07_1229439562_pdf.pdf">Website Challenge Entry Form</a>

Output HTML:

    <img alt="Pupil Launch - content image" src="http://thinking.uk.com/projects/ebp_cms/images/uploads/small/ks3_04.jpg"><img src="http://thinking.uk.com/projects/ebp_cms/images/uploads/small/cbcv_03.jpg"><div class="clearfix"></div><br /><strong><span style="text-decoration: underline;">What the Challenge is all about</span></strong><br /><br />The focus is on the website but the challenge involves a wide range of skills - research, presentation, innovative design, graphics, and project management skills are equally important. The Challenge can be used effectively to bring a work related learning element to many aspects of the curriculum including English (presentations); ICT; and Business Studies. It will also help to develop students' 'enterprise skills'.<br /><br />The <div class="pdf">
    <a href="http://www.portsmouthebp.co.uk">Education Business Partnership</a><br /><br /><a href="http://thinking.uk.com/projects/ebp_cms/uploads/pdfs/autumn_07_1229439562_pdf.pdf" target="_blank" title="Website Challenge Entry Form">
    <img class="left" src="images/pdf_download.png" alt="Download PDF" width="64" height="74" />
    </a>
    <span class="title">Website Challenge Entry Form</span>
    <span class="info">download pdf</span>
    <a href="http://www.portsmouthebp.co.uk">Education Business Partnership</a><br /><br /><a href="http://thinking.uk.com/projects/ebp_cms/uploads/pdfs/autumn_07_1229439562_pdf.pdf" target="_blank" title="Website Challenge Entry Form" class="link">DOWNLOAD</a>
    </div>
    <div class="pdf-bot2"></div>

Expected output:

    <img alt="Pupil Launch - content image" src="http://thinking.uk.com/projects/ebp_cms/images/uploads/small/ks3_04.jpg"><img src="http://thinking.uk.com/projects/ebp_cms/images/uploads/small/cbcv_03.jpg"><div class="clearfix"></div><br /><strong><span style="text-decoration: underline;">What the Challenge is all about</span></strong><br /><br />The focus is on the website but the challenge involves a wide range of skills - research, presentation, innovative design, graphics, and project management skills are equally important. The Challenge can be used effectively to bring a work related learning element to many aspects of the curriculum including English (presentations); ICT; and Business Studies. It will also help to develop students' 'enterprise skills'.<br /><br />
    The <a href="http://www.portsmouthebp.co.uk">Education Business Partnership</a><br /><br />
    <div class="pdf">
    <a href="http://thinking.uk.com/projects/ebp_cms/uploads/pdfs/autumn_07_1229439562_pdf.pdf" target="_blank" title="Website Challenge Entry Form">
    <img class="left" src="images/pdf_download.png" alt="Download PDF" width="64" height="74" />
    </a>
    <span class="title">Website Challenge Entry Form</span>
    <span class="info">download pdf</span>
    <a href="http://www.portsmouthebp.co.uk">Education Business Partnership</a><br /><br /><a href="http://thinking.uk.com/projects/ebp_cms/uploads/pdfs/autumn_07_1229439562_pdf.pdf" target="_blank" title="Website Challenge Entry Form" class="link">DOWNLOAD</a>
    </div>
    <div class="pdf-bot2"></div>
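The over-match in the output above can be reproduced on a much smaller input. This is only a sketch with made-up URLs (a.example is a placeholder), using the pattern from effigy's earlier post. It suggests that, once an opening quote has been seen, the lazy .+? is free to run past the first link's closing quote, through the tags in between, until it reaches a ".pdf" that is followed by a quote.

```php
<?php
// Two links: only the second points at a PDF. The URLs are placeholders.
$html = '<a href="http://a.example/">Home</a> <a href="http://a.example/f.pdf">Form</a>';

// After an opening quote, ".+?" may match any character, including quotes,
// tags and (with the s modifier) newlines.
$pattern = '%<a href=([\'"])?((?(1).+?|[^\s>]+)\.pdf)(?(1)\1)>(.*?)</a>%si';

preg_match($pattern, $html, $m);

// Group 2 is meant to hold just the URL, but here it starts inside the first link.
echo $m[2], "\n";
// Group 3 still holds the text of the PDF link.
echo $m[3], "\n";
```

Restricting what the URL part may contain, so that it cannot cross the closing quote, is one way to keep the match inside a single tag.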
effigy Posted December 23, 2008

    %<a href=([\'"])?((?:(?!\1).)+\.pdf)(?(1)\1)>(.*?)</a>%si
gevans Posted December 23, 2008

That worked perfectly. Any chance of a brief breakdown of what the regular expression is doing? I know what it does overall, but I'd like an explanation of which part does what.
effigy Posted December 23, 2008

Actually, it needs another tweak in case the attributes are not quoted:

    %<a href=([\'"])?((?:(?!\1)[^>\s])+\.pdf)(?(1)\1)>(.*?)</a>%si

The first non-literal part of the regex looks for a single or double quote, which may not exist at all. Afterwards, it captures one character (that is not whitespace or ">") at a time, but only if it does not encounter the (optional) quote that it began with. In other words, if a single quote was found, match all of its contents up to the ending single quote; the same goes if a double quote was matched. If nothing was found, it stops at the end of the tag. It then backtracks to make sure the URL ends with ".pdf", matches the ending quote if one was found, the end of the tag, the rest of the content up to "</a>", and then "</a>" itself.

Keep in mind that this regex only works if no other attributes are present and the formatting is exact.

Here's a technical breakdown:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    <a href=                 '<a href='
    ----------------------------------------------------------------------
    (                        group and capture to \1 (optional
                             (matching the most amount possible)):
    ----------------------------------------------------------------------
      ['"]                     any character of: ''', '"'
    ----------------------------------------------------------------------
    )?                       end of \1 (NOTE: because you're using a
                             quantifier on this capture, only the LAST
                             repetition of the captured pattern will be
                             stored in \1)
    ----------------------------------------------------------------------
    (                        group and capture to \2:
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (1 or more
                               times (matching the most amount
                               possible)):
    ----------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
    ----------------------------------------------------------------------
          \1                       what was matched by capture \1
    ----------------------------------------------------------------------
        )                        end of look-ahead
    ----------------------------------------------------------------------
        [^>\s]                   any character except: '>', whitespace
                                 (\n, \r, \t, \f, and " ")
    ----------------------------------------------------------------------
      )+                       end of grouping
    ----------------------------------------------------------------------
      \.                       '.'
    ----------------------------------------------------------------------
      pdf                      'pdf'
    ----------------------------------------------------------------------
    )                        end of \2
    ----------------------------------------------------------------------
    (?(1)                    if back-reference \1 matched, then:
    ----------------------------------------------------------------------
      \1                       what was matched by capture \1
    ----------------------------------------------------------------------
     |                        else:
    ----------------------------------------------------------------------
                               succeed
    ----------------------------------------------------------------------
    )                        end of conditional on \1
    ----------------------------------------------------------------------
    >                        '>'
    ----------------------------------------------------------------------
    (                        group and capture to \3:
    ----------------------------------------------------------------------
      .*?                      any character (0 or more times
                               (matching the least amount possible))
    ----------------------------------------------------------------------
    )                        end of \3
    ----------------------------------------------------------------------
    </a>                     '</a>'
    ----------------------------------------------------------------------
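To see the breakdown in action, here is a rough sketch that applies the final pattern to a shortened version of the input from earlier in the thread. The helper name wrap_pdf_links and the trimmed replacement markup are assumptions made for brevity, not code from any of the posts; only the pattern itself comes from the thread.

```php
<?php
// Hypothetical helper: wraps links to .pdf files, leaves every other link alone.
function wrap_pdf_links(string $html): string
{
    // Final pattern from the thread: optional quote, URL ending in .pdf, link text.
    $pattern = '%<a href=([\'"])?((?:(?!\1)[^>\s])+\.pdf)(?(1)\1)>(.*?)</a>%si';
    // $2 is the captured URL and $3 the captured link text (markup shortened here).
    $replace = '<div class="pdf"><a href="$2" target="_blank" title="$3" class="link">DOWNLOAD</a></div>';
    // preg_replace returns null on error; fall back to the untouched input.
    return preg_replace($pattern, $replace, $html) ?? $html;
}

$input = 'The <a href="http://www.portsmouthebp.co.uk">Education Business Partnership</a><br /><br />'
       . '<a href="http://thinking.uk.com/projects/ebp_cms/uploads/pdfs/autumn_07_1229439562_pdf.pdf">Website Challenge Entry Form</a>';

$output = wrap_pdf_links($input);
echo $output, "\n";

// Unquoted and single-quoted attributes go through the same pattern.
echo wrap_pdf_links('<a href=docs/a.pdf>A</a>'), "\n";
echo wrap_pdf_links("<a href='b.pdf'>B</a>"), "\n";
```

As noted in the post, this still assumes href is the only attribute on the tag. The captured URL and link text are also inserted as-is, so text containing quotes would need escaping before it is safe in a title attribute.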