dmarquard Posted April 11, 2008

I'm trying to write a script that temporarily renames all current links in the page (don't ask). I'm using preg_quote within preg_replace to find and replace all instances because, for whatever reason, str_ireplace (which I'd honestly rather use) would replace only half of the instances in the code before giving up. I think my code looks fine, but the page won't even load, so I guess I screwed up somewhere. Let me know if you spot a flaw. Alternatively, you can tell me why you think str_ireplace replaced only half of the instances in the code.

$url_source = preg_replace('/' . preg_quote('href="#') . '/', 'preg_replace_url_anchor', $url_source); // Encode anchors.
$url_source = preg_replace('/' . preg_quote('href=""') . '/', 'preg_replace_url_null', $url_source); // Encode null links.
$url_source = preg_replace('/' . preg_quote('href="http://') . '/', 'preg_replace_url_http', $url_source); // Encode existing HTTP links.
$url_source = preg_replace('/' . preg_quote('href="https://') . '/', 'preg_replace_url_https', $url_source); // Encode existing HTTPS links.
$url_source = preg_replace('/' . preg_quote('href="ftp://') . '/', 'preg_replace_url_ftp', $url_source); // Encode existing FTP links.

Thanks!

Link to comment https://forums.phpfreaks.com/topic/100650-solved-using-preg_quote-in-preg_replace-crashes-script/
effigy Posted April 11, 2008

It looks OK to me. There's no need to use preg_quote when entering the pattern into your code--this is mainly used to sanitize user input and/or other variables that may change. What was left over that str_ireplace missed?
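A note on the preg_quote calls in the first post: as far as the PHP manual describes it, preg_quote() does not escape the pattern delimiter unless it is passed as the second argument. With '/' as the delimiter, the '//' in 'href="http://' would then end the pattern early, and preg_replace() would return NULL with an "Unknown modifier" warning, which may be why the page came up empty. The snippet below is a minimal sketch of that behaviour; the sample HTML and variable names are made up for illustration.

```php
<?php
// Sample input, invented for this sketch.
$html = '<a href="http://example.com/">x</a>';

// Without a delimiter argument, preg_quote() leaves "/" alone, so the pattern
// ends at the first "/" of "http://" and the rest is read as modifiers.
$broken = '/' . preg_quote('href="http://') . '/';
$broken_result = @preg_replace($broken, 'preg_replace_url_http', $html); // @ hides the warning

// Passing the delimiter as the second argument escapes it as well.
$fixed = '/' . preg_quote('href="http://', '/') . '/';
$fixed_result = preg_replace($fixed, 'preg_replace_url_http', $html);

var_dump($broken_result); // NULL: every later replacement would then run on NULL
var_dump($fixed_result);
```

If this is what happened, each later preg_replace() call in the chain would have received NULL as its subject, so nothing after the HTTP line could work.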
dmarquard Posted April 15, 2008

I'm using it because it's cleaner than escaping EVERY special character by hand (VERY unclean). str_ireplace would just crap out halfway into the replacements...it would just stop and links would be left unchanged. I have yet to find a solution to this...
effigy Posted April 15, 2008

If you change the delimiter, none of the characters need to be escaped. Please provide some code and data that shows where str_ireplace succeeded and failed.
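To illustrate the delimiter suggestion: PCRE patterns in PHP can use most non-alphanumeric characters as the delimiter, so picking one that does not occur in the pattern (here '~') means the slashes in the URLs need no escaping. This is only a sketch; the sample HTML is invented, and the 'i' modifier is an assumption added to mimic the case-insensitivity of str_ireplace.

```php
<?php
// Sample input, invented for this sketch.
$html = '<a href="http://example.com/">a</a> <a HREF="#top">b</a>';

// "~" delimits the pattern, so "/" and "#" are plain characters here
// and preg_quote() is not needed for these fixed strings.
$html = preg_replace('~href="http://~i', 'preg_replace_url_http', $html);
$html = preg_replace('~href="#~i', 'preg_replace_url_anchor', $html);

echo $html, "\n";
```

Any delimiter works as long as it does not appear unescaped inside the pattern itself.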
dmarquard Posted April 16, 2008

I'm not sure how to change the delimiter, and I can't seem to find any sort of documentation referencing it. One user on php.net commented that str_replace and str_ireplace seem to stop replacing at approximately 16K (with a string 35K or larger). I'd prefer to stick with str_ireplace, but this is just getting ridiculous. I can't find my original str_ireplace code, but this is what I threw together just now. It's not replacing at all...

// Grab the submission page's raw, unformatted source code.
$url_source = file_get_contents($url_submission);

// Parse the URL into a base URL (WITHOUT the trailing slash), just in case the webpage uses relative URL references.
$url_submission_scheme = parse_url($url_submission, PHP_URL_SCHEME);
$url_submission_host = parse_url($url_submission, PHP_URL_HOST);
$url_submission_base = $url_submission_scheme . '://' . $url_submission_host;

// Only the links on pages which DON'T define a base URL should be modified.
if (!(preg_match('/' . preg_quote('<base href="') . '/', $url_source))) {
    // Modifier to <a href="/path/to/dir.php">.
    $url_source = str_ireplace(preg_quote('href="/'), 'href="' . $url_submission_base . '/', $url_source);

    // OK, now comes the weird part...we actually have to exclude exceptions by temporarily masking them during the conversion.
    $url_source = str_ireplace(preg_quote('href="#'), 'str_ireplace_url_anchor', $url_source); // Encode anchors.
    $url_source = str_ireplace(preg_quote('href=""'), 'str_ireplace_url_null', $url_source); // Encode null links.
    $url_source = str_ireplace(preg_quote('href="http://'), 'str_ireplace_url_http', $url_source); // Encode existing HTTP links.
    $url_source = str_ireplace(preg_quote('href="https://'), 'str_ireplace_url_https', $url_source); // Encode existing HTTPS links.
    $url_source = str_ireplace(preg_quote('href="ftp://'), 'str_ireplace_url_ftp', $url_source); // Encode existing FTP links.

    // Mask all known program protocol links.
    $url_source = str_ireplace(preg_quote('href="javascript:'), 'str_ireplace_url_js', $url_source); // Encode javascript links.
    $url_source = str_ireplace(preg_quote('href="mailto:'), 'str_ireplace_url_mailto', $url_source); // Encode email links.
    $url_source = str_ireplace(preg_quote('href="aim:'), 'str_ireplace_url_aim', $url_source); // Encode AIM links.
    $url_source = str_ireplace(preg_quote('href="callto:'), 'str_ireplace_url_callto', $url_source); // Encode Skype links.

    // Now that we've temporarily masked all link exceptions, we can rename ALL remaining links.
    $url_source = str_ireplace(preg_quote('href="'), 'href="' . $url_submission_base . '/', $url_source);

    // Time to unmask our temporarily renamed links.
    $url_source = str_ireplace(preg_quote('str_ireplace_url_anchor'), 'href="#', $url_source); // Decode anchors.
    $url_source = str_ireplace(preg_quote('str_ireplace_url_null'), 'href=""', $url_source); // Decode null links.
    $url_source = str_ireplace(preg_quote('str_ireplace_url_http'), 'href="http://', $url_source); // Decode existing HTTP links.
    $url_source = str_ireplace(preg_quote('str_ireplace_url_https'), 'href="https://', $url_source); // Decode existing HTTPS links.
    $url_source = str_ireplace(preg_quote('str_ireplace_url_ftp'), 'href="ftp://', $url_source); // Decode existing FTP links.

    // ...and all program protocol addresses.
    $url_source = str_ireplace(preg_quote('str_ireplace_url_js'), 'href="javascript:', $url_source); // Decode javascript links.
    $url_source = str_ireplace(preg_quote('str_ireplace_url_mailto'), 'href="mailto:', $url_source); // Decode email links.
    $url_source = str_ireplace(preg_quote('str_ireplace_url_aim'), 'href="aim:', $url_source); // Decode AIM links.
    $url_source = str_ireplace(preg_quote('str_ireplace_url_callto'), 'href="callto:', $url_source); // Decode Skype links.*/
}

// Since base URLs have no effect on other paths, make all calls absolute.
// Correct all image paths.
$url_source = str_ireplace(preg_quote('src="/'), 'src="' . $url_submission_base . '/', $url_source);

// Encode all existing absolute image references.
$url_source = str_ireplace(preg_quote('src="http://'), 'str_ireplace_img_http', $url_source); // Encode HTTP image references.
$url_source = str_ireplace(preg_quote('src="https://'), 'str_ireplace_img_https', $url_source); // Encode HTTPS image references.

// Now, make all relative image references NOT preceded by a '/' absolute.
$url_source = str_ireplace(preg_quote('src="'), 'src="' . $url_submission_base . '/', $url_source);

// Decode our masked image references.
$url_source = str_ireplace(preg_quote('str_ireplace_img_http'), 'src="http://', $url_source); // Decode HTTP image references.
$url_source = str_ireplace(preg_quote('str_ireplace_img_https'), 'src="https://', $url_source); // Decode HTTPS image references.

// Horrible stylesheet include function.
$url_source = str_ireplace(preg_quote('@import "/'), '@import "' . $url_submission_base . '/', $url_source);

Maybe there's a more efficient way of doing this, but I don't know what it is.
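On the question of a more efficient way: one possible alternative to the mask/unmask passes is a single preg_replace() with a negative lookahead that skips the exceptions directly. The sketch below is not from the thread; the function name absolutize_quoted_hrefs and the example base URL are hypothetical, it only handles double-quoted href attributes, and like the code above it resolves relative links against the host rather than the page's directory.

```php
<?php
// Hypothetical helper, not part of the original script.
function absolutize_quoted_hrefs(string $html, string $base): string
{
    // Match href=" only when it is NOT followed by: a closing quote (null link),
    // "#" (anchor), "//" (protocol-relative URL), or a scheme such as http: or mailto:.
    // The optional "/" then swallows the leading slash of root-relative links.
    $pattern = '~href="(?!"|#|//|[a-z][a-z0-9+.-]*:)/?~i';

    // Assumes $base contains no "$" or "\" that preg_replace() would treat as a reference.
    return preg_replace($pattern, 'href="' . $base . '/', $html);
}

$base = 'http://example.com'; // placeholder base URL
echo absolutize_quoted_hrefs('<a href="page.html">rel</a>', $base), "\n";
echo absolutize_quoted_hrefs('<a href="/root.html">root</a>', $base), "\n";
echo absolutize_quoted_hrefs('<a href="mailto:me@example.com">mail</a>', $base), "\n";
```

Because the exceptions are excluded by the lookahead, there are no placeholder strings left in the page if one of the passes fails partway through.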
effigy Posted April 16, 2008

Delimiters. preg_quote should only be used in conjunction with preg_* functions. It's not needed for str_ireplace and will botch things up. I still haven't seen the data this is failing on...
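A short sketch of why preg_quote botches things for str_ireplace: preg_quote() puts a backslash in front of regex metacharacters such as '=', and str_ireplace() then searches for that backslash literally, so the needle no longer occurs in ordinary HTML. The sample markup and the example.com base are placeholders invented for the illustration.

```php
<?php
// Sample input, invented for this sketch.
$html = '<a href="/path/to/dir.php">link</a>';

// preg_quote() escapes "=" for use in a regex: the needle becomes href\="/
$needle = preg_quote('href="/');

// str_ireplace() is a plain string search, so the escaped needle never matches.
$with_quote = str_ireplace($needle, 'href="http://example.com/', $html);

// The unescaped needle matches as intended.
$without_quote = str_ireplace('href="/', 'href="http://example.com/', $html);

var_dump($with_quote === $html); // true: nothing was replaced
echo $without_quote, "\n";
```

This would explain a version of the script that replaces nothing at all, independent of any size or memory limit.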
dmarquard Posted April 17, 2008

I reverted to str_ireplace. The code below is my script to convert relative URLs to absolute URLs:

// Grab the submission page's raw, unformatted source code.
$url_source = file_get_contents($url_submission);

// Parse the URL into a base URL (WITHOUT the trailing slash), just in case the webpage uses relative URL references.
$url_submission_scheme = parse_url($url_submission, PHP_URL_SCHEME);
$url_submission_host = parse_url($url_submission, PHP_URL_HOST);
$url_submission_base = $url_submission_scheme . '://' . $url_submission_host;

// Only the links on pages which DON'T define a base URL should be modified.
if (!(preg_match('/' . preg_quote('<base href="') . '/', $url_source))) {
    // Modifier to <a href="/path/to/dir.php">.
    $url_source = str_ireplace('href="/', 'href="' . $url_submission_base . '/', $url_source);

    // OK, now comes the weird part...we actually have to exclude exceptions by temporarily masking them during the conversion.
    $url_source = str_ireplace('href="#', 'str_ireplace_url_anchor', $url_source); // Encode anchors.
    $url_source = str_ireplace('href=""', 'str_ireplace_url_null', $url_source); // Encode null links.
    $url_source = str_ireplace('href="http://', 'str_ireplace_url_http', $url_source); // Encode existing HTTP links.
    $url_source = str_ireplace('href="https://', 'str_ireplace_url_https', $url_source); // Encode existing HTTPS links.
    $url_source = str_ireplace('href="ftp://', 'str_ireplace_url_ftp', $url_source); // Encode existing FTP links.

    // Mask all known program protocol links.
    $url_source = str_ireplace('href="javascript:', 'str_ireplace_url_js', $url_source); // Encode javascript links.
    $url_source = str_ireplace('href="mailto:', 'str_ireplace_url_mailto', $url_source); // Encode email links.
    $url_source = str_ireplace('href="aim:', 'str_ireplace_url_aim', $url_source); // Encode AIM links.
    $url_source = str_ireplace('href="callto:', 'str_ireplace_url_callto', $url_source); // Encode Skype links.

    // Now that we've temporarily masked all link exceptions, we can rename ALL remaining links.
    $url_source = str_ireplace('href="', 'href="' . $url_submission_base . '/', $url_source);

    // Time to unmask our temporarily renamed links.
    $url_source = str_ireplace('str_ireplace_url_anchor', 'href="#', $url_source); // Decode anchors.
    $url_source = str_ireplace('str_ireplace_url_null', 'href=""', $url_source); // Decode null links.
    $url_source = str_ireplace('str_ireplace_url_http', 'href="http://', $url_source); // Decode existing HTTP links.
    $url_source = str_ireplace('str_ireplace_url_https', 'href="https://', $url_source); // Decode existing HTTPS links.
    $url_source = str_ireplace('str_ireplace_url_ftp', 'href="ftp://', $url_source); // Decode existing FTP links.

    // ...and all program protocol addresses.
    $url_source = str_ireplace('str_ireplace_url_js', 'href="javascript:', $url_source); // Decode javascript links.
    $url_source = str_ireplace('str_ireplace_url_mailto', 'href="mailto:', $url_source); // Decode email links.
    $url_source = str_ireplace('str_ireplace_url_aim', 'href="aim:', $url_source); // Decode AIM links.
    $url_source = str_ireplace('str_ireplace_url_callto', 'href="callto:', $url_source); // Decode Skype links.*/
}

// Since base URLs have no effect on other paths, make all calls absolute.
// Correct all image paths.
$url_source = str_ireplace('src="/', 'src="' . $url_submission_base . '/', $url_source);

// Encode all existing absolute image references.
$url_source = str_ireplace('src="http://', 'str_ireplace_img_http', $url_source); // Encode HTTP image references.
$url_source = str_ireplace('src="https://', 'str_ireplace_img_https', $url_source); // Encode HTTPS image references.

// Now, make all relative image references NOT preceded by a '/' absolute.
$url_source = str_ireplace('src="', 'src="' . $url_submission_base . '/', $url_source);

// Decode our masked image references.
$url_source = str_ireplace('str_ireplace_img_http', 'src="http://', $url_source); // Decode HTTP image references.
$url_source = str_ireplace('str_ireplace_img_https', 'src="https://', $url_source); // Decode HTTPS image references.

// Horrible stylesheet include function.
$url_source = str_ireplace('@import "/', '@import "' . $url_submission_base . '/', $url_source);

// Format the submission's source code to be inserted.
$url_source_formatted = mysql_real_escape_string($url_source);

// Insert all data into a new record.
$url_sql = 'INSERT INTO `webpages` (`id`, `url`, `source`, `creation`) VALUES (NULL, \'' . $url_submission . '\', \'' . $url_source_formatted . '\', NOW());';
mysql_query($url_sql) or die('We were unable to process your link. Please try <a href="' . $abs_url . '/mirror/submit.php">resubmitting</a>. (Error: ' . mysql_error() . ')');

echo $url_source;
}
}

What happens is that only some of the links are modified, seemingly at random (very frustrating).
I'll use Google's main page as an example: <html><head><meta http-equiv="content-type" content="text/html; charset=UTF-8"><title>Google</title><style>body,td,a,p,.h{font-family:arial,sans-serif}.h{font-size:20px}.h{color:#3366cc}.q{color:#00c}.ts td{padding:0}.ts{border-collapse:collapse}.lnc:link,.lnc:visited{color:#00c}.pgtab,.pgtab:hover,.pgtabselected,.pgtabside{text-align:center;text-decoration:none;color:#00c;display:block;height:27px;float:left;overflow:hidden;background:url(/intl/ja/images/productlinktabs.png) no-repeat;padding-top:8px}.pgtab{width:130px;background-position:-274px 0}.pgtab:hover{width:130px;background-position:-144px 0}.pgtabselected{width:144px}.pgtabside{width:3px;background-position:-404px 0}.ptr{cursor:pointer;cursor:hand}.iconl{background:url() no-repeat;overflow:hidden;height:px;width:px}#gbar{float:left;height:22px;padding-left:2px}.gbh,.gb2 div{border-top:1px solid #c9d7f1;font-size:0;height:0}.gbh{position:absolute;top:24px;width:100%}.gb2 div{margin:5px}#gbi{background:#fff;border:1px solid;border-color:#c9d7f1 #36c #36c #a2bae7;font-size:13px;top:24px;z-index:1000}#guser{padding-bottom:7px !important}#gbar,#guser{font-size:13px;padding-top:1px !important}@media all{.gb1,.gb3{height:22px;margin-right:.73em;vertical-align:top}.gb2 a,.gb2 b{display:block;padding:.2em .5em}}#gbi,.gb2{display:none;position:absolute;width:8em}.gb2{z-index:1001}#gbar a{color:#00c}.gb2 a,.gb3 a{text-decoration:none}#gbar .gb2 a:hover{background:#36c;color:#fff;display:block}</style><script>window.google={kEI:"ZdkGSPT0AZjgggTswOiaCQ",kEXPI:"17259,17735",kHL:"en"}; function sf(){document.f.q.focus()} window.clk=function(b,c,d,e,f,g){if(document.images){var a=encodeURIComponent||escape;(new Image).src="http://www.google.com/url?sa=T"+(c?"&oi="+a(c):"")+(d?"&cad="+a(d):"")+"&ct="+a(e)+"&cd="+a(f)+(b?"&url="+a(b.replace(/#.*/,"")).replace(/\+/g,"%2B"):"")+"&ei=ZdkGSPT0AZjgggTswOiaCQ"+g}return true}; window.gbar={};(function(){var a=window.gbar,b,g,h;function 
l(c,f,e){c.display=h?"none":"block";c.left=f+"px";c.top=e+"px"}a.tg=function(c){var f=0,e=0,d,m=0,n,j=window.navExtra,k,i=document;g=g||i.getElementById("gbar").getElementsByTagName("span");(c||window.event).cancelBubble=!m;if(!b){b=i.createElement(Array.every||window.createPopup?"iframe":"DIV");b.frameBorder="0";b.scrolling="no";b.src="http://www.google.com/#";g[7].parentNode.appendChild(b).id="gbi";if(j&&g[7])for(n in j){k=i.createElement("span");k.appendChild(j[n]);g[7].parentNode.insertBefore(k,g[7]).className="gb2"}i.onclick=a.close}while(d=g[++m]){if(e){l(d.style,e+1,f+25);f+=d.firstChild.tagName=="DIV"?9:20}if(d.className=="gb3"){do e+=d.offsetLeft;while(d=d.offsetParent)}}b.style.height=f+"px";l(b.style,e,24);h=!h};a.close=function(c){h&&a.tg(c)}})();</script></head><body bgcolor=#ffffff text=#000000 link=#0000cc vlink=#551a8b alink=#ff0000 onload="sf();if(document.images){new Image().src='/images/nav_logo3.png'}" topmargin=3 marginheight=3><div id=gbar><nobr><span class=gb1><b>Web</b></span> <span class=gb1><a href="http://images.google.com/imghp?hl=en&tab=wi">Images</a></span> <span class=gb1><a href="http://maps.google.com/maps?hl=en&tab=wl">Maps</a></span> <span class=gb1><a href="http://news.google.com/nwshp?hl=en&tab=wn">News</a></span> <span class=gb1><a href="http://www.google.com/prdhp?hl=en&tab=wf">Shopping</a></span> <span class=gb1><a href="http://mail.google.com/mail/?hl=en&tab=wm">Gmail</a></span> <span class=gb3><a href="http://www.google.com/intl/en/options/" onclick="this.blur();gbar.tg(event);return !1"><u>more</u> <small>▼</small></a></span> <span class=gb2><a href="http://video.google.com/?hl=en&tab=wv">Video</a></span> <span class=gb2><a href="http://groups.google.com/grphp?hl=en&tab=wg">Groups</a></span> <span class=gb2><a href="http://books.google.com/bkshp?hl=en&tab=wp">Books</a></span> <span class=gb2><a href="http://scholar.google.com/schhp?hl=en&tab=ws">Scholar</a></span> <span class=gb2><a 
href="http://finance.google.com/finance?hl=en&tab=we">Finance</a></span> <span class=gb2><a href="http://blogsearch.google.com/?hl=en&tab=wb">Blogs</a></span> <span class=gb2><div></div></a></span> <span class=gb2><a href="http://www.youtube.com/?hl=en&tab=w1">YouTube</a></span> <span class=gb2><a href="http://www.google.com/calendar/render?hl=en&tab=wc">Calendar</a></span> <span class=gb2><a href="http://picasaweb.google.com/home?hl=en&tab=wq">Photos</a></span> <span class=gb2><a href="http://docs.google.com/?hl=en&tab=wo">Documents</a></span> <span class=gb2><a href="http://www.google.com/reader/view/?hl=en&tab=wy">Reader</a></span> <span class=gb2><div></div></a></span> <span class=gb2><a href="http://www.google.com/intl/en/options/">even more »</a></span> </nobr></div><div class=gbh style=left:0></div><div class=gbh style=right:0></div><div align=right id=guser style="font-size:84%;padding:0 0 4px" width=100%><nobr><a href="http://www.google.com/url?sa=p&pref=ig&pval=3&q=http://www.google.com/ig%3Fhl%3Den%26source%3Diglk&usg=AFQjCNFA18XPfgb7dKnXfKz7x7g1GDH1tg">iGoogle</a> | <a href="http://swww.google.com/accounts/Login?continue=http://www.google.com/&hl=en">Sign in</a></nobr></div><center><br clear=all id=lgpd><img alt="Google" height=110 src="http://www.google.com/intl/en_ALL/images/logo.gif" width=276><br><br><form action="/search" name=f><table cellpadding=0 cellspacing=0><tr valign=top><td width=25%> </td><td align=center nowrap><input name=hl type=hidden value=en><input maxlength=2048 name=q size=55 title="Google Search" value=""><br><input name=btnG type=submit value="Google Search"><input name=btnI type=submit value="I'm Feeling Lucky"></td><td nowrap width=25%><font size=-2> <a href=/advanced_search?hl=en>Advanced Search</a><br> <a href=/preferences?hl=en>Preferences</a><br> <a href=/language_tools?hl=en>Language Tools</a></font></td></tr></table></form><br><br><font size=-1><a href="http://www.google.com/intl/en/ads/">Advertising Programs</a> - <a 
href="http://www.google.com/services/">Business Solutions</a> - <a href="http://www.google.com/intl/en/about.html">About Google</a></font><p><font size=-2>©2008 Google</font></p></center></body></html>

Note that half of the relative links are properly converted, but then it just seems to drop off...
dmarquard Posted April 18, 2008

I think I got it fixed. It turns out that the Google links that wouldn't change were formatted as <a href=page.html>, with no quotes around the attribute value. Baaaad Google. I also fixed the complete halt of replacements by raising my php_value memory_limit to 36M. Thanks for the help.
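For the unquoted case described above, a literal search for href=" can never match <a href=page.html>. One way to cover quoted and unquoted values in a single pass is preg_replace_callback(), sketched below. The helper name absolutize_hrefs is hypothetical and not part of the original script; it resolves everything against the host only, as the thread's code does, and a real HTML parser would be more robust than any regular expression.

```php
<?php
// Hypothetical helper, not from the thread: make relative href values absolute
// whether the value is double-quoted, single-quoted, or unquoted.
function absolutize_hrefs(string $html, string $base): string
{
    // Group 1: opening quote (may be empty). Group 2: the URL itself.
    // \1 requires the same quote (or nothing) to close the value.
    $pattern = '~\bhref\s*=\s*(["\']?)([^"\'\s>]*)\1~i';

    return preg_replace_callback($pattern, function (array $m) use ($base): string {
        $url = $m[2];

        // Leave null links, anchors, protocol-relative URLs, and anything
        // with a scheme (http:, mailto:, javascript:, ...) untouched.
        if ($url === '' || $url[0] === '#' || strncmp($url, '//', 2) === 0
            || preg_match('~^[a-z][a-z0-9+.-]*:~i', $url)) {
            return $m[0];
        }

        // Always emit a double-quoted attribute so the output has one consistent form.
        return 'href="' . $base . '/' . ltrim($url, '/') . '"';
    }, $html);
}

echo absolutize_hrefs('<a href=/advanced_search?hl=en>Advanced Search</a>', 'http://www.google.com'), "\n";
```

Rewriting the unquoted values as quoted ones also means any later string-based pass only has to deal with the href="..." form.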