asmith Posted March 23, 2012

Hello,

I'm getting the entire site content and trying to replace old URLs with new ones using this:

```php
<?php
$urlin = array(
    "'somethingFile\?act=([a-zA-Z0-9\-]+);(.+)'",
    "'somethingFile\?act=([a-zA-Z0-9\-]+)'",
);
$urlout = array(
    "somethingNew/\\1/?\\2",
    "somethingNew/\\1/",
);
echo preg_replace($urlin, $urlout, $temp);
?>
```

This almost works, except when I have two links on one line of the HTML content:

```html
<a href="somethingFile?act=someAct;var=val"></a><a href="somethingFile?act=someAct;var=val"></a>
```

The first link in the output gets replaced fine, but the second one fails (it gets matched by the second array value):

```html
<a href="somethingNew/someAct/?var=val"></a><a href="somethingNew/someAct/;var=val"></a>
```

But if I split my string into two lines, everything works fine:

```html
<a href="somethingFile?act=someAct;var=val"></a>
<a href="somethingFile?act=someAct;var=val"></a>
```
ragax Posted March 23, 2012

Hi asmith,

The second regex is not the real culprit: it only catches a link that the first regex failed to convert. (To see that, remove the second regex: the second link then comes out completely unchanged.) The problem is the greedy plus quantifier in your Group 2. That (.+) matches everything up to the end of the line, so your Group 2 capture is actually:

```text
var=val"></a><a href="somethingFile?act=someAct;var=val"></a>
```

After that first replacement, the whole line has been consumed, so there is nothing left for the first regex to match. This also explains why splitting the links onto two lines works: the dot does not match a newline (unless you add the s modifier), so each match stops at the end of its own line.

This is a classic problem (you will find it explained in detail on this page of mine about various kinds of greedy and lazy regex matching). There are three basic solutions:

- making the second plus quantifier lazy by adding a question mark after the +, and giving it something to stop at, such as a lookahead for the closing quote: (.+?)(?=")
- changing the character class so that it cannot expand beyond the first closing quote, using a negated character class, e.g. [^"]
- the easiest: not capturing Group 2 at all, because who cares? At that point you are just replacing the semicolon with a question mark, right? So the regex can stop at the semicolon.

To handle both of your regexes with one single match, I suggest this:

Input:

```html
<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y"></a>
```

Code:

```php
<?php
$string='<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y"></a>';
// [^"]+ cannot run past the closing quote, so each link is matched separately
$regex=',somethingFile\?act=([^"]+),';
$output=preg_replace_callback($regex,function($m){
    // Turn the ';' before the extra variable into a '?'
    return 'somethingNew/'.str_replace(';','?',$m[1]);
},$string);
echo htmlentities($output).'<br />';
?>
```

Output:

```html
<a href="somethingNew/Act_One"></a><a href="somethingNew/Act2?var=X"></a><a href="somethingNew/Act3?var=Y"></a>
```

This solution assumes there is only one variable (aside from act) in each URL, as in your sample, i.e. not "?act=1;v1=x;v2=y", because str_replace turns every semicolon into a question mark. If you need multiple variables, it's a simple modification; just let me know.

I may have missed something, so please let me know if I did or if you have any questions.

Wishing you a fun weekend.

[Edit: added "disclaimer" about the "?act=1;v1=x;v2=y" situation.]
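For reference, here is a minimal sketch (not from the thread) that applies the first two of the three solutions to asmith's original two-pattern array, using his sample line as a hypothetical $line variable. It first shows the greedy Group 2 capture with preg_match. The lookahead (?=") in the lazy version assumes each URL ends at the closing quote of its href attribute.

```php
<?php
// Sketch: asmith's sample line and two-pattern array, with the greedy (.+)
// fixed in two ways. The ' delimiters follow the original patterns.
$line = '<a href="somethingFile?act=someAct;var=val"></a><a href="somethingFile?act=someAct;var=val"></a>';

// The original greedy Group 2 runs to the end of the line.
preg_match("'somethingFile\\?act=([a-zA-Z0-9\\-]+);(.+)'", $line, $m);
$greedyGroup2 = $m[2]; // var=val"></a><a href="somethingFile?act=someAct;var=val"></a>

// Fix 1: lazy quantifier, bounded by a lookahead for the closing quote.
$lazyIn = array(
    "'somethingFile\\?act=([a-zA-Z0-9\\-]+);(.+?)(?=\")'",
    "'somethingFile\\?act=([a-zA-Z0-9\\-]+)'",
);

// Fix 2: negated character class that cannot run past the closing quote.
$classIn = array(
    "'somethingFile\\?act=([a-zA-Z0-9\\-]+);([^\"]+)'",
    "'somethingFile\\?act=([a-zA-Z0-9\\-]+)'",
);

// Same replacements as in the question (single quotes keep \1 and \2 for PCRE).
$urlout = array(
    'somethingNew/\1/?\2',
    'somethingNew/\1/',
);

$fixedLazy  = preg_replace($lazyIn, $urlout, $line);
$fixedClass = preg_replace($classIn, $urlout, $line);

echo htmlentities($fixedLazy) . '<br />';
echo htmlentities($fixedClass) . '<br />';
```

Both fixed arrays convert the two links on the same line to somethingNew/someAct/?var=val, so the second pattern is left with nothing to catch.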
ragax Posted March 23, 2012

Just in case someone is interested:

1. Multi-Variable Variation (handling both regexes, as in the first post)

Input:

```html
<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y;var2=Z"></a>
```

Code:

```php
<?php
$string='<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y;var2=Z"></a>';
// Group 1: the act value; optional Group 2: the ';' that starts the other variables
$regex=',somethingFile\?act=([^;"]+)(;)?,';
$output=preg_replace_callback($regex,function($m){
    // '/?' if a ';' followed the act value, otherwise just a trailing '/'
    return 'somethingNew/'.$m[1].(isset($m[2])?'/?':'/');
},$string);
echo htmlentities($output).'<br />';
?>
```

Output:

```html
<a href="somethingNew/Act_One/"></a><a href="somethingNew/Act2/?var=X"></a><a href="somethingNew/Act3/?var=Y;var2=Z"></a>
```

2. Basic option without callback (only for the first regex)

In the first post, I didn't give a code example of the "three basic solutions" for fixing just the first regex (the solution I gave instead rolled your two regexes into one). But if you're interested, here's one possibility among many, along the lines of option #3 I mentioned:

```php
<?php
$string='<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y;var2=Z"></a>';
// Stop at the first ';' instead of capturing the rest of the URL
$regex=',somethingFile\?act=([^;"]+);,';
$replace='somethingNew/\\1?';
$output=preg_replace($regex,$replace,$string);
echo htmlentities($output).'<br />';
?>
```

Output:

```html
<a href="somethingFile?act=Act_One"></a><a href="somethingNew/Act2?var=X"></a><a href="somethingNew/Act3?var=Y;var2=Z"></a>
```

Naturally, the first URL is not replaced (it would be a target for the second regex).
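A sketch (not from the thread) that extends option #3 to both of asmith's regexes without a callback, by passing two patterns in an array to preg_replace. It assumes the same sample input as the multi-variable variation, and it keeps the trailing / and /? from asmith's original replacements.

```php
<?php
// Sketch, assuming the same sample input as the multi-variable variation.
$string = '<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y;var2=Z"></a>';

// Order matters: the pattern that needs a ';' runs first, so the fallback
// pattern only sees links that have no extra variables.
$patterns = array(
    ',somethingFile\?act=([^;"]+);,', // act plus variables: the first ';' becomes '/?'
    ',somethingFile\?act=([^;"]+),',  // act only: append a trailing '/'
);
$replacements = array(
    'somethingNew/\1/?',
    'somethingNew/\1/',
);

$output = preg_replace($patterns, $replacements, $string);
echo htmlentities($output) . '<br />';
```

preg_replace applies each pattern to the result of the previous one; once the first pattern has rewritten a link, it no longer contains somethingFile, so the fallback pattern leaves it alone. On this input the result matches the callback version's output.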
asmith Posted March 24, 2012 (Author)

Thanks for your detailed reply! I've already started reading your tutorial.
ragax Posted March 24, 2012

And thank you very much for letting me know about a typo in the tutorial, asmith! Nothing is more precious than a careful reader.

Wishing you a fun weekend.