
preg_replace


asmith


Hello,

 

I'm fetching the entire site content and trying to replace old URLs with new ones using this:

 

<?php
$urlin = array(
	"'somethingFile\?act=([a-zA-Z0-9\-]+);(.+)'",
	"'somethingFile\?act=([a-zA-Z0-9\-]+)'",
);

$urlout = array(
	"somethingNew/\\1/?\\2",
	"somethingNew/\\1/",
);

echo preg_replace($urlin, $urlout, $temp);
?>

 

This almost works, except when I have two links on one line of the HTML content:

 

<a href="somethingFile?act=someAct;var=val"></a><a href="somethingFile?act=someAct;var=val"></a>

 

The first link in the output gets replaced fine, but the second fails (gets matched with the second array value):

<a href="somethingNew/someAct/?var=val"></a><a href="somethingNew/;var=val"></a>

 

But if I split my string into 2 lines, all works fine:

<a href="somethingFile?act=someAct;var=val"></a>
<a href="somethingFile?act=someAct;var=val"></a>

https://forums.phpfreaks.com/topic/259556-preg_replace/

Hi asmith,

 

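the root of the problem is not the second regex. It only gets a chance to touch the second link because the first regex has already swallowed that link.

(If you want to see that, eliminate the second regex: the first link comes out the same, and the second link is simply left unreplaced.)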

 

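The problem is the greedy plus quantifier in the (.+) of your first regex. It matches everything up to the end of the line (the dot does not match a newline by default, which is why splitting the links across two lines works), so your Group 2 capture actually is: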

var=val"></a><a href="somethingFile?act=someAct;var=val"></a>

 

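At that stage, after the first replacement, the rest of the line has already been consumed, so there is nothing left for the first regex to match. The second link is carried over untouched inside Group 2, and the second regex then rewrites it only up to the semi-colon.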
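
To see the greedy capture for yourself, here is a minimal sketch that runs the first pattern from the question through preg_match and preg_match_all. The variable names ($html, $greedy, $m, $count) are only for illustration.

```php
<?php
// Two links on one line, as in the question.
$html = '<a href="somethingFile?act=someAct;var=val"></a>'
      . '<a href="somethingFile?act=someAct;var=val"></a>';

// The first pattern from the question, with its greedy (.+).
$greedy = "'somethingFile\?act=([a-zA-Z0-9\-]+);(.+)'";

preg_match($greedy, $html, $m);

// Group 2 runs to the end of the line and swallows the second link.
echo $m[2], "\n";

// Only one match is found, even though the line holds two links.
$count = preg_match_all($greedy, $html);
echo $count, "\n";
```

The first echo prints the long Group 2 capture quoted above, and the second prints 1.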

 

This is a classic problem (you will find it explained in detail on this page of mine about various kinds of greedy and lazy regex matching).

 

There are three basic solutions:

- making the second plus quantifier lazy (adding a question mark to the + sign) and following it with the closing quote, so that it only expands until the first end of string marker is found; a lazy quantifier at the very end of a pattern would match just one character

- changing the character class so that it cannot expand beyond the first end quote (using a negated character class, e.g. [^"])

- the easiest: not capturing Group 2 at all, because who cares... At this stage, you are just replacing the semi-colon with a question mark, right? So you can stop.
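
As a rough sketch, here are the three options applied to the first regex from the question. The variable names and the comma delimiter are my own choices, and the lookahead in option 1 is just one way of giving the lazy group something to stop at.

```php
<?php
// Two links on one line, as in the question.
$html = '<a href="somethingFile?act=someAct;var=val"></a>'
      . '<a href="somethingFile?act=someAct;var=val"></a>';

// 1. Lazy quantifier, followed by a lookahead for the closing quote
//    so that the group stops at the end of the href value.
$lazy = preg_replace(
    ',somethingFile\?act=([a-zA-Z0-9\-]+);(.+?)(?="),',
    'somethingNew/$1/?$2',
    $html
);

// 2. Negated character class: Group 2 cannot run past a double quote.
$negated = preg_replace(
    ',somethingFile\?act=([a-zA-Z0-9\-]+);([^"]+),',
    'somethingNew/$1/?$2',
    $html
);

// 3. No Group 2 at all: match up to the semi-colon and swap it for "/?".
$noGroup2 = preg_replace(
    ',somethingFile\?act=([a-zA-Z0-9\-]+);,',
    'somethingNew/$1/?',
    $html
);

echo $lazy, "\n", $negated, "\n", $noGroup2, "\n";
```

All three print the same line, with both links rewritten to somethingNew/someAct/?var=val.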

 

To take care of your two regexes in one single match, I suggest this:

 

Input:

<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y"></a>

 

Code:

<?php
$string='<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y"></a>';
$regex=',somethingFile\?act=([^"]+),';
$output=preg_replace_callback($regex,function($m){return 'somethingNew/'.str_replace(';','?',$m[1]);},$string);
echo htmlentities($output).'<br />';
?>

 

Output:

<a href="somethingNew/Act_One"></a><a href="somethingNew/Act2?var=X"></a><a href="somethingNew/Act3?var=Y"></a>

 

This solution assumes there is only one variable (aside from act) in each url, conforming to your sample, i.e. not "?act=1;v1=x;v2=y". If you need multiple variables, it's a simple modification, just let me know.
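
To illustrate why that assumption matters, here is a small sketch with a two-variable URL. str_replace swaps every semi-colon; one possible modification (an assumption on my part, there are others) is to use preg_replace with a limit of 1 inside the callback so that only the first semi-colon is swapped.

```php
<?php
$string = '<a href="somethingFile?act=Act3;var=Y;var2=Z"></a>';
$regex  = ',somethingFile\?act=([^"]+),';

// str_replace turns every semi-colon into a question mark.
$all = preg_replace_callback($regex, function ($m) {
    return 'somethingNew/' . str_replace(';', '?', $m[1]);
}, $string);

// The fourth argument of preg_replace (limit = 1) swaps only the first one.
$first = preg_replace_callback($regex, function ($m) {
    return 'somethingNew/' . preg_replace('/;/', '?', $m[1], 1);
}, $string);

echo $all, "\n";   // <a href="somethingNew/Act3?var=Y?var2=Z"></a>
echo $first, "\n"; // <a href="somethingNew/Act3?var=Y;var2=Z"></a>
```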

 

I may have missed something, so please let me know if I did or if you have any questions.

Wishing you a fun weekend.

:)

 

 

[Edit: added "disclaimer" about the "?act=1;v1=x;v2=y" situation.]


Just in case someone is interested:

 

1. Multi-Variable Variation (taking care of both regexes, as in the first post)

 

Input:

<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y;var2=Z"></a>

 

Code:

<?php
$string='<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y;var2=Z"></a>';
$regex=',somethingFile\?act=([^;"]+)(;)?,';
$output=preg_replace_callback($regex,function($m){return 'somethingNew/'.$m[1].(isset($m[2])?'/?':'/');},$string);
echo htmlentities($output).'<br />';
?>

 

Output:

<a href="somethingNew/Act_One/"></a><a href="somethingNew/Act2/?var=X"></a><a href="somethingNew/Act3/?var=Y;var2=Z"></a>

 

2. Basic option without callback (only for the first regex)

In the first post, I didn't give a code example of the "three basic solutions" if you just wanted to fix the first regex (as the solution I proceeded to give rolled your two regexes into one).

But if you were interested, here's one possibility among many (along the lines of option #3 I was mentioning).

 

$string='<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y;var2=Z"></a>';
$regex=',somethingFile\?act=([^;"]+);,';
$replace='somethingNew/\\1?';
$output=preg_replace($regex,$replace,$string);
echo htmlentities($output).'<br />';

 

Output:

<a href="somethingFile?act=Act_One"></a><a href="somethingNew/Act2?var=X"></a><a href="somethingNew/Act3?var=Y;var2=Z"></a>

 

Naturally, the first url is not replaced (it would be a target for the second regex).
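
If you would rather keep the two-pattern array form from the first post, here is a sketch of how both patterns might look once the greedy group is gone. It keeps the $urlin and $urlout names from the question; $html, $result and the comma delimiter are my own choices. It relies on preg_replace applying the patterns in order: once the first pattern has rewritten a link, that link no longer contains somethingFile, so the second pattern leaves it alone.

```php
<?php
$html = '<a href="somethingFile?act=Act_One"></a>'
      . '<a href="somethingFile?act=Act2;var=X"></a>'
      . '<a href="somethingFile?act=Act3;var=Y;var2=Z"></a>';

$urlin = array(
    ',somethingFile\?act=([^;"]+);,', // links with extra variables
    ',somethingFile\?act=([^;"]+),',  // links with act only
);

$urlout = array(
    'somethingNew/$1/?',
    'somethingNew/$1/',
);

$result = preg_replace($urlin, $urlout, $html);
echo $result, "\n";
```

This gives the same output as the multi-variable callback version above.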

 


Archived

This topic is now archived and is closed to further replies.
