preg_replace


asmith

Hello,

I'm taking the entire site content and trying to replace old URLs with new ones using this:

<?php
$urlin = array(
	"'somethingFile\?act=([a-zA-Z0-9\-]+);(.+)'",
	"'somethingFile\?act=([a-zA-Z0-9\-]+)'",
);

$urlout = array(
	"somethingNew/\\1/?\\2",
	"somethingNew/\\1/",
);

echo preg_replace($urlin, $urlout, $temp);
?>

This almost works, except when I have two links on the same line of HTML content:

<a href="somethingFile?act=someAct;var=val"></a><a href="somethingFile?act=someAct;var=val"></a>

In the output, the first link is replaced correctly, but the second one is not (it gets matched by the second pattern in the array):

<a href="somethingNew/someAct/?var=val"></a><a href="somethingNew/someAct/;var=val"></a>

But if I split the string over two lines, everything works:

<a href="somethingFile?act=someAct;var=val"></a>
<a href="somethingFile?act=someAct;var=val"></a>

Hi asmith,

the second link being matched by the second regex is only a symptom, not the cause of the problem.

(If you want to see that, remove the second regex: the second link then comes out unchanged, because the first regex never gets a chance to match it.)

The problem is your second greedy plus quantifier, the one in (.+). It matches everything up to the end of the line, so your Group 2 capture actually is:

var=val"></a><a href="somethingFile?act=someAct;var=val"></a>

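After the first replacement, the rest of the line has already been consumed by that match, so there is nothing left on that line for the first regex to match. The second link is only picked up afterwards by your second regex, which rewrites the act part and leaves ;var=val behind. This is also why splitting the string over two lines works: by default, . does not match a newline, so the greedy .+ stops at the end of each line.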
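If you want to check this yourself, here is a quick sketch that runs your first pattern through preg_match on the one-line sample and prints Group 2, then counts matches with preg_match_all once the two links are on separate lines (the str_replace that inserts the newline is only there for the demo):

```php
<?php
// Your first pattern, unchanged: Group 2 is the greedy (.+)
$pattern = "'somethingFile\?act=([a-zA-Z0-9\-]+);(.+)'";
$oneLine = '<a href="somethingFile?act=someAct;var=val"></a><a href="somethingFile?act=someAct;var=val"></a>';

// One line: Group 2 runs to the end of the line and swallows the second link
preg_match($pattern, $oneLine, $m);
echo htmlentities($m[2]) . "<br />\n";

// Two lines: "." does not match the newline, so each link gets its own match
$twoLines = str_replace('</a><a', "</a>\n<a", $oneLine);
$count = preg_match_all($pattern, $twoLines, $all);
echo $count . "<br />\n";
```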

This is a classic problem (you will find it explained in detail on this page of mine about various kinds of greedy and lazy regex matching).

There are three basic solutions:

- making the second plus quantifier lazy by adding a question mark to the + sign, and following it with the character where it should stop, e.g. (.+?)" so that it only expands until the first end quote (a lazy .+? with nothing after it would match just one character)

- changing the character class so that it cannot expand beyond the first end quote, using a negated character class, e.g. ([^"]+)

- the easiest: not capturing Group 2 at all, because who cares... At this stage, you are just replacing the semicolon with a question mark, right? So you can stop the match at the semicolon.
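A minimal sketch of those three options, applied to your first pattern only (your second pattern would still be needed for URLs without a semicolon), and assuming href values are always wrapped in double quotes. All three give the same result on your sample line:

```php
<?php
$input = '<a href="somethingFile?act=someAct;var=val"></a><a href="somethingFile?act=someAct;var=val"></a>';

// Option 1: lazy quantifier, followed by the closing quote so it stops at the first one
$lazy = preg_replace('/somethingFile\?act=([a-zA-Z0-9\-]+);(.+?)"/', 'somethingNew/\\1/?\\2"', $input);

// Option 2: negated character class, so Group 2 can never run past a double quote
$negated = preg_replace('/somethingFile\?act=([a-zA-Z0-9\-]+);([^"]+)/', 'somethingNew/\\1/?\\2', $input);

// Option 3: no Group 2 at all; stop at the semicolon and leave the rest as it is
$noGroup2 = preg_replace('/somethingFile\?act=([a-zA-Z0-9\-]+);/', 'somethingNew/\\1/?', $input);

echo htmlentities($lazy) . "<br />\n";
echo htmlentities($negated) . "<br />\n";
echo htmlentities($noGroup2) . "<br />\n";
```

Option 3 is the least work: the rest of the URL is copied through untouched, so there is no need to capture it.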

To handle both of your regexes in a single match, I suggest this:

Input:

<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y"></a>

Code:

<?php
$string='<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y"></a>';
// [^"]+ cannot run past the closing quote, so each href is matched separately
$regex=',somethingFile\?act=([^"]+),';
// keep the act value (and any variable), swapping the ";" for a "?"
$output=preg_replace_callback($regex,function($m){return 'somethingNew/'.str_replace(';','?',$m[1]);},$string);
echo htmlentities($output).'<br />';
?>

Output:

<a href="somethingNew/Act_One"></a><a href="somethingNew/Act2?var=X"></a><a href="somethingNew/Act3?var=Y"></a>

This solution assumes there is only one variable (aside from act) in each URL, as in your sample, i.e. not "?act=1;v1=x;v2=y" (str_replace would turn every semicolon into a question mark). If you need multiple variables, it's a simple modification, just let me know.
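One way that modification could look, as a sketch assuming the extra variables should keep their semicolon separators: only the first semicolon (right after the act value) becomes a question mark, using the $limit argument of preg_replace inside the callback.

```php
<?php
$string = '<a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y;var2=Z"></a>';
$regex = ',somethingFile\?act=([^"]+),';

// Replace only the first ";" with "?"; later ";" separators are left alone
$output = preg_replace_callback($regex, function ($m) {
    return 'somethingNew/' . preg_replace('/;/', '?', $m[1], 1);
}, $string);

echo htmlentities($output) . '<br />';
```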

I may have missed something, so please let me know if I did or if you have any questions.

Wishing you a fun weekend.

:)

[Edit: added "disclaimer" about the "?act=1;v1=x;v2=y" situation.]

Just in case someone is interested:

1. Multi-Variable Variation (taking care of both regexes, as in the first post)

Input:

<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y;var2=Z"></a>

Code:

<?php
$string='<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y;var2=Z"></a>';
// (;)? is optional, so $m[2] is only set when a ";" follows the act value
$regex=',somethingFile\?act=([^;"]+)(;)?,';
$output=preg_replace_callback($regex,function($m){return 'somethingNew/'.$m[1].(isset($m[2])?'/?':'/');},$string);
echo htmlentities($output).'<br />';
?>

Output:

<a href="somethingNew/Act_One/"></a><a href="somethingNew/Act2/?var=X"></a><a href="somethingNew/Act3/?var=Y;var2=Z"></a>
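A note on the isset($m[2]) test, assuming the optional group is written as (;)?: when that group does not take part in the match, PHP by default leaves it out of $m entirely (trailing groups that did not match are not reported), so isset() tells you whether a semicolon followed the act value. A quick check with preg_match:

```php
<?php
$regex = ',somethingFile\?act=([^;"]+)(;)?,';

preg_match($regex, 'somethingFile?act=Act_One"', $plain);
preg_match($regex, 'somethingFile?act=Act2;var=X"', $withVars);

var_dump(isset($plain[2]));    // bool(false): group 2 did not take part in the match
var_dump(isset($withVars[2])); // bool(true): $withVars[2] holds the ";"
```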

2. Basic option without callback (only for the first regex)

In my first reply, I didn't give code for the "three basic solutions", which you would use if you only wanted to fix the first regex, because the solution I went on to give rolled your two regexes into one.

If you are interested, here's one possibility among many, along the lines of option #3 from my first reply.

$string='<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y;var2=Z"></a>';
$regex=',somethingFile\?act=([^;"]+);,';
$replace='somethingNew/\\1?';
$output=preg_replace($regex,$replace,$string);
echo htmlentities($output).'<br />';

Output:

<a href="somethingFile?act=Act_One"></a><a href="somethingNew/Act2?var=X"></a><a href="somethingNew/Act3?var=Y;var2=Z"></a>

Naturally, the first URL is not replaced (it would be a target for the second regex).
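If you want the callback-free route to cover both cases, here is a sketch: put the second regex back into an array after the first one, as in your original code, so it only picks up the URLs the first one skipped. The trailing slashes follow the replacements from your original post:

```php
<?php
$string = '<a href="somethingFile?act=Act_One"></a><a href="somethingFile?act=Act2;var=X"></a><a href="somethingFile?act=Act3;var=Y;var2=Z"></a>';

// Order matters: URLs with extra variables (act=...;) are rewritten first,
// then the plain ?act=... URLs that are still left
$regexes = array(
    ',somethingFile\?act=([^;"]+);,',
    ',somethingFile\?act=([^;"]+),',
);
$replaces = array(
    'somethingNew/\\1/?',
    'somethingNew/\\1/',
);

$output = preg_replace($regexes, $replaces, $string);
echo htmlentities($output) . '<br />';
```

If the two patterns were swapped, the plain pattern would stop at the semicolon and produce somethingNew/Act2/;var=X, which is the original problem again.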
