
Match a duplicate folder in a path (back reference not working?)



I cannot figure out why this isn't working, but I know it must be something incredibly simple.

 

I accidentally ruined some paths in a database and want to fix them.

 

http://example.com/nate/joe/joe/tom/

 

Should be:

 

http://example.com/nate/joe/tom/

 

I want to match and then erase one of the duplicate folders (they may be trailing or in the middle of the path).

 

Here is what I've been trying, but it simply doesn't work:

@(/[^/]+)\1@

(using @ as the delimiter)

 

I've also tried:

@/([^/]+)/\1@

(using @ as the delimiter)

 

Thoughts?

 

Thanks!

Nate

 

 

 

 

 

I should note my code:

 

preg_match_all("@/([^/]+)/\1@", "http://www.example.com/nate/joe/joe/tom/", $m);
print_r($m);
Array (
 [0] => Array (
 )
 [1] => Array (
 )
)

 

Oddly, removing the back reference treats the dual "joe" folders as non-unique and only lists one. I am clearly missing something here!

 

preg_match_all("@/([^/]+)/@", "http://www.example.com/nate/joe/joe/tom/", $m);
print_r($m);
Array (
 [0] => Array (
    [0] => /www.example.com/
    [1] => /joe/
    [2] => /tom/
 )
 [1] => Array (
    [0] => www.example.com
    [1] => joe
    [2] => tom

 )
)
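A note on why only one "joe" shows up in that output: each match of /([^/]+)/ consumes its closing slash, so the next match cannot start at that same slash and every second folder is skipped ("nate" is missing too). A minimal sketch, assuming the goal is simply to list every folder, swaps the closing slash for a lookahead so it is checked but not consumed. The variable names are only for illustration.

```php
<?php
$url = 'http://www.example.com/nate/joe/joe/tom/';

// Consuming pattern: each match includes the closing slash, so the next
// match cannot begin at that slash.
preg_match_all('@/([^/]+)/@', $url, $consuming);

// Lookahead pattern: the closing slash is only checked, not consumed,
// so it stays available as the opening slash of the next match.
preg_match_all('@/([^/]+)(?=/)@', $url, $lookahead);

print_r($consuming[1]); // www.example.com, joe, tom
print_r($lookahead[1]); // www.example.com, nate, joe, joe, tom
```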


 

I've also tried preg_match, thinking there was some difference I am unaware of. It delivers similar results.

 

 

  Quote

Works for me.

 

Yeah, me too. Something with the double quotes. Once I changed them to single quotes (as you had) it all started working. Don't get why, but I know enough to just not ask questions sometimes.

 

Thanks DW!

  Quote

Why are you using preg_match() or preg_match_all()?  You should be using preg_replace().

 

I'm not 100% sure that all of the duplicate folders like that are actually wrong (these aren't my files or folder structure), so I wanted to just print out a list I could quick scan before making the change permanent. I've already done enough damage :)
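For the "print a list to scan first" step, something along these lines might do. It is only a sketch: the $paths array is made-up sample data standing in for the database rows, and the trailing lookahead is my own addition so that a folder like /bob/bobby is not reported as a duplicate.

```php
<?php
// Hypothetical sample rows; in practice these would come from the database.
$paths = [
    'http://example.com/nate/joe/joe/tom/',
    'http://example.com/nate/joe/tom/',
    'http://example.com/a/bob/bobby/',
    'http://example.com/x/y/y',
];

// (?=/|$) requires the repeated name to end at a slash or at the end of the
// string, so "/bob/bobby" does not count as a repeated folder.
$pattern = '@/([^/]+)/\1(?=/|$)@';

$suspects = [];
foreach ($paths as $path) {
    if (preg_match_all($pattern, $path, $m)) {
        // Remember which folder names were doubled in this path.
        $suspects[$path] = $m[1];
    }
}

print_r($suspects);
```

Only the first and last sample paths end up in $suspects, which gives a short list to eyeball before running any preg_replace.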

 

nz

  Quote

  Quote

Works for me.

 

Yeah, me too. Something with the double quotes. Once I changed them to single quotes (as you had) it all started working. Don't get why, but I know enough to just not ask questions sometimes.

 

Thanks DW!

 

Always use single quotes for regexes. =P  I didn't even notice you used double quotes.
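As for the "why": in a double-quoted PHP string, \1 is read as an octal escape and turned into the single byte 0x01 before PCRE ever sees the pattern, so the back reference is simply gone. A small sketch to make that visible (the variable names are just for illustration):

```php
<?php
// Double quotes: "\1" is an octal escape and becomes the byte 0x01.
$double = "@/([^/]+)/\1@";
// Single quotes: the backslash is kept, so PCRE receives \1.
$single = '@/([^/]+)/\1@';
// Doubling the backslash makes the double-quoted form equivalent.
$escaped = "@/([^/]+)/\\1@";

$url = 'http://www.example.com/nate/joe/joe/tom/';

var_dump(strlen($double) === strlen($single) - 1); // bool(true)
var_dump($single === $escaped);                    // bool(true)
var_dump(preg_match($double, $url));               // int(0)
var_dump(preg_match($single, $url, $m));           // int(1)
print_r($m);                                       // /joe/joe and joe
```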

  Quote

$string = "http://example.com/nate/joe/joe/tom/";
$a = explode("//",$string) ;
$b = explode("/",$a[1]);
$removed = implode("/",array_unique($b)) ;
echo $a[0]."//$removed";

 

This tends to be a lot of work for a simple task that is best done using Darkwater's solution.

Keeping the code small with the least amount of steps is most recommended.
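One behavioral difference between the two approaches may be worth checking before picking one: array_unique drops every repeated segment, not just a folder that directly follows itself, while the back reference only collapses adjacent repeats. A sketch with a made-up URL where the same folder name legitimately appears twice:

```php
<?php
// Hypothetical URL where "a" appears twice but never twice in a row.
$url = 'http://example.com/a/b/a/';

// explode/array_unique approach from the post above
$parts = explode('//', $url);
$segments = explode('/', $parts[1]);
$viaUnique = $parts[0] . '//' . implode('/', array_unique($segments));

// Back-reference approach: only a folder immediately followed by itself
$viaRegex = preg_replace('!/([^/]+)/\1!', '/$1', $url);

echo $viaUnique, "\n"; // http://example.com/a/b/
echo $viaRegex, "\n";  // http://example.com/a/b/a/
```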

  Quote

  Quote

$string = "http://example.com/nate/joe/joe/tom/";
$a = explode("//",$string) ;
$b = explode("/",$a[1]);
$removed = implode("/",array_unique($b)) ;
echo $a[0]."//$removed";

 

This tends to be a lot of work for a simple task that is best done using Darkwater's solution.

Keeping the code small with the least amount of steps is most recommended.

I don't agree with you. Yes, it looks like a lot of work, but these string/array functions are so common that anyone looking at the code knows what it's doing. It's true that regex is short and sharp, but when it comes to troubleshooting, anyone who has to read code with lots of regex is going to have a hard time. Anyway, the simplest solution might be using parse_url, instead of wasting time constructing a regex.

  Quote

I don't agree with you. Yes, it looks like a lot of work, but these string/array functions are so common that anyone looking at the code knows what it's doing.

 

Code readability has no bearing on code performance. Clean and efficiently written code (whether some can read / understand it or not) will work better than code that may be more readable yet employs multiple extra steps. Good code should never be 'dumbed down' for others who might have problems troubleshooting. The point of writing good, clean, fast and efficient code is to achieve a solution to a problem in the most optimal fashion. If anything, good code should encourage those who are not comfortable with it to further educate themselves so that they may embrace better coding practices. There are many ways to solve a problem when dealing with programming. What should matter most is the most robust and optimal solution.

 

  Quote

It's true that regex is short and sharp, but when it comes to troubleshooting, anyone who has to read code with lots of regex is going to have a hard time. Anyway, the simplest solution might be using parse_url, instead of wasting time constructing a regex.

 

Regex may not be the most readable.. but again.. it boils down to speed and efficiency. It is an invaluable tool in any serious PHP programmer's arsenal. Well written patterns prove to be quite efficient and effective.

 

And finally, parse_url would not in itself solve the OP's problem. That function merely breaks down a URL into multiple components.

consider:

 

$url = 'http://example.com/nate/joe/joe/tom/';
$urlParsed = parse_url($url);
echo '<pre>';
print_r($urlParsed);
echo '</pre>';

 

Output:

Array
(
    [scheme] => http
    [host] => example.com
    [path] => /nate/joe/joe/tom/
)

 

This in no way solves the issue at hand. You would need additional code, all the while Darkwater's pattern solves this quite nicely using one simple line.
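For comparison, the "additional code" on top of parse_url might look roughly like this. It is only a sketch of one possible non-regex follow-up, assuming the URL has just the scheme, host and path components shown in the output above; the loop and variable names are mine, not from anyone's post.

```php
<?php
$url = 'http://example.com/nate/joe/joe/tom/';
$parts = parse_url($url);

// Walk the path segments and skip any folder equal to the one right before it.
$segments = explode('/', $parts['path']);
$kept = [];
foreach ($segments as $i => $segment) {
    if ($i > 0 && $segment !== '' && $segment === $segments[$i - 1]) {
        continue; // adjacent duplicate folder
    }
    $kept[] = $segment;
}

// Rebuild from the components present in this example only.
$fixed = $parts['scheme'] . '://' . $parts['host'] . implode('/', $kept);
echo $fixed, "\n"; // http://example.com/nate/joe/tom/
```

Working on the path alone has one side benefit: the host name can never be altered by the clean-up.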

  Quote

Code readability has no bearing on code performance. Clean and efficiently written code (whether some can read / understand it or not) will work better than code that may be more readable yet employ multiple extra steps.

Work better? In what ways? Anyway, I am not arguing with you about code performance. I am merely showing the OP that there are simpler ways to do things than spending time constructing complex regular expressions, which he seems to be having a headache with, since he needed to post for help.

 

  Quote

Good code should never be 'dumbed down' for others who might have problems troubleshooting. The point of writing good, clean, fast and efficient code is to achieve a solution to a problem in the most optimal fashion. If anything, good code should encourage those who are not comfortable with it to further educate themselves so that they may embrace better coding practices. There are many ways to solve a problem when dealing with programming. What should matter most is the most robust and optimal solution.

Good code should be immediately understandable to someone who might not be the original coder. You have said it, there are many ways to do things and one of them is using less regexp.

 

  Quote

Regex may not be the most readable.. but again.. it boils down to speed and efficiency.

Speed and efficiency are very subjective; it depends on how well the coder understands the concepts. A poorly written regexp can be slow too.

 

 

  Quote

And finally, parse_url would not in itself solve the OP's problem. That function merely breaks down a URL into multiple components.

consider:

It's only another example of getting the last part. Further processing can then be done, like the split and array_unique I suggested. So do you think the OP's problem can't be solved without regexp?

 

  Quote

This in no way solves the issue at hand. You would need additional code, all the while Darkwater's pattern solves this quite nicely using one simple line.

So what if there's additional code? As long as it's understandable, it's fine.

  Quote

Code readability has no bearing on code performance. Clean and efficiently written code (whether some can read / understand it or not) will work better than code that may be more readable yet employ multiple extra steps.

  Quote

Work better? In what ways?

The less processing needed to perform a task, the less prone to error, the better. In that way  ;)

 

  Quote

Anyway, I am not arguing with you about code performance. I am merely showing the OP that there are simpler ways to do things than spending time constructing complex regular expressions, which he seems to be having a headache with, since he needed to post for help.

 

Well, not everyone is equally well versed in regex. That's what these forums are for. To offer help to people. For some, yes regex is a headache. To others, not so much.

The difficulty is relative to the experience of the coder in question. This doesn't mean regex is a poor solution by any stretch.

 

  Quote
Good code should be immediately understandable to someone who might not be the original coder.

And if the non-original coder is a novice looking at code written by someone with a solid decade of high quality experience? Sorry, but your statement is incorrect. Good code should perform its task in a fast, efficient and bug-free manner. If someone else cannot read robust quality code, this is not the fault of the advanced programmer.

 

By example, I have seen someone present a problem that took say 15 lines of code to solve. I would come along, and offer a better solution at say 7 lines.. Then one of the 'big boys' would come and solve the problem in 3. I would look at their code and not understand what they have done. Does this make their code bad? Nope.. turns out, my knowledge / standards were not high enough. I had to spend some time reverse engineering their solution to understand.. I walk away much more knowledgeable than before. And if I can use their methodologies to solve a future problem (whether others can understand it or not), I would.

 

  Quote
You have said it, there are many ways to do things and one of them is using less regexp.

I never said using less regex.. I said there are many solutions to a problem. My point is that not all solutions are equal (as far as efficiency is concerned). That was my point. Some solutions are better (read, less code bloat, faster executions, less prone to error, etc..) than others.

 

  Quote

Regex may not be the most readable.. but again.. it boils down to speed and efficiency.

  Quote

Speed and efficiency are very subjective; it depends on how well the coder understands the concepts. A poorly written regexp can be slow too.

Speed and efficiency is not subjective.. it boils down to CPU cycles and such. Perhaps you are thinking in terms of time and effort to write code? If so, then yes, I agree.. but when I mention speed and efficiency, I am talking from the standpoint of program execution. Sorry if there was any confusion there. And yes, you are right... poorly written regex can be slower.. in this case, Darkwater's case, it is not poorly written.

 

 

  Quote

And finally, parse_url would not in itself solve the OP's problem. That function merely breaks down a URL into multiple components.

consider:

  Quote

It's only another example of getting the last part. Further processing can then be done, like the split and array_unique I suggested. So do you think the OP's problem can't be solved without regexp?

That's my point.. it involves extra program execution. I am not suggesting that the OP's problem cannot be solved without regex. It obviously can be, to be sure. But a problem like the OP's is in this case perfectly suited to regex.. because the task is to find a repeating pattern and replace it.

 

  Quote

This in no way solves the issue at hand. You would need additional code, all the while Darkwater's pattern solves this quite nicely using one simple line.

  Quote

So what if there's additional code? As long as it's understandable, it's fine.

This is where we clearly differ. You equate 'understandability' as being fine. I equate elegantly fast and efficient bug-free code as being fine.  ;)

So in the end, we can agree to disagree.

  Quote

The less processing needed to perform a task, the less prone to error, the better. In that way  ;)

Have you measured how many behind-the-scenes "processes" the regexp engine creates when parsing a regexp? :) If you were to go behind the scenes, what it's doing is similar.

 

  Quote

Good code should perform its task in a fast, efficient and bug-free manner. If someone else cannot read robust quality code, this is not the fault of the advanced programmer.

That's part of it. Good code should also be maintainable and readable. I am sure you don't code your programs in binary or numbers, do you? Same logic: with too many symbols, code becomes unreadable.

 

  Quote

By example, I have seen someone present a problem that took say 15 lines of code to solve. I would come along, and offer a better solution at say 7 lines.. Then one of the 'big boys' would come and solve the problem in 3. I would look at their code and not understand what they have done. Does this make their code bad? Nope.. turns out, my knowledge / standards were not high enough. I had to spend some time reverse engineering their solution to understand.. I walk away much more knowledgeable than before. And if I can use their methodologies to solve a future problem (whether others can understand it or not), I would.

So it's even better if I come and present a one-line solution that crams everything together?? A 15-line program, properly indented, with understandable English words, that works just as fast as the regexp sure beats one that's incomprehensible and has fewer lines of code.

 

  Quote

Speed and efficiency is not subjective.. it boils down to CPU cycles and such. Perhaps you are thinking in terms of time and effort to write code? If so, then yes, I agree.. but when I mention speed and efficiency, I am talking from the standpoint of program execution. Sorry if there was any confusion there. And yes, you are right... poorly written regex can be slower.. in this case, Darkwater's case, it is not poorly written.

Yes, that is indeed what I meant. However, have you conducted experiments to see whether regexps are indeed faster than native string functions at solving problems?

 

  Quote

That's my point.. it involves extra program execution. I am not suggesting that the OP's problem cannot be solved without regex. It obviously can be, to be sure. But a problem like the OP's is in this case perfectly suited to regex.. because the task is to find a repeating pattern and replace it.

  Quote

I didn't say it's not suitable. Remember, I provided a non-regex solution to the OP, but you said it's a lot of work. I say it's not, because the regexp engine is also doing work behind the scenes.

 

This is where we clearly differ. You equate 'understandability' as being fine. I equate elegantly fast and efficient bug-free code as being fine.  ;)

So in the end, we can agree to disagree.

Yes indeed. It's all up to the OP to consider and use the solutions. There's really no point continuing to argue here.

  Quote
Have you measured how many behind-the-scenes "processes" the regexp engine creates when parsing a regexp? :) If you were to go behind the scenes, what it's doing is similar.

I personally have not. But Jeffrey Friedl's book Mastering Regular Expressions gives a good look at what regex engines are doing behind the scenes. If you have not picked up and read his book, I suggest you do (if you are interested at all in regex). Obviously, for tasks such as simply finding a straightforward set of letters and replacing it, it would be wiser to use str_replace instead of preg_replace.. but in cases like the problem posed by the OP in this thread, PCRE is perfectly useful.
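To illustrate the str_replace versus preg_replace distinction, here is a rough sketch. The literal version assumes the duplicated folder name is already known, which is not the OP's situation, where the name varies from row to row.

```php
<?php
$url = 'http://example.com/nate/joe/joe/tom/';

// Known, fixed text: a plain literal replacement is enough.
$byLiteral = str_replace('/joe/joe/', '/joe/', $url);

// Unknown folder name: the back reference finds whichever folder repeats.
$byPattern = preg_replace('!/([^/]+)/\1!', '/$1', $url);

echo $byLiteral, "\n"; // http://example.com/nate/joe/tom/
echo $byPattern, "\n"; // http://example.com/nate/joe/tom/
```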

 

  Quote

That's part of it. Good code should also be maintainable and readable. I am sure you don't code your programs in binary or numbers, do you? Same logic: with too many symbols, code becomes unreadable.

 

I agree. The question here is whose standards dictate maintainability and readability? By example, are you suggesting that Darkwater's solution is complex and not readable and maintainable? For those not familiar with regex, yes, perhaps so. But those who have a better understanding will not be lost in the slightest. I'll say it again.. good code should not be 'dumbed down' for those less knowledgeable. If the code is well written (even at advanced enough levels), this does not make it unreadable / unmaintainable to everyone. I agree that it may not be readable to everyone. But who do we cater to? The absolute beginners? The absolute coding savants? I say code to the best of your abilities.

 

  Quote

So it's even better if I come and present a one-line solution that crams everything together?? A 15-line program, properly indented, with understandable English words, that works just as fast as the regexp sure beats one that's incomprehensible and has fewer lines of code.

 

I believe that fewer lines will translate to better performance in the end, yes. The more hoops the parser has to go through.. the worse it gets...

It does of course highly depend on the task at hand. In this particular case, the speed difference between your solution and Darkwater's is negligible at best (because of how small the problem is). On a much larger scale, the differences would likely be much more pronounced. For me, it's a matter of coding principle. If I can get the code nice and neat and have it work with fewer lines, the better. It CAN still be presentable and readable and maintainable (as in Darkwater's case). So yes, I think that less is better.

 

  Quote

Yes, that is indeed what I meant. However, have you conducted experiments to see whether regexps are indeed faster than native string functions at solving problems?

 

Again, admittedly, no, I have not. Have you?

 

  Quote

I didn't say it's not suitable. Remember, I provided a non-regex solution to the OP, but you said it's a lot of work. I say it's not, because the regexp engine is also doing work behind the scenes.

 

Well, all routines must do work, yes. So of course, even regex needs to 'work' for the solution. In reality, for this instance, the system (either way.. yours or Darkwater's) doesn't work hard (in the grand scheme of things.. so perhaps I mislabeled 'works hard').. Just from a programmer's standpoint, if I had to choose between multiple explodes, passing an array through array_unique, then imploding, versus a simple capture with back referencing, well, you can guess which direction I would take.

 

  Quote

This is where we clearly differ. You equate 'understandability' as being fine. I equate elegantly fast and efficient bug-free code as being fine.  ;)

So in the end, we can agree to disagree.

  Quote

Yes indeed. It's all up to the OP to consider and use the solutions. There's really no point continuing to argue here.

 

Indeed, the OP can pick / choose. Since the OP is using preg, and since preg is perfectly suitable for this problem, and since this is a regex forum, a wise preg solution was presented.

So I decided to actually test this (you have me curious).. The first part is using your code, and second is using Darkwater's. I time everything encapsulating each solution.

So the top line outputted is yours.. the next line is his.

 

Here is the test code:

 

$start = gettimeofday();
$string = "http://example.com/nate/joe/joe/tom/";
$a = explode("//",$string) ;
$b = explode("/",$a[1]);
$removed = implode("/",array_unique($b)) ;
echo $a[0]."//$removed";
$final = gettimeofday();
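// Elapsed time: seconds plus microseconds for both timestamps. The second
// line needs "+"; an "=" there would assign to $start['sec'] and leave the
// whole Unix timestamp in the result, as seen in the sample output below.
$sec = ($final['sec'] + $final['usec']/1000000)-
       ($start['sec'] + $start['usec']/1000000);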
printf("\r\nTime of execution: %.3f units\n", $sec);

echo '<br />'; // insert a new line between tests...

$start = gettimeofday();
$string = "http://example.com/nate/joe/joe/tom/";
$new_url = preg_replace('!/([^/]+)/\1!', '/$1', $string);
echo $new_url;
$final = gettimeofday();
// "+" (not "=") so the start time is seconds plus microseconds.
$sec = ($final['sec'] + $final['usec']/1000000)-
       ($start['sec'] + $start['usec']/1000000);
printf("\r\nTime of execution: %.3f units\n", $sec);

 

Sample Output:

http://example.com/nate/joe/tom/ Time of execution: 1223270662.000 units 
http://example.com/nate/joe/tom/ Time of execution: 1223270662.000 units

 

So I concede, it is neck and neck.

If you keep refreshing, it is pretty much tied, it seems. Sometimes yours is faster by .001, and sometimes it's the other way around. This was tested in a clean file (no other junk in it): just a Dreamweaver blank HTML template page with the above code in it. So as I suspected, for tasks like this, the speed difference is negligible. I still would have gone with the preg approach personally though, as [to me] it makes more sense to use it for the OP's problem.
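A single run of either snippet is far too short to time reliably, so a fairer comparison would probably repeat each approach many times. Below is a minimal sketch of that idea. The function names and the iteration count are made up for illustration, and the numbers it prints will vary by machine and PHP version, so none are shown here.

```php
<?php
// Hypothetical wrapper around the explode/array_unique approach.
function dedupeWithExplode(string $url): string
{
    $a = explode('//', $url);
    $b = explode('/', $a[1]);
    return $a[0] . '//' . implode('/', array_unique($b));
}

// Hypothetical wrapper around the back-reference approach.
function dedupeWithRegex(string $url): string
{
    return preg_replace('!/([^/]+)/\1!', '/$1', $url);
}

// Run $fn repeatedly and return the total elapsed time in seconds.
function timeIt(callable $fn, string $url, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($url);
    }
    return microtime(true) - $start;
}

$url = 'http://example.com/nate/joe/joe/tom/';
$iterations = 100000; // arbitrary; large enough to rise above timer noise

printf("explode/array_unique: %.4f s\n", timeIt('dedupeWithExplode', $url, $iterations));
printf("preg_replace:         %.4f s\n", timeIt('dedupeWithRegex', $url, $iterations));
```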

 

Cheers,

 

NRG
