sangfroid Posted November 5, 2008
Hi, I want to replace every 3rd and later occurrence of "." with "/" in a string. For example, if I have strings like www.google.com.news and www.google.com.sports, I would like to get www.google.com/news and www.google.com/sports. How do I do it with a regular expression?
ddrudik Posted November 5, 2008
This would seem to require some code. What platform are you using (C#/.NET, PHP, etc.)?
nrg_alpha Posted November 5, 2008
Here is one possible solution (non-regex):

    $str = 'www.google.com.news';
    $arr = explode('.', $str);
    $total = count($arr);
    if ($total > 3) {
        // Keep the first three segments joined by dots
        $newArr = $arr[0] . '.' . $arr[1] . '.' . $arr[2];
        // Append every remaining segment with a slash instead of a dot
        for ($i = 3; $i < $total; $i++) {
            $newArr .= '/' . $arr[$i];
        }
        echo $newArr;
    }

Output: www.google.com/news
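A shorter non-regex variant, as a sketch rather than anything posted in the thread: explode() takes an optional limit, so splitting into at most three pieces leaves everything after the second dot in the last piece, where str_replace() can convert the remaining dots. The function name dotsToSlashes is made up for illustration.

```php
<?php
// Hypothetical helper (not from the thread): turns the 3rd and later dots into slashes.
function dotsToSlashes($str)
{
    // With a limit of 3, the third element holds everything after the second dot.
    $parts = explode('.', $str, 3);
    if (count($parts) === 3) {
        $parts[2] = str_replace('.', '/', $parts[2]);
    }
    // Strings with fewer than two dots fall through and are rejoined unchanged.
    return implode('.', $parts);
}

echo dotsToSlashes('www.google.com.news'), "\n";
echo dotsToSlashes('www.google.com'), "\n";
```

Unlike the loop version, this returns the input unchanged when there is nothing to replace instead of printing nothing.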
ddrudik Posted November 5, 2008
I usually answer questions in a non-platform-specific regex forum; looking back, I guess I knew you were using PHP. Sorry about the question. nrg_alpha had the answer.
ddrudik Posted November 5, 2008
There seems to be a (very slight) speed advantage to using this method instead:

    $str='www.google.com.sports';
    // Group 1: the first two dot-terminated segments; group 2: the rest of the string
    if(preg_match('~^((?:.*?\.){2})(.*)~',$str,$parts)){
        $str=$parts[1].str_replace('.','/',$parts[2]);
    }
    echo $str;
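Since the original question asked for a regular expression, another possible approach is a single preg_replace_callback() pass that counts dots as it goes. This is only a sketch: the function name replaceFromThirdDot is hypothetical, and the closure syntax needs PHP 5.3 or later, which postdates this thread.

```php
<?php
// Hypothetical helper (not from the thread): counts dots and swaps the 3rd onward.
function replaceFromThirdDot($str)
{
    $seen = 0; // number of dots matched so far
    return preg_replace_callback('~\.~', function ($match) use (&$seen) {
        $seen++;
        // The first two dots are kept; every later one becomes a slash.
        return $seen >= 3 ? '/' : '.';
    }, $str);
}

echo replaceFromThirdDot('www.google.com.sports'), "\n";
```

The callback runs once per dot, so this is likely slower than the preg_match approach on long strings; it trades speed for a pattern that is trivial to read.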
nrg_alpha Posted November 5, 2008
Very nice solution, ddrudik! I think a slight modification can squeeze even more (very slight) speed out of it:

    '~^((?:[^.]+\.){2}[^.]+)(.*)~'

Negated character classes are faster than lazy quantifiers. But for small tasks, the speed difference in this case would probably be negligible. But again, nice solution.

Cheers,
NRG

EDIT: I think my pattern may be grabbing the letters that follow the last needed dot, so group 1 captures more than it actually needs. To mirror your solution exactly, it could also be written as:

    '~^((?:[^.]+\.){2})(.*)~'
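To make the point of the EDIT concrete, here is a small sketch that prints what each of the two negated-class patterns captures for the sample string. The helper name showGroups is hypothetical.

```php
<?php
// Hypothetical helper: returns group 1, group 2 and the final string for a pattern.
function showGroups($pattern, $str)
{
    if (!preg_match($pattern, $str, $parts)) {
        return null; // pattern did not match, nothing to show
    }
    return array(
        'group1' => $parts[1],
        'group2' => $parts[2],
        'result' => $parts[1] . str_replace('.', '/', $parts[2]),
    );
}

$str = 'www.google.com.sports';
// First pattern: group 1 also swallows the third segment ("com").
print_r(showGroups('~^((?:[^.]+\.){2}[^.]+)(.*)~', $str));
// Second pattern: group 1 stops right after the second dot.
print_r(showGroups('~^((?:[^.]+\.){2})(.*)~', $str));
```

For this input both patterns give the same final string; only the split between the groups differs. Note that `[^.]+` requires non-empty segments, so a string such as `a..b.c` is matched by the lazy `.*?` version but not by either negated-class pattern.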
ddrudik Posted November 5, 2008
My benchmark testing might be flawed, but that pattern runs slower for me:

    <?php
    // 1) Negated character class pattern
    $time_start = microtime(true);
    $str='www.google.com.sports';
    if(preg_match('~^((?:[^.]+\.){2})(.*)~',$str,$parts)){
        $str=$parts[1].str_replace('.','/',$parts[2]);
    }
    echo $str;
    echo "<br>".(microtime(true)-$time_start)."<hr>";

    // 2) explode() loop
    $time_start = microtime(true);
    $str = 'www.google.com.news';
    $arr = explode('.', $str);
    $total = count($arr);
    if($total > 3){
        $newArr = $arr[0] . '.' . $arr[1] . '.' . $arr[2];
        for($i = 3; $i < $total; $i++){
            $newArr .= '/' . $arr[$i];
        }
        echo $newArr;
    }
    echo "<br>".(microtime(true)-$time_start)."<hr>";

    // 3) Lazy quantifier pattern
    $time_start = microtime(true);
    $str='www.google.com.sports';
    if(preg_match('~^((?:.*?\.){2})(.*)~',$str,$parts)){
        $str=$parts[1].str_replace('.','/',$parts[2]);
    }
    echo $str;
    echo "<br>".(microtime(true)-$time_start)."<hr>";
    ?>

Output:

    www.google.com/sports
    3.9815902709961E-5
    ----------
    www.google.com/news
    1.7881393432617E-5
    ----------
    www.google.com/sports
    1.2874603271484E-5
nrg_alpha Posted November 5, 2008
We have met in this thread, haven't we? Nice running into you again, BTW. In that link, we both agreed (through your code snippet) that the negated character class beat out the lazy quantifier (no backtracking involved). I have used the following snippet (which is kind of based on your code in the link above):

    $time_start = microtime(true);
    $str='www.google.com.sports';
    if(preg_match('~^((?:[^.]+\.){2})(.*)~',$str,$parts)){ // NRG
        $str=$parts[1].str_replace('.','/',$parts[2]);
    }
    echo $str;
    $elapsed_time = round($time_end-$time_start,4);
    echo $elapsed_time . '<br />';

    $time_start = microtime(true);
    $str='www.google.com.sports';
    if(preg_match('~^((?:.*?\.){2})(.*)~',$str,$parts)){ // ddrudik
        $str=$parts[1].str_replace('.','/',$parts[2]);
    }
    echo $str;
    $elapsed_time = round($time_end-$time_start,4);
    echo $elapsed_time . '<br />';

Example output:

    www.google.com/sports-1225926253.6569
    www.google.com/sports-1225926253.657

Granted, the difference on a single pass is so small... One thing I did notice is the line $elapsed_time = round($time_end-$time_start,4), in particular the use of the round function. Perhaps this is throwing things off? Maybe the readings I am getting are skewed because of this? Perhaps my example is incorrect?

Cheers,
NRG
ddrudik Posted November 5, 2008
In that code snippet I don't see $time_end defined, so your values have no ending reference point. Also, we can't round to 4 places, since our values only become nonzero in the fifth decimal place. Consider this code:

    <?php
    $time_start = microtime(true);
    $str='www.google.com.sports';
    if(preg_match('~^((?:[^.]+\.){2})(.*)~',$str,$parts)){ // NRG
        $str=$parts[1].str_replace('.','/',$parts[2]);
    }
    echo $str;
    $time_end = microtime(true);
    $elapsed_time = $time_end-$time_start;
    echo $elapsed_time . '<br />';

    $time_start = microtime(true);
    $str='www.google.com.sports';
    if(preg_match('~^((?:.*?\.){2})(.*)~',$str,$parts)){ // ddrudik
        $str=$parts[1].str_replace('.','/',$parts[2]);
    }
    echo $str;
    $time_end = microtime(true);
    $elapsed_time = $time_end-$time_start;
    echo $elapsed_time . '<br />';
    ?>

This is the result:

    www.google.com/sports4.1007995605469E-5
    www.google.com/sports1.3828277587891E-5

The speed of the previous thread's solution must have been influenced by different factors. Maybe it's the use of capture groups in this example; I'm not sure.
nrg_alpha Posted November 5, 2008
Ah, good call on the lack of $time_end. That would indeed be a problem. Don't I feel foolish now... So it's settled (yeah, not sure why the discrepancy either. It nags at me...).
ddrudik Posted November 5, 2008
I will just assume that I have to test all alternatives to get the actual speed results for a given match pattern and source string.
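A single pass measured in tens of microseconds is easily dominated by noise such as one-time setup or output, which might be part of the discrepancy discussed above (that is a guess, not something this thread establishes). Below is a sketch of a repeated-run comparison. The helper names timeIt and viaPattern and the iteration count are assumptions chosen for illustration, and the closures need PHP 5.3 or later.

```php
<?php
// Hypothetical helper: average seconds per call over $iterations runs.
function timeIt($fn, $iterations)
{
    $fn(); // warm-up call so one-time setup is not counted
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $iterations;
}

// Applies one of the thread's two-group patterns and rebuilds the string.
function viaPattern($pattern, $str)
{
    if (preg_match($pattern, $str, $parts)) {
        return $parts[1] . str_replace('.', '/', $parts[2]);
    }
    return $str;
}

$candidates = array(
    'negated class' => '~^((?:[^.]+\.){2})(.*)~',
    'lazy dot'      => '~^((?:.*?\.){2})(.*)~',
);

foreach ($candidates as $label => $pattern) {
    // No echo inside the timed closure, so output cost is excluded.
    $avg = timeIt(function () use ($pattern) {
        viaPattern($pattern, 'www.google.com.sports');
    }, 100000);
    printf("%-14s %.3e s per call\n", $label, $avg);
}
```

Results will vary by PHP version and machine, so it is worth running each candidate against the actual source strings in question, as suggested above.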
nrg_alpha Posted November 5, 2008
I did test mine... just not correctly...