asmith Posted August 18, 2012

Hello,

I'm rewriting my board URLs to use board names. The original format is SMF's (the first number is the board ID, the second is the board page):

example.com/index.php?board=1.0

and I want it to become:

example.com/name-of-board/

If the page number is 0 (board=1.0), the page is left out of the URL: example.com/name-of-board/
If the page number is not 0 (board=1.5): example.com/name-of-board-5/

I have an array with board IDs as keys and board names as values:

$new_names = array(
    1 => 'new-discussion',
    2 => 'feedback'
);

Here's my first attempt. It is very slow, since I have more than 200 boards:

<?php
$url_input = array();
$url_output = array();
foreach ($new_names as $id_board => $name)
{
    // When the URL has other variables. The output puts ? at the end so other GET variables don't break.
    $url_input[] = "'example.com/index.php?board=" . $id_board . "\.([1-9][0-9]*)[;&]'";
    $url_output[] = 'example.com/' . $name . '-$1/?';

    // When the URL has nothing at the end.
    $url_input[] = "'example.com/index.php?board=" . $id_board . "\.([1-9][0-9]*)'";
    $url_output[] = 'example.com/' . $name . '-$1/';

    // Should I go another round for when the page number is 0? Too nasty!
    $url_input[] = "'example.com/index.php?board=" . $id_board . "\.0[;&]'";
    $url_output[] = 'example.com/' . $name . '/?';

    // omg
    $url_input[] = "'example.com/index.php?board=" . $id_board . "\.0'";
    $url_output[] = 'example.com/' . $name . '/';
}
$content = preg_replace($url_input, $url_output, $content);
?>

That's 4 rules per board (x 200). With 800 rules it takes 0.2s to 0.5s! Any advice is greatly appreciated.
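The four patterns per board differ only in the board ID, the page and whether a separator follows, so one pattern can capture all three and the name can be looked up in `$new_names` inside a `preg_replace_callback()` callback. This is a minimal sketch that assumes board links always look like `example.com/index.php?board=ID.PAGE`, optionally followed by `;` or `&`. The function name `rewrite_board_links` is hypothetical.

```php
<?php
// Hypothetical helper: rewrite every board link in $content in ONE regex pass,
// instead of running four patterns per board.
function rewrite_board_links(string $content, array $new_names): string
{
    // Group 1: board ID, group 2: page, group 3: optional ";" or "&" that
    // introduces further query parameters. Note the escaped "." and "?".
    $pattern = '~example\.com/index\.php\?board=(\d+)\.(\d+)([;&]?)~';

    return preg_replace_callback($pattern, function (array $m) use ($new_names) {
        // Leave links to unknown board IDs exactly as they were.
        if (!isset($new_names[$m[1]])) {
            return $m[0];
        }
        $slug = $new_names[$m[1]];
        // Page 0 is dropped from the URL. Any other page becomes a "-PAGE" suffix.
        if ($m[2] !== '0') {
            $slug .= '-' . $m[2];
        }
        // As in the original rules, end with "?" only when more parameters follow.
        return 'example.com/' . $slug . '/' . ($m[3] !== '' ? '?' : '');
    }, $content);
}
```

The regex runs over the output only once, and the lookup is an array access. The cost therefore no longer grows with the number of boards the way 800 separate patterns do.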
trq Posted August 18, 2012

Is there a reason you're doing this in PHP and not with mod_rewrite? It's pretty difficult to see how this is actually being used.
asmith Posted August 18, 2012 Author

Isn't mod_rewrite an Apache/web server module for resolving friendly URLs to their actual addresses? I'm trying to go the other way around: make my board URLs friendly in the output, then use nginx and its rewrite module to resolve them back.
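Whatever does the reverse mapping, whether nginx rewrite rules or a PHP front controller, has to turn `name-of-board-5` back into `board=ID.5`. Here is a rough PHP sketch of that lookup. It assumes board names are unique and that the page is the trailing `-digits` suffix. The function name `resolve_board_slug` is hypothetical.

```php
<?php
// Hypothetical reverse lookup: "/feedback-5/" -> "2.5", "/new-discussion/" -> "1.0".
// Returns null when the path does not name a known board. Requires PHP 7.1+ (?string).
function resolve_board_slug(string $path, array $new_names): ?string
{
    $ids  = array_flip($new_names);   // board name => board ID
    $slug = trim($path, '/');

    // A plain board name means page 0.
    if (isset($ids[$slug])) {
        return $ids[$slug] . '.0';
    }
    // Otherwise try "name-PAGE". The greedy (.+) splits at the LAST hyphen.
    if (preg_match('~^(.+)-(\d+)$~', $slug, $m) && isset($ids[$m[1]])) {
        return $ids[$m[1]] . '.' . $m[2];
    }
    return null;
}
```

The exact-name check runs first, so a board whose own name ends in digits (say `season-2`) still resolves. That ordering means a board named `season-2` would shadow page 2 of a board named `season`, so the name scheme needs to avoid that overlap.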
DavidAM Posted August 18, 2012

Use the force, Luke. Let the system do the work:

// The list of boards keyed by the board's ID
$boards = array(1 => 'Board-A', 2 => 'Board-B', 3 => 'Board-J', 42 => 'Grok All');

// The original URL we are trying to convert
$content = 'example.com/index.php?order=date&board=42.4&search=mike';

$urlParts = parse_url($content);  // Break up the URL
$qsParts = array();               // Break up the query string (if any)
if (isset($urlParts['query'])) parse_str($urlParts['query'], $qsParts);

// Convert BoardID.Page to BoardName-Page
if (isset($qsParts['board'])) {
    $boardParts = explode('.', $qsParts['board']);
    $boardName = $boards[$boardParts[0]];
    if ( (count($boardParts) > 1) and (!empty($boardParts[1])) )
        $boardName .= '-' . $boardParts[1];
    unset($qsParts['board']);
} else {
    // No board ID, so use the default
    $boardName = $boards[1];
}

// Build the new URL
$newUrl = 'example.com/' . $boardName;
if (! empty($qsParts)) $newUrl .= '?' . http_build_query($qsParts);

print($content . ' => ' . $newUrl . PHP_EOL);

No repeating code! It handles all boards and all extra parameters in one fell swoop. If any of it is not clear, feel free to ask.
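Two details in the single-URL version may matter for SMF-style links. First, `parse_str()` only splits on `arg_separator.input`, which is `&` by default, while the original patterns also allow `;`. Second, an unknown board ID, or a URL with no `board` parameter at all, ends up as board 1 or as an undefined index. Here is a hedged variant that splits on both separators and leaves non-board URLs untouched. The function name `change_board_url` is hypothetical.

```php
<?php
// Hypothetical variant of the parse_url() approach that
//  - splits the query string on ";" as well as "&", and
//  - returns the URL unchanged when it has no known board ID.
function change_board_url(string $oldUrl, array $boards): string
{
    $parts = parse_url($oldUrl);
    if (!isset($parts['query'])) {
        return $oldUrl;
    }
    $board = null;
    $rest  = array();
    foreach (preg_split('~[;&]~', $parts['query'], -1, PREG_SPLIT_NO_EMPTY) as $pair) {
        if (strpos($pair, 'board=') === 0) {
            $board = substr($pair, 6);   // e.g. "42.4"
        } else {
            $rest[] = $pair;             // keep the other parameters verbatim
        }
    }
    if ($board === null) {
        return $oldUrl;                  // not a board link, e.g. ?topic=7.0
    }
    $idAndPage = explode('.', $board, 2);
    if (!isset($boards[$idAndPage[0]])) {
        return $oldUrl;                  // unknown board ID
    }
    $name = $boards[$idAndPage[0]];
    if (!empty($idAndPage[1])) {         // page "0" (or no page) is dropped
        $name .= '-' . $idAndPage[1];
    }
    // Like http_build_query(), this rejoins the remaining parameters with "&".
    $newUrl = 'example.com/' . $name;
    return $rest ? $newUrl . '?' . implode('&', $rest) : $newUrl;
}
```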
asmith Posted August 18, 2012 Author

Thanks for the reply. It is all clear.

The $content in my example is the whole HTML output. My code scans the whole output and replaces the URLs with new ones, while your code deals with only one URL. So I assume you mean that I should collect all the URLs in the output into an array myself, then run each of them through this function.
Christian F. Posted August 18, 2012

asmith: mod_rewrite is used to do all kinds of address manipulation and redirection from the server-side, including sending HTTP errors. It's not limited just to making SEO-friendly (fake) URLs.
DavidAM Posted August 18, 2012

Quoting asmith: "The $content in my example is the whole HTML output. My code scans the whole output and replaces the URLs with new ones, while your code deals with only one URL. So I assume you mean that I should collect all the URLs in the output into an array myself, then run each of them through this function."

Oops, I kind of forgot about that preg_replace. Here's another stab at it. I've turned the conversion into a function. See if you can follow this:

// The original content we are trying to convert
$content = '<BODY>
<UL>
<LI><A href="example.com/index.php?board=1.0">First Board</A></LI>
<LI><A href="example.com/index.php?order=date&board=42.4&search=mike">Another</A></LI>
</UL>
<P>Try <A href="google.com">Google</A> (not replaced)</P>
</BODY>';

$findReplace = array(); // Collect old and new URLs
$matches = array();     // Array of what we find with preg_match_all
if (preg_match_all('~href=([\'"])(example\.com/[^\1]+?)\1~i', $content, $matches, PREG_PATTERN_ORDER)) {
    # print_r($matches[2]); // Testing to see what we found
    foreach ($matches[2] as $oldUrl) { // [2] is an array of the URLs (inside the quotes) found
        if (!isset($findReplace[$oldUrl])) { // If we have not seen this one already
            $findReplace[$oldUrl] = changeUrl($oldUrl); // Convert it to the new style
        }
    }
    // Now replace what we found with the new style.
    // str_replace is faster, and we don't have any regexps in there now anyway.
    $content = str_replace(array_keys($findReplace), $findReplace, $content);
    print $content;
}
exit;

function changeUrl($oldUrl) {
    // The list of boards keyed by the board's ID
    static $boards = array(1 => 'Board-A', 2 => 'Board-B', 3 => 'Board-J', 42 => 'Grok All');

    $urlParts = parse_url($oldUrl);  // Break up the URL
    $qsParts = array();              // Break up the query string (if any)
    if (isset($urlParts['query'])) parse_str($urlParts['query'], $qsParts);

    // Convert BoardID.Page to BoardName-Page
    if (isset($qsParts['board'])) {
        $boardParts = explode('.', $qsParts['board']);
        $boardName = $boards[$boardParts[0]];
        if ( (count($boardParts) > 1) and (!empty($boardParts[1])) )
            $boardName .= '-' . $boardParts[1];
        unset($qsParts['board']);
    } else {
        // No board ID, so use the default
        $boardName = $boards[1];
    }

    // Build the new URL
    $newUrl = 'example.com/' . $boardName;
    if (! empty($qsParts)) $newUrl .= '?' . http_build_query($qsParts);
    return $newUrl;
}

By the way, in your original post you were building up regular expressions for preg_replace. When you do that, you need to remember to escape the regexp special characters. For instance, here is your first line with the correction below it:

$url_input[] = "'example.com/index.php?board=" . $id_board . "\.([1-9][0-9]*)[;&]'";
$url_input[] = "'example\.com/index\.php\?board=" . $id_board . "\.([1-9][0-9]*)[;&]'";

It looks like you were using the single-quote as the delimiter. I usually use tilde ("~") because it is highly unlikely that I need to include it in the regexp.

The regexp explained: ~href=([\'"])(example\.com/[^\1]+?)\1~i

~              # A delimiter to indicate the beginning of the pattern
href=          # A literal string "href=" to find, since it usually introduces a URL in a link
(              # Start a capture group - # 1
[\'"]          # A character class - find either a single-quote or double-quote - the single-quote is escaped because I used single-quotes for the string itself
)              # End of capture group - # 1
(              # Start a capture group - # 2
example\.com/  # A literal string "example.com/" - we had to escape the full-stop (".") because it is special to regexp
[^\1]          # A character class - ^ means NOT when used in the first position. Caution: inside a character class \1 is NOT a back-reference to Capture Group # 1; PCRE reads it as the octal escape for character 0x01, so this matches almost any character, quotes included
+?             # Repeat the preceding match one or more times but don't be greedy about it - this laziness is what actually stops the match at the first quote matching Capture Group # 1 (.+? does the same, except that . does not match newlines)
)              # End of capture group - # 2
\1             # Match the character found in Capture Group # 1
~              # The delimiter marking the end of the pattern
i              # A modifier to make the matches case-INsensitive
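Because `\1` cannot act as a back-reference inside `[...]`, "any character except the quote captured in group 1" needs a different construct. One option is a tempered token, `(?:(?!\1).)`, which uses a negative lookahead that may refer to group 1. The sketch below extracts the same URLs as the pattern in DavidAM's post. The function name `find_example_hrefs` is hypothetical.

```php
<?php
// Hypothetical helper: return the example.com URLs found in href attributes.
// (?:(?!\1).)+ consumes characters only while the next one is NOT the quote
// captured in group 1, which is what [^\1] was meant to express.
function find_example_hrefs(string $html): array
{
    $pattern = '~href=([\'"])(example\.com/(?:(?!\1).)+)\1~i';
    preg_match_all($pattern, $html, $matches);
    return $matches[2];   // the URLs without their surrounding quotes
}
```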
asmith Posted August 19, 2012 Author

Thanks, DavidAM. Very detailed. I'll try this and see how long it takes. Thanks for going to the trouble for me.
asmith Posted August 19, 2012 Author

OK, here's an update. str_replace() was kind of messing up the string: if two URLs started the same way and str_replace() replaced the shorter one first, the longer one was only half converted, which produced broken URLs. So I sorted the URLs by length (longest first) before replacing:

function bylength($a, $b)
{
    $length1 = strlen($a);
    $length2 = strlen($b);
    if ($length1 == $length2)
        return 0;
    return $length2 > $length1 ? 1 : -1;
}
uasort($matches[0], 'bylength');

So far so good, and it's fast enough.
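An alternative to sorting is `strtr()` with an array of pairs. At each position it tries the longest matching key first, and it never re-scans text it has already replaced, so a short URL cannot clobber the start of a longer one. This also sidesteps a subtle point: sorting only helps if it reorders the keys that `str_replace()` actually receives, and in DavidAM's loop those keys are `$findReplace`, built from `$matches[2]`. Here is a small demonstration with made-up URLs:

```php
<?php
// Two URLs where one is a prefix of the other (illustrative data).
$findReplace = array(
    'example.com/index.php?board=1.0'           => 'example.com/Board-A',
    'example.com/index.php?board=1.0;sort=date' => 'example.com/Board-A?sort=date',
);
$original = 'a: example.com/index.php?board=1.0;sort=date b: example.com/index.php?board=1.0';

// str_replace() works through the keys in array order, so the shorter key
// eats the start of the longer URL and leaves "example.com/Board-A;sort=date".
$broken = str_replace(array_keys($findReplace), array_values($findReplace), $original);

// strtr() picks the longest matching key at each position instead.
$fixed = strtr($original, $findReplace);
```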