Fastest way to achieve this (mine is slow)


asmith

Hello,

 

I'm rewriting my board URLs. The originals look like SMF's (the first number is the board ID, the second is the board page):

example.com/index.php?board=1.0

to

example.com/name-of-board/

 

If the page number is 0 (board=1.0), the page is left out of the URL:

example.com/name-of-board/

If the page number is not 0 (board=1.5), it is appended to the name:

example.com/name-of-board-5/

 

I have an array, with board IDs as keys and values as board names:

 

$new_names = array(
    1 => 'new-discussion',
    2 => 'feedback'
);

 

Here's my first attempt. It is very slow, since I have more than 200 boards:

 

<?php

$url_input = array();
$url_output = array();

foreach ($new_names as $id_board => $name)
{
    // When url has other variables. The output put ? at the end to not break other GET variables.
    $url_input[] = "'example.com/index.php?board=" . $id_board . "\.([1-9][0-9]*)[;&]'";
    $url_output[] = 'example.com/' . $name . '-$1/?';

    // When url has nothing at the end.
    $url_input[] = "'example.com/index.php?board=" . $id_board . "\.([1-9][0-9]*)'";
    $url_output[] = 'example.com/' . $name . '-$1/';

    // Should I go another round for when page number is 0? too nasty!
    $url_input[] = "'example.com/index.php?board=" . $id_board . "\.0[;&]'";
    $url_output[] = 'example.com/' . $name . '/?';

    // omg 
    $url_input[] = "'example.com/index.php?board=" . $id_board . "\.0'";
    $url_output[] = 'example.com/' . $name . '/';   
}

$content = preg_replace($url_input, $url_output, $content);
?>

 

That's 4 rules per board (x 200), so 800 rules in total, and it takes 0.2s to 0.5s! :o

Any advice is greatly appreciated.


Use the force, Luke.

 

Let the system do the work:

 

// The list of boards keyed by the Board's ID
$boards = array(1 => 'Board-A', 2 => 'Board-B', 3 => 'Board-J', 42 => 'Grok All');

// The original URL we are trying to convert
$content = 'example.com/index.php?order=date&board=42.4&search=mike';

$urlParts = parse_url($content);	// Breakup the URL

$qsParts = array();		// Breakup the Query String (if any)
if (isset($urlParts['query'])) parse_str($urlParts['query'], $qsParts);

// Convert the BoardID.Page to BoardName-Page
if (isset($qsParts['board'])) {
	$boardParts = explode('.', $qsParts['board']);
	$boardName = $boards[$boardParts[0]];
	if ( (count($boardParts) > 1) and (!empty($boardParts[1])) ) $boardName .= '-' . $boardParts[1];
	unset($qsParts['board']);
} else {
	// No Board ID -- use the default
	$boardName = $boards[1];
}

// Build the new URL
$newUrl = 'example.com/' . $boardName;
if (! empty($qsParts)) $newUrl .= '?' . http_build_query($qsParts);

print($content . ' => ' . $newUrl . PHP_EOL);

 

No repeating code! It handles all boards and all extra parameters in one fell swoop.

 

If any of it is not clear, feel free to ask.

 


Thanks for the reply.

 

It is all clear.

The $content in my example is the whole HTML output. My code scans the whole output and replaces the URLs with new ones. Your code deals with only one URL. So I assume you mean that I should collect all the URLs in the output into an array myself, then run each of them through this code.



Oops, I kind of forgot about that preg_replace. Here's another stab at it. I've turned the conversion into a function. See if you can follow this:

 

// The original Content we are trying to convert
$content = '<BODY>
<UL>
<LI><A href="example.com/index.php?board=1.0">First Board</A></LI>
<LI><A href="example.com/index.php?order=date&board=42.4&search=mike">Another</A></LI>
</UL>
<P>Try <A href="google.com">Google</A> (not replaced)</P>
</BODY>';

$findReplace = array();	// Collect old and new URLs

$matches = array();	// Array of what we find with preg_match_all
if (preg_match_all('~href=([\'"])(example\.com/[^\1]+?)\1~i', $content, $matches, PREG_PATTERN_ORDER)) {
# print_r($matches[2]);		// Testing to see what we found

foreach ($matches[2] as $oldUrl) {	// [2] is an array of the URLs (inside the quotes) found
	if (!isset($findReplace[$oldUrl])) {	// If we have not seen this one already
		$findReplace[$oldUrl] = changeUrl($oldUrl);	// Convert it to the new style
	}
}

// Now replace what we found with the new style
// str_replace is faster and we don't have any regexp's in there now, anyway
$content = str_replace(array_keys($findReplace), $findReplace, $content);
print $content;
}

exit;

function changeUrl($oldUrl) {
// The list of boards keyed by the Board's ID
static $boards = array(1 => 'Board-A', 2 => 'Board-B', 3 => 'Board-J', 42 => 'Grok All');

$urlParts = parse_url($oldUrl);	// Breakup the URL

$qsParts = array();		// Breakup the Query String (if any)
if (isset($urlParts['query'])) parse_str($urlParts['query'], $qsParts);

// Convert the BoardID.Page to BoardName-Page
if (isset($qsParts['board'])) {
	$boardParts = explode('.', $qsParts['board']);
	$boardName = $boards[$boardParts[0]];
	if ( (count($boardParts) > 1) and (!empty($boardParts[1])) ) $boardName .= '-' . $boardParts[1];
	unset($qsParts['board']);
} else {
	// No Board ID -- use the default
	$boardName = $boards[1];
}

// Build the new URL
$newUrl = 'example.com/' . $boardName;
if (! empty($qsParts)) $newUrl .= '?' . http_build_query($qsParts);

return $newUrl;
}
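
For comparison, here is a minimal sketch of a single-pass alternative using preg_replace_callback(). It is not the code from this post: the board names, the sample HTML and the pattern are assumptions for illustration, and like the original rules it only handles URLs where board= is the first parameter. One pattern covers every board, and the callback looks the name up in the array, so the number of rules no longer grows with the number of boards.

```php
<?php
// Hypothetical board list; the real one would come from the forum's data.
$boards = array(1 => 'new-discussion', 2 => 'feedback', 42 => 'grok-all');

// Sample HTML standing in for the real page output.
$content = '<a href="example.com/index.php?board=1.0">First</a> '
         . '<a href="example.com/index.php?board=42.4;sort=date">Another</a> '
         . '<a href="example.com/index.php?board=999.0">Unknown</a>';

// One pattern for every board: group 1 is the board ID, group 2 the page,
// group 3 an optional separator (; or &) that precedes further parameters.
$pattern = '~example\.com/index\.php\?board=(\d+)\.(\d+)([;&]?)~';

$content = preg_replace_callback($pattern, function ($m) use ($boards) {
    $id = (int) $m[1];
    if (!isset($boards[$id])) {
        return $m[0];            // unknown board: leave the URL untouched
    }
    $url = 'example.com/' . $boards[$id];
    if ($m[2] !== '0') {
        $url .= '-' . $m[2];     // page 0 is left out of the new URL
    }
    $url .= '/';
    if ($m[3] !== '') {
        $url .= '?';             // the remaining parameters become the query string
    }
    return $url;
}, $content);

print $content . PHP_EOL;
```

The closure syntax needs PHP 5.3 or later; on older versions a named function can be passed as the callback instead.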

 

By the way, in your original post, you were building up regular expressions for preg_replace. When you do that, you need to remember to escape the regexp special characters. For instance, here is your first line and the correction below it:

 

$url_input[] = "'example.com/index.php?board=" . $id_board . "\.([1-9][0-9]*)[;&]'";
$url_input[] = "'example\.com/index\.php\?board=" . $id_board . "\.([1-9][0-9]*)[;&]'";

 

Looks like you were using the single-quote as the delimiter. I usually use tilde ("~") because it is highly unlikely that I need to include it in the regexp.
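
The escaping can also be left to preg_quote(), which exists for exactly this case. The sketch below is only an illustration: the board ID and the test strings are made-up sample values. The second argument tells preg_quote() which delimiter to escape as well.

```php
<?php
$id_board = 1;   // sample board ID for illustration

// preg_quote() escapes the regexp special characters in the literal part;
// the second argument makes it escape the chosen delimiter too.
$prefix  = preg_quote('example.com/index.php?board=' . $id_board . '.', '~');
$pattern = '~' . $prefix . '([1-9][0-9]*)[;&]~';

print $pattern . PHP_EOL;

// The literal URL matches ...
var_dump(preg_match($pattern, 'example.com/index.php?board=1.5;sort=date'));
// ... but a string that only fits when "." is a wildcard does not.
var_dump(preg_match($pattern, 'exampleXcom/indexXphp?board=1.5;'));
```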

 

 

The regexp explained: ~href=([\'"])(example\.com/[^\1]+?)\1~i

 

~		# A delimiter to indicate the beginning of the pattern
href=		# A literal string "href=" to find since it usually introduces a URL in a link
(		# Start a capture group - # 1
[\'"]		# A Character class - find either a single-quote or double-quote - the single-quote is escaped because I used single-quotes for the string itself
)		# End of capture group - # 1
(		# Start a capture group - # 2
example\.com/	# A literal string "example.com/" - we had to escape the full-stop (".") because it is special to regexp
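[^\1]		# A character class - ^ means NOT when used in the first position. Note that \1 is NOT a backreference inside a character class (PCRE reads it as the octal escape for character 0x01), so this matches practically any character, quotes included
+?		# Repeat the preceding match one or more times but don't be greedy about it - the laziness is what makes the match stop at the first closing quote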
)		# End of capture group - # 2
\1		# Match the character found in Capture Group # 1
~		# The Delimiter marking the end of the pattern
i		# A modifier to make the matches case-INsensitive
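
A quick way to check how [^\1] behaves, followed by a variant that excludes both quote characters explicitly instead of relying on the lazy quantifier. The variant pattern and the sample HTML are assumptions for illustration, not part of the code above; note that the variant will not match a URL that contains either kind of quote.

```php
<?php
// If \1 were a backreference inside the class, the second quote could not
// match here. It does match, because [^\1] means "anything except chr(1)".
var_dump(preg_match('~(")[^\1]~', '""'));

// Hypothetical variant: exclude both quote characters explicitly.
$pattern = '~href=([\'"])(example\.com/[^\'"]+)\1~i';

$html = '<a href="example.com/index.php?board=1.0">A</a> '
      . '<a href=\'example.com/x\'>B</a> '
      . '<a href="google.com">C</a>';

$matches = array();
preg_match_all($pattern, $html, $matches, PREG_PATTERN_ORDER);
print_r($matches[2]);   // the URLs found inside the quotes
```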


Ok here's an update.

 

str_replace() was mangling the string: when two URLs started the same way and str_replace() handled the shorter one first, the longer one got only partly replaced, which produced broken URLs. So I sorted the URLs by length first, longest first:

 

function bylength($a, $b)
{
$length1 = strlen($a);
$length2 = strlen($b);
if ($length1 == $length2)
	return 0;
return $length2 > $length1 ? 1 : -1;
}

uasort($matches[0], 'bylength');

 

So far so good, and it's fast enough :)
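
An alternative to sorting, sketched here with a made-up old => new map: strtr() with an array argument always tries the longest key first at each position and never rescans text it has already replaced, so the prefix problem does not come up. The URLs and HTML below are sample values only.

```php
<?php
// Hypothetical old => new map in which one key is a prefix of the other.
$findReplace = array(
    'example.com/index.php?board=1.0'           => 'example.com/new-discussion/',
    'example.com/index.php?board=1.0;sort=date' => 'example.com/new-discussion/?sort=date',
);

$content = '<a href="example.com/index.php?board=1.0;sort=date">Sorted</a> '
         . '<a href="example.com/index.php?board=1.0">Plain</a>';

// str_replace() applies the pairs in array order, so the shorter key
// rewrites the start of the longer URL before the longer key is tried.
$broken = str_replace(array_keys($findReplace), array_values($findReplace), $content);

// strtr() picks the longest matching key at each position instead.
$fixed = strtr($content, $findReplace);

print $broken . PHP_EOL;
print $fixed . PHP_EOL;
```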

