
[SOLVED] need help simplifying text for modrewrite urls


HaLo2FrEeEk


I'm looking for a simple way to turn a string of text (used as a title) into a slug for a mod_rewrite URL.  Here's the code I'm using:

 

$url_remove = array("-and-", "-of-", "-a-", "-the-", "-to-", "-for-", ".", ":-");
$urltitle = strtolower($info['news_title']);
$urltitle = str_replace(" ", "-", " ".$urltitle." ");
$urltitle = str_replace($url_remove, "", " ".$urltitle." ");

 

And here's an example title:

 

New Forum: Gaming Cutscenes

 

I need it converted to this:

 

new-forum-gaming-cutscenes

 

Instead, it comes out like this:

 

-new-forumgaming-cutscenes-

 

Obviously something is going amiss somewhere.  How can I remove the words that search engines don't find relevant (like and, or, for, etc.) and then replace spaces with dashes (-)?

 

Thanks in advance!

This is perhaps not the most elegant solution, but I think it does everything you need...

 

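function make_title($input) {
   $input = strtolower($input);
   // Patterns, in order: anything that is not a word character or a space,
   // each stop word as a whole word, and runs of two or more spaces.
   $remove = array('~[^\w ]~', "~\band\b~", "~\bof\b~", "~\ba\b~", "~\bthe\b~", "~\bto\b~", "~\bfor\b~", '~( ) +~');
   // Only the last pattern has a capture group, so '$1' is empty for the others
   // (the match is deleted) and a single space for a run of spaces.
   $input = preg_replace($remove, '$1', $input);
   $input = str_replace(' ', '-', $input);
   // Drop dashes left at either end by removed leading or trailing words.
   $input = trim($input, '-');
   return $input;
}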

There has to be a better way.  There are so many characters to replace, and I don't want to type them all manually.  There are periods, colons, semicolons, commas, exclamation points, question marks, asterisks, quotes, apostrophes, and a ton of others.  Not to mention things like ellipses (...), which I'm finding difficult to replace.  There has to be some function to simplify it.

Ok, I worked a bit on the code.  There are no word replacements yet, but I replaced all punctuation characters using an array of those characters.  Here is my current code:

 

$url_replace = array("\"", "&quot;", "'", ",", ":", ";", ". ", "! ", "? ", ".", "!", "?", "*", "/");
$urltitle = strtolower($info['news_title']);
$urltitle = str_replace($url_replace, "", $urltitle." ");
$urltitle = trim(str_replace(" ", "-", $urltitle), "-");

 

This turns strings like "New Forum: Game Cutscenes" into "new-forum-game-cutscenes", so that's all good, but it has trouble with things like ellipses: it replaces all the periods with nothing, so something like "JTV changes suck...just sayin" comes out as "jtv-changes-suckjust-sayin".  As you can see, that's a problem.  Can anyone think of a way to fix it?

 

And if not, then I also tried this:

 

$urltitle = preg_replace("#[\W_]#", "", $urltitle);

 

This matches non-alphanumeric characters and underscores... unfortunately it also matches spaces.  I can't think of a way to make it skip spaces but still match the rest of the characters.

 

Oh, and titles like "Halo 3 ODST / Halo Reach Media" come out as "halo-3-odst--halo-reach-media", with a double dash. I need that fixed, too.
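One way to handle all three cases at once (keeping word breaks, ellipses, and double dashes) might be to treat any run of non-alphanumeric characters as a single separator. This is only a minimal sketch, assuming plain ASCII titles; the function name slugify_title is made up for illustration and is not from this thread:

```php
<?php
// Hypothetical helper; the name is an assumption for illustration.
function slugify_title($title) {
    $slug = strtolower($title);
    // Drop apostrophes so "don't" becomes "dont" rather than "don-t".
    $slug = str_replace("'", '', $slug);
    // Any run of characters other than a-z or 0-9 becomes one dash,
    // so "...", " / " and ": " each collapse to a single "-".
    $slug = preg_replace('~[^a-z0-9]+~', '-', $slug);
    // Remove dashes left at the start or end.
    return trim($slug, '-');
}

echo slugify_title('JTV changes suck...just sayin'), "\n";   // jtv-changes-suck-just-sayin
echo slugify_title('Halo 3 ODST / Halo Reach Media'), "\n";  // halo-3-odst-halo-reach-media
echo slugify_title('New Forum: Game Cutscenes'), "\n";       // new-forum-game-cutscenes
```

Because the character class is negated and followed by +, spaces are treated like any other separator, so there is no separate step for spaces and no chance of two dashes in a row.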

$whatever = preg_replace('/[^A-Za-z0-9]/', ' ', $whatever); // ereg_replace() was removed in PHP 7

 

This just replaces everything that isn't a letter or digit with a space, which may be overkill. You can also use a combination of strpos(), which returns the position of a given character, and substr(), which pulls out the part of the string before or after that position.

I'm sure I could help more if I understood your source data and desired output, but I'm sure these three things will bring you closer.
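As a rough illustration of the strpos() and substr() combination, here is a small sketch that splits a title at its first colon. The sample title and the variable names are assumptions chosen for the example:

```php
<?php
$title = 'New Forum: Game Cutscenes';

// strpos() returns the zero-based offset of the first ":" or false if absent.
$pos = strpos($title, ':');

if ($pos !== false) {
    // Everything before the colon.
    $before = substr($title, 0, $pos);
    // Everything after the colon, with the leading space removed.
    $after = ltrim(substr($title, $pos + 1));
} else {
    // No colon found: treat the whole title as the first part.
    $before = $title;
    $after = '';
}

echo $before, "\n"; // New Forum
echo $after, "\n";  // Game Cutscenes
```

The strict !== false check matters because strpos() can legitimately return 0 when the character is at the start of the string.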

I kinda figured it out.  Sorta:

 

$url_replace = array("\"", "&quot;", "'", ",", ":", ";", ". ", "! ", "? ", "*",  "/ ","/", "(", ")", "+", "-");
$urltitle = strtolower($info['news_title']);
$urltitle = str_replace($url_replace, "", $urltitle." ");
$urltitle = str_replace("...", " ", $urltitle);
$urltitle = str_replace(" ", "-", trim($urltitle));

 

Basically, I make the string lowercase, remove all the characters in the array $url_replace, replace the "..." character group with a space, then trim() the result and replace the remaining spaces with dashes.  It seems to work: there are no duplicate dashes, and from what I can tell no missing spaces either.

If you didn't understand my post you should have just said so or at least tried it out. The...

 

~[^\w ]~

 

... part of the array removes pretty much every character you have described anyway. The only things you need to add to the array are the words you want removed, as there is no collective way of identifying them.
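If the list of words grows, one option is to keep them in a plain array and build the word patterns from it. This is just a sketch of that idea, not a replacement for the function above; the name make_title_from_list and the sample word list are assumptions for illustration:

```php
<?php
// Hypothetical variant; the function name and word list are only examples.
function make_title_from_list($input, array $stopWords) {
    $input = strtolower($input);
    // Strip everything except word characters and spaces.
    $input = preg_replace('~[^\w ]~', '', $input);
    // Escape each word, then build one pattern such as \b(?:and|of|a)\b.
    $quoted = array_map(function ($w) { return preg_quote($w, '~'); }, $stopWords);
    $input = preg_replace('~\b(?:' . implode('|', $quoted) . ')\b~', '', $input);
    // Collapse the gaps left by removed words, then turn spaces into dashes.
    $input = preg_replace('~ +~', ' ', trim($input));
    return str_replace(' ', '-', $input);
}

$stopWords = array('and', 'of', 'a', 'the', 'to', 'for');
echo make_title_from_list('The Return of the King: A Review', $stopWords), "\n"; // return-king-review
```

Adding a word then only means adding an array element rather than writing another pattern by hand.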
