How to check only alphanumeric and other language alphabet by using preg_replace?


I need to create an SEO-friendly string that contains only alphanumeric characters and characters from my native language, Sinhala.

My expected string should be something like this:

$myString = "this-is-a-දහසක්-බාධක-දුක්-කම්කටොලු-මැදින්-ලෝකය-දිනන්නට-වෙර-දරන";

I am using the following function to create the string:

function seoUrl($string) {
    //Lower case everything
    $string = strtolower($string);
    //Make alphanumeric (removes all other characters)
    $string = preg_replace("/[^a-z0-9_\s-]/", "", $string);
    //Clean up multiple dashes or whitespaces
    $string = preg_replace("/[\s-]+/", " ", $string);
    //Convert whitespaces and underscore to dash
    $string = preg_replace("/[\s_]/", "-", $string);
    return $string;
}

This function only works for English characters. For the string above, the output is:

$title = seoUrl("this-is-a-දහසක්-බාධක-දුක්-කම්කටොලු-මැදින්-ලෝකය-දිනන්නට-වෙර-දරන");
echo $title; // this-is-a-

I modified the function to use `mb_ereg_replace`, as below:

function seoUrl($string) {
    //Lower case everything
    //$string = strtolower($string);
    //Make alphanumeric (removes all other characters)
    $string = mb_ereg_replace("/[^a-z0-9_\s-]/", "", $string);
    //Clean up multiple dashes or whitespaces
    $string = mb_ereg_replace("/[\s-]+/", " ", $string);
    //Convert whitespaces and underscore to dash
    $string = mb_ereg_replace("/[\s_]/", "-", $string);
    return $string;
}

But it is not working for me.

Can anybody tell me how to modify the function above so that it keeps all my characters, including those of my native language?

I hope somebody can help me out. Thank you.

Edited by thara

Thanks for the reply.

Yes I tried it like that.

Updated version:

function seoUrl($string) {
  //Lower case everything
  $string = strtolower($string);
  //Make alphanumeric (removes all other characters)
  $string = preg_replace("/[^\pL\pN_\s-]/u", "", $string);
  //Clean up multiple dashes or whitespaces
  $string = preg_replace("/[\s-]+/", " ", $string);
  //Convert whitespaces and underscore to dash
  $string = preg_replace("/[\s_]/", "-", $string);
  return $string;
}

$title = seoUrl("this-is-a-දහසක්-බාධක-දුක්-කම්කටොලු-මැදින්-ලෝකය-දිනන්නට-වෙර-දරන");
echo $title;

Output:

this-is-a-දහසක-බධක-දක-කමකටල-මදන-ලකය-දනනනට-වර-දරන

But some parts of the Sinhala characters are missing. If you compare the two strings closely, you will notice the difference.

My guess would be that the altered characters use combining marks (vowel signs and the like), which \pL doesn't include. Adding another \pX property to the character class, such as \pM for marks, should get them.
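
To illustrate the combining-marks idea, here is a minimal sketch that adds \pM to the character class from the updated version. The function name seoUrlUnicode is made up for this example, and swapping strtolower for mb_strtolower is my own assumption (it needs the mbstring extension); treat it as a starting point rather than a definitive fix.

```php
<?php
// Sketch only: seoUrlUnicode is a hypothetical name, not from the thread.
function seoUrlUnicode(string $string): string {
    // Lowercase in a UTF-8-aware way (assumption: mbstring is installed)
    $string = mb_strtolower($string, 'UTF-8');
    // Keep letters (\pL), combining marks (\pM), numbers (\pN),
    // underscore, whitespace and dash; remove everything else
    $string = preg_replace('/[^\pL\pM\pN_\s-]/u', '', $string);
    // Clean up multiple dashes or whitespace
    $string = preg_replace('/[\s-]+/u', ' ', $string);
    // Convert whitespace and underscores to dashes
    $string = preg_replace('/[\s_]/u', '-', $string);
    return $string;
}

echo seoUrlUnicode("This is a දහසක් බාධක"), "\n";
```

The only change to the pattern is the extra \pM, so the vowel signs that \pL alone dropped should survive the first replacement.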

Or you could drop the /u mode and blindly accept all high bytes (\x80-\xFF). You'll only be able to filter out standard ASCII characters, but maybe that's all you need.
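
A rough sketch of that byte-level variant, assuming the input is UTF-8. The name seoUrlBytes is hypothetical, and two choices are mine rather than from the thread: strtr is used so that only ASCII letters are lowercased, and the whitespace is spelled out instead of \s so that no high byte can ever be treated as whitespace.

```php
<?php
// Sketch only: seoUrlBytes is a hypothetical name, not from the thread.
function seoUrlBytes(string $string): string {
    // Lowercase ASCII letters only; bytes above 0x7F are left untouched
    $string = strtr($string, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
    // No /u modifier: the pattern works on bytes, so \x80-\xFF keeps
    // every byte of every multibyte character
    $string = preg_replace('/[^a-z0-9_ \t\r\n\x80-\xFF-]/', '', $string);
    // Clean up multiple dashes or whitespace (explicit ASCII whitespace)
    $string = preg_replace('/[ \t\r\n-]+/', ' ', $string);
    // Convert whitespace and underscores to dashes
    $string = preg_replace('/[ \t\r\n_]/', '-', $string);
    return $string;
}

echo seoUrlBytes("This is a දහසක් බාධක!"), "\n";
```

The trade-off is the one already mentioned: non-ASCII punctuation and symbols pass through unfiltered, because the pattern cannot tell them apart from letters.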
