thara Posted January 28, 2019 (edited)

I need to create an SEO-friendly string that contains only alphanumeric characters and characters of my native language, Sinhala. My expected string should look something like this:

    $myString = "this-is-a-දහසක්-බාධක-දුක්-කම්කටොලු-මැදින්-ලෝකය-දිනන්නට-වෙර-දරන";

I am using the following function to create the string:

    function seoUrl($string) {
        //Lower case everything
        $string = strtolower($string);
        //Make alphanumeric (removes all other characters)
        $string = preg_replace("/[^a-z0-9_\s-]/", "", $string);
        //Clean up multiple dashes or whitespaces
        $string = preg_replace("/[\s-]+/", " ", $string);
        //Convert whitespaces and underscore to dash
        $string = preg_replace("/[\s_]/", "-", $string);
        return $string;
    }

This function only works for English characters. For the string above, the output is:

    $title = seoUrl("this-is-a-දහසක්-බාධක-දුක්-කම්කටොලු-මැදින්-ලෝකය-දිනන්නට-වෙර-දරන");
    echo $title; // this-is-a-

I modified the function to use `mb_ereg_replace`, as below:

    function seoUrl($string) {
        //Lower case everything
        //$string = strtolower($string);
        //Make alphanumeric (removes all other characters)
        $string = mb_ereg_replace("/[^a-z0-9_\s-]/", "", $string);
        //Clean up multiple dashes or whitespaces
        $string = mb_ereg_replace("/[\s-]+/", " ", $string);
        //Convert whitespaces and underscore to dash
        $string = mb_ereg_replace("/[\s_]/", "-", $string);
        return $string;
    }

But it is not working for me. Can anybody tell me how to modify the function above so that it keeps all my characters, including those of my native language? I hope somebody can help me out. Thank you.

Edited January 28, 2019 by thara
requinix Posted January 28, 2019

Be normal and use the preg functions with UTF-8 strings and the /u modifier. In that mode \w matches letters, numbers, and underscores, including non-ASCII ones.
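To illustrate the difference the /u modifier makes, here is a minimal sketch, not code from the thread. It assumes the input is valid UTF-8 and that PHP's PCRE build has Unicode property support, which is the usual case. The variable names are made up for the example.

```php
<?php
$input = "this-is-a-දහසක්";

// Without /u the pattern is applied byte by byte, so every byte of the
// UTF-8 encoded Sinhala text falls outside a-z0-9 and is removed.
$bytewise = preg_replace('/[^a-z0-9_\s-]/', '', $input);

// With /u the subject is read as UTF-8 code points, and \w can match
// non-ASCII letters as well (assumption: Unicode properties are available).
$unicode = preg_replace('/[^\w\s-]/u', '', $input);

echo $bytewise, "\n"; // this-is-a-
echo $unicode, "\n";  // the Sinhala base letters are kept
```

Whether \w also keeps combining marks can depend on the PCRE version, so it is worth checking the second output against the input character by character.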
thara Posted January 28, 2019

Thanks for the reply. Yes, I tried it like that. Updated version:

    function seoUrl($string) {
        //Lower case everything
        $string = strtolower($string);
        //Make alphanumeric (removes all other characters)
        $string = preg_replace("/[^\pL\pN_\s-]/u", "", $string);
        //Clean up multiple dashes or whitespaces
        $string = preg_replace("/[\s-]+/", " ", $string);
        //Convert whitespaces and underscore to dash
        $string = preg_replace("/[\s_]/", "-", $string);
        return $string;
    }

    $title = seoUrl("this-is-a-දහසක්-බාධක-දුක්-කම්කටොලු-මැදින්-ලෝකය-දිනන්නට-වෙර-දරන");
    echo $title;

Output:

    this-is-a-දහසක-බධක-දක-කමකටල-මදන-ලකය-දනනනට-වර-දරන

But some parts of the Sinhala characters are missing. If you compare the two strings closely, you will notice the difference.
requinix Posted January 28, 2019

My guess would be that the altered characters use some sort of combining marks that \pL doesn't include. Another \pX option should get them; \pM, the mark category, is the likely candidate.

Or you could drop the /u mode and blindly accept all high bytes (\x80-\xFF). Then you can only filter among standard ASCII characters, since every non-ASCII byte passes through untouched, but maybe that's all you need.
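A minimal sketch of the combining-marks suggestion, assuming \pM is the category that covers the dropped Sinhala vowel signs. The function name seoUrlUnicode() is hypothetical. The mb_strtolower() call (which needs the mbstring extension) and the trim() step are additions that do not appear in the thread.

```php
<?php
// Sketch only: seoUrlUnicode() is a hypothetical name, not from the thread.
function seoUrlUnicode($string)
{
    // Multibyte-safe lowercasing (assumes the mbstring extension is loaded).
    $string = mb_strtolower($string, 'UTF-8');
    // Keep letters (\pL), combining marks (\pM), numbers (\pN),
    // underscores, whitespace and hyphens; remove everything else.
    $string = preg_replace('/[^\pL\pM\pN_\s-]/u', '', $string);
    // Collapse runs of whitespace and hyphens into a single space.
    $string = preg_replace('/[\s-]+/u', ' ', $string);
    // Trim so that stripped characters at the ends leave no stray dash.
    $string = trim($string);
    // Turn the remaining whitespace and underscores into hyphens.
    return preg_replace('/[\s_]/u', '-', $string);
}

echo seoUrlUnicode("Hello World_Test!!"), "\n"; // hello-world-test
echo seoUrlUnicode("this-is-a-දහසක්-බාධක-දුක්"), "\n";
```

The second call is meant to return its input unchanged, because the vowel signs and the hal kirima belong to the mark category. Text that relies on other code points, such as a zero-width joiner, would need those added to the character class as well.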