
Regex for (U.S.) last name formatting


CroNiX


I was wondering if anybody had a handy regex for preg_replace that could be used to format an all-uppercase last name. I need to convert things like:

 

SMITH -> Smith (um, easy)

SMITH-JOHNSON -> Smith-Johnson (for the feminists  ;) )

MCCONNELL -> McConnell

MACDONALD -> MacDonald

O'CONNOR -> O'Connor

 

and any combination of the above, like:

O'CONNOR-MACDONALD -> O'Connor-MacDonald

 

Any help would be appreciated. I'm not sure if this can be done in a single regex, as I have yet to master them.

 

Thanks.

https://forums.phpfreaks.com/topic/135594-regex-for-us-last-name-formatting/

Here's my quick stab at it:

 

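$str = 'O\'CONNOR-MACDONALD';
// Split on apostrophes and hyphens; the capture group plus
// PREG_SPLIT_DELIM_CAPTURE keeps the delimiters in the array
$arr = preg_split('#([\'-])#', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
if(count($arr) > 1){
   $newStr = ''; // initialize before appending
   foreach($arr as $val){
      // lowercase each piece, then capitalize its first letter
      $val = ucfirst(strtolower($val));
      $newStr .= $val;
   }
   echo $newStr;
} else {
   // no delimiters: just proper-case the whole string
   $str = ucfirst(strtolower($str));
   echo $str;
}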

 

Output:

O'Connor-Macdonald
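The trick that makes this work is `PREG_SPLIT_DELIM_CAPTURE`: because the apostrophe and hyphen sit inside a capturing group, `preg_split()` returns them as array elements of their own, so concatenating the pieces restores the original punctuation. A quick check (the variable names here are just for illustration):

```php
<?php
$str = 'O\'CONNOR-MACDONALD';

// The parentheses make the delimiter a capture group, and
// PREG_SPLIT_DELIM_CAPTURE keeps each captured delimiter in the result.
$pieces = preg_split('#([\'-])#', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($pieces); // $pieces is ['O', "'", 'CONNOR', '-', 'MACDONALD']

// Without the flag (and the group), the delimiters are simply dropped,
// so gluing the pieces back together would lose them.
$plain = preg_split('#[\'-]#', $str);
print_r($plain); // $plain is ['O', 'CONNOR', 'MACDONALD']
```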

That looks pretty good, except it doesn't handle these cases:

MCCONNELL -> McConnell [C should be capitalized]

MACDONALD -> MacDonald [D should be capitalized]

 

I think I can come up with a regex for each of those cases individually, but can they all be combined into one statement? I can do basic regex, but I'm a noob and don't know the advanced stuff. I think regex is the hardest thing in a language to learn.

 

I appreciate your effort on this.
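On the "one statement" question: a plain `preg_replace()` can't change case in its replacement string (the old `/e` modifier that allowed it was removed in PHP 7), but a single `preg_replace_callback()` can. A minimal sketch, assuming ASCII-only input; `formatLastName()` is a hypothetical helper name, and treating "Mac"/"Mc" as a prefix only when at least two more letters follow is an assumption meant to keep "MACK" from becoming "MacK":

```php
<?php
// Hypothetical helper: proper-cases an all-caps surname in one regex pass.
// Assumes ASCII input; hyphens, apostrophes and spaces all count as word boundaries.
function formatLastName(string $name): string
{
    return preg_replace_callback(
        // At each word start, optionally match a "mac"/"mc" prefix (only when
        // at least two letters follow it), then the next letter.
        '/\b(ma?c(?=[a-z]{2}))?([a-z])/',
        function (array $m): string {
            // $m[1] is '' when there is no prefix, and ucfirst('') is ''.
            return ucfirst($m[1]) . strtoupper($m[2]);
        },
        strtolower($name)
    );
}

foreach (['SMITH', 'SMITH-JOHNSON', 'MCCONNELL', 'MACDONALD', "O'CONNOR", "O'CONNOR-MACDONALD"] as $name) {
    echo $name, ' -> ', formatLastName($name), "\n";
}
```

The "Mac" rule is still only a heuristic: a name like "MACEY" would come out as "MacEy", so a real application would probably need an exceptions list on top of this.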

 

 

Oops, you're right, CroNiX, I missed that part. I'll see if I can revise it. In the meantime, here is a slightly compressed version of what it currently does:

 

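$str = 'O\'CONNOR-MACDONALD';
// Split on apostrophes and hyphens, keeping the delimiters
$arr = preg_split('#([\'-])#', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
$newStr = ''; // initialize before appending
foreach($arr as $val){
   // lowercase each piece, then capitalize its first letter
   $val = ucfirst(strtolower($val));
   $newStr .= $val;
}
echo $newStr;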
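For just the part this version handles (capitalizing after apostrophes and hyphens), newer PHP versions (5.4.32 / 5.5.16 and later) can skip the split entirely, since `ucwords()` accepts an optional second argument listing the delimiter characters. A sketch; like the loop above, it still leaves "Macdonald" alone:

```php
<?php
$str = 'O\'CONNOR-MACDONALD';

// Lowercase everything, then uppercase the first letter of the string and
// every letter that follows a space, apostrophe or hyphen.
$newStr = ucwords(strtolower($str), " '-");
echo $newStr; // O'Connor-Macdonald
```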

OK, round 2. It *seems* to do the job. Is it elegant or efficient? Who knows...

 

$str = 'O\'CONNOR-MaCDONALD';
$arr = preg_split('#([\'-])#', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
$newStr = ''; // initialize before appending
foreach($arr as $val){
   // lowercase the piece, then capitalize its first letter
   $val = ucfirst(strtolower($val));
   if(preg_match('#^Mac#', $val)){
      // capitalize the letter right after "Mac" (Macdonald -> MacDonald);
      // note this also hits names like Mack (-> MacK)
      $val = 'Mac' . ucfirst(substr($val, 3));
   } else if(preg_match('#^Mc#', $val)){
      // capitalize the letter right after "Mc" (Mcconnell -> McConnell)
      $val = 'Mc' . ucfirst(substr($val, 2));
   }
   $newStr .= $val;
}
echo $newStr;

 

Output:

O'Connor-MacDonald

One note: when I tested against the string "Shawn O'Reilly", my pattern didn't work (it produced "Shawn o'Reilly", because "Shawn O" ended up as a single piece). So simply replace the preg_split line with:

 

$arr = preg_split('#([ \'-])#', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

 

I added a space to the character class, which fixes the Shawn O'Reilly test above.
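To see why the space matters, here is the round-1 conversion run with both patterns side by side. `splitAndCapitalize()` is a hypothetical helper written only for this comparison:

```php
<?php
// Hypothetical helper: split on the given delimiter pattern (keeping the
// delimiters), lowercase each piece and capitalize its first letter.
function splitAndCapitalize(string $str, string $pattern): string
{
    $out = '';
    foreach (preg_split($pattern, $str, -1, PREG_SPLIT_DELIM_CAPTURE) as $piece) {
        $out .= ucfirst(strtolower($piece));
    }
    return $out;
}

$name = "SHAWN O'REILLY";
// Without the space, "SHAWN O" is one piece, so only its "S" is capitalized.
echo splitAndCapitalize($name, '#([\'-])#'), "\n";  // Shawn o'Reilly
// With the space, "SHAWN" and "O" are separate pieces.
echo splitAndCapitalize($name, '#([ \'-])#'), "\n"; // Shawn O'Reilly
```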

Archived

This topic is now archived and is closed to further replies.
