Regex for (U.S.) last name formatting


CroNiX

I was wondering if anybody had a handy regex for a preg_replace that could be used to format a last name.  I need to convert things like:

 

SMITH -> Smith (um, easy)

SMITH-JOHNSON -> Smith-Johnson (for the feminists  ;) )

MCCONNELL -> McConnell

MACDONALD -> MacDonald

O'CONNOR -> O'Connor

 

and any combination of the above, like:

O'CONNOR-MACDONALD -> O'Connor-MacDonald

 

Any help would be appreciated.  I'm not sure if this can be done in a single regex, as I have yet to master them.

 

Thanks.


Here's my quick stab at it:

 

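$str = 'O\'CONNOR-MACDONALD';
// Split on apostrophes and hyphens, keeping the delimiters in the result.
$arr = preg_split('#([\'-])#', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
$newStr = ''; // initialize so the .= below doesn't hit an undefined variable
if(count($arr) > 1){
   foreach($arr as $val){
      $val = ucfirst(strtolower($val));
      $newStr .= $val;
   }
   echo $newStr;
} else {
   $str = ucfirst(strtolower($str));
   echo $str;
}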

 

Output:

O'Connor-Macdonald


That looks pretty good, except it doesn't handle these two cases:

MCCONNELL -> McConnell [C should be capitalized]

MACDONALD -> MacDonald [D should be capitalized]

 

I think I can come up with a regex for each of those cases individually, but can they all be combined into one statement?  I can do basic regex, but I'm a noob at them and don't know the advanced stuff.  I think regex is the hardest thing in a language to learn.

 

I appreciate your effort on this.
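For what it's worth, the cases listed so far can probably be squeezed into a single preg_replace_callback() call. This is only a rough sketch, not something posted in the thread: the function name formatSurnameOneCall is made up for illustration, and it assumes plain ASCII letters. The pattern takes an optional Mac/Mc prefix, then the next letter, then the rest of that run of letters, and the callback fixes the case of each piece.

```php
<?php
// Sketch only: formatSurnameOneCall is a hypothetical name, and the
// pattern assumes surnames made of plain ASCII letters.
function formatSurnameOneCall(string $name): string
{
    return preg_replace_callback(
        // (1) optional Mac/Mc prefix, (2) next letter, (3) rest of the run
        '#(ma?c)?([a-z])([a-z]*)#i',
        function (array $m): string {
            // Prefix becomes "Mac"/"Mc" (or stays empty), the next letter
            // is upper-cased, and everything after it is lower-cased.
            return ucfirst(strtolower($m[1]))
                 . strtoupper($m[2])
                 . strtolower($m[3]);
        },
        $name
    );
}

echo formatSurnameOneCall("O'CONNOR-MACDONALD"), "\n";
```

Apostrophes and hyphens are never matched by the pattern, so they pass through untouched. The catch is that anything starting with Mac or Mc gets the treatment, so a name like MACK would come out as MacK.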

 

 


Oops, you're right, CroNiX.. missed that part.. I'll see if I can revise it.. in the meantime, here is a slightly compressed version of what it currently does:

 

$str = 'O\'CONNOR-MACDONALD';
$arr = preg_split('#([\'-])#', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
$newStr = ''; // initialize before appending
foreach($arr as $val){
   $val = ucfirst(strtolower($val));
   $newStr .= $val;
}
echo $newStr;


Ok, round 2.. It *seems* to do the job.. is it elegant / efficient? Who knows...

 

$str = 'O\'CONNOR-MaCDONALD';
$arr = preg_split('#([\'-])#', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
$newStr = ''; // initialize before appending
foreach($arr as $val){
   $val = ucfirst(strtolower($val));
   // After ucfirst(strtolower()), only the first letter can be upper case,
   // so 'Mac' / 'Mc' can only ever match at the start of the piece.
   if(preg_match('#^Mac#', $val)){
      // Capitalize the letter that follows the prefix.
      $val = 'Mac' . ucfirst(substr($val, 3));
   } else if(preg_match('#^Mc#', $val)){
      $val = 'Mc' . ucfirst(substr($val, 2));
   }
   $newStr .= $val;
}
echo $newStr;

 

Output:

O'Connor-MacDonald


One note.. when I tested against the string 'Shawn O'Reilly', my pattern didn't work.. so simply replace the preg_split line with:

 

$arr = preg_split('#([ \'-])#', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

 

I added a space as one of the characters within the character class.. this corrects the above Shawn O'Reilly test.
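If this ends up being reused, it might be worth wrapping the loop in a function. The sketch below is not from the posts above: formatLastName and its $exceptions parameter are hypothetical names, and the default exception list is just an assumed example. It keeps the same split-and-rebuild approach (with the space added to the character class) and skips the Mac/Mc step for any piece found in the exception list, since names like Mack or Macy would otherwise come out as MacK and MacY.

```php
<?php
// Hypothetical helper, not from the thread. $exceptions is an assumed,
// caller-supplied list of surnames that should NOT get the Mac/Mc
// treatment; the default values are examples only.
function formatLastName(string $name, array $exceptions = ['Mack', 'Macy']): string
{
    // Split on space, apostrophe, or hyphen, keeping the delimiters.
    $parts = preg_split("#([ '-])#", $name, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $part) {
        $part = ucfirst(strtolower($part));
        // Capitalize the letter after a leading Mac/Mc unless the piece
        // is listed as an exception.
        if (!in_array($part, $exceptions, true)
            && preg_match('#^(Mac|Mc)(.+)$#', $part, $m)) {
            $part = $m[1] . ucfirst($m[2]);
        }
        $out .= $part;
    }
    return $out;
}

echo formatLastName("SHAWN O'REILLY"), "\n";
echo formatLastName("O'CONNOR-MACDONALD"), "\n";
```

The exception list has to be maintained by hand, so treat it as a starting point rather than a complete answer.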
