Correctly applying case to Mac, Mc and Hyphenated names

As a developer, I like to make things as easy and consistent as possible for data entry, so I make good use of calls like these:

location = location.toUpperCase();
family_name = utils.stringInitCap(family_name);
first_name = utils.stringInitCap(first_name);

I like to make my apps clever enough to sort out the correct case of things, eg if the user slips and enters ANtonio.

Many family names of British and Irish origin are handled incorrectly by InitCap (MacDowell, McKay, O’Connor etc). Hyphenated names, which are common in many cultures, are not handled correctly either.
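To make the problem concrete, here is a rough stand-in for a plain word-by-word capitaliser. It is a sketch in plain JavaScript, not Servoy's own stringInitCap, and the name naiveInitCap is made up for illustration; it only assumes that "init cap" means upper-casing the first letter after whitespace.

```javascript
// Hypothetical stand-in for a simple InitCap: lower-case everything,
// then upper-case the first character of each whitespace-separated word.
function naiveInitCap(text) {
  return text.toLowerCase().replace(/(^|\s)(\S)/g, function (match, space, ch) {
    return space + ch.toUpperCase();
  });
}

console.log(naiveInitCap("ANtonio"));     // Antonio
console.log(naiveInitCap("MACDOWELL"));   // Macdowell (wanted: MacDowell)
console.log(naiveInitCap("o'connor"));    // O'connor (wanted: O'Connor)
console.log(naiveInitCap("SMITH-JONES")); // Smith-jones (wanted: Smith-Jones)
```

The first case is fine; the other three are the ones the method below is meant to fix.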

Here’s a method I call ProperCase, attached to the onDataChange property of the family_name field.

// small method to correctly case names of Scottish and Irish origin, and hyphenated names.
// tested against 65000 common names from census data.
// only replace leading MC and MAC

// add spaces so InitCap works
// need a space at the front for the MC step - doesn't affect SIMCOE, TOMCZAK, RAMCHARAN etc
var temp_name = " " + family_name.toUpperCase(); 
temp_name = utils.stringReplace(temp_name, "O'", "O' ");
temp_name = utils.stringReplace(temp_name, "-", "- ");
temp_name = utils.stringReplace(temp_name, " MC", " MC ");

// need to skip short names beginning with MAC, eg MACE, MACK, MACY, MACON, MACRI, MACEY etc
// ignore names with <=5 chars as there are no (common) names MacXx
// from census data and phone listing, almost all Mac* with >5 char are MacXx*
// we need a look up of Mac names > 5 that don't fit the rule.  
// this list could instead be in a related table, user-editable, to allow for locally common exceptions.
// could also change the code to optionally allow the formatting to be overridden.
var macExceptions = "MACABAGAL|MACADANGDANG|MACAK|MACARIO|MACARO|MACAW|MACCA|MACCAR|MACCARONE|MACCORA|MACHRI|MACHOSS|MACHOY|MACHUCA|MACIAK|MACIEL|MACISZEWSKI|MACIULATIS|MACKOJC|MACLIDES|MACUCUK|MACUT";
// all MACCH* names are cased Macch*; the MACCH test in the loop below skips them

var words = utils.stringWordCount(temp_name);
var temp_word = "";
for ( var i = 1 ; i <= words ; i++ )
{
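	temp_word = utils.stringMiddleWords(temp_name, i, 1);
	// wrap the list and the word in "|" so only whole entries match, not fragments of longer entries
	if(utils.stringLeft(temp_word, 3) == "MAC" && temp_word.length > 5 && utils.stringLeft(temp_word, 5) != "MACCH" && utils.stringPatternCount("|" + macExceptions + "|", "|" + temp_word + "|") < 1)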
	{
		temp_name = utils.stringReplace(temp_name, temp_word, "MAC " + utils.stringRight(temp_word, temp_word.length - 3));
		i++;
		words++; 
	}
} 
// apply InitCap
temp_name = " " + utils.stringInitCap(temp_name);
// strip added spaces
temp_name = utils.stringReplace(temp_name, "Mac ", "Mac");
temp_name = utils.stringReplace(temp_name, " Mc ", " Mc");
temp_name = utils.stringReplace(temp_name, "- ", "-");
temp_name = utils.stringReplace(temp_name, "O' ", "O'");

family_name = utils.stringTrim(temp_name);
return 1;

Using this, MAcdowell-o’rielly → MacDowell-O’Rielly

Another possible enhancement: via a setup global, users could be given the option of applying UPPER or Proper case to selected fields.
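As a rough idea of how that option might look, here is a small sketch in plain JavaScript. The mode constants and the applyCaseMode function are hypothetical names, and where the mode is stored (a setup global, a table column) is left open; properCaseFn stands for whatever proper-casing routine the solution already uses.

```javascript
// Hypothetical mode values; in a real solution the mode would come from a setup global.
var CASE_MODE_UPPER = "UPPER";
var CASE_MODE_PROPER = "PROPER";

// properCaseFn is whatever proper-casing routine the solution already has.
function applyCaseMode(value, mode, properCaseFn) {
  if (mode === CASE_MODE_UPPER) {
    return value.toUpperCase();
  }
  if (mode === CASE_MODE_PROPER) {
    return properCaseFn(value);
  }
  // Unknown mode: leave the user's entry untouched.
  return value;
}
```

Passing the casing routine in as a parameter keeps the switch separate from the name rules, so the same call could be reused on any field.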

Thanks for sharing, Antonio!

Great tip!

Thanks, I’d be interested to hear from anyone who can see other cases that could be included with a rule, such as d’Souza or van der Hayden.
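One possible rule for names like these is to lower-case a known leading particle after the normal casing step has run. The sketch below is plain JavaScript with made-up names (PARTICLES, lowerLeadingParticle), and the particle list is only an assumption: whether d' or van der is written in lower case varies by family and country, so a real list would need to be editable. It also assumes a straight apostrophe.

```javascript
// Hypothetical particle list; longer entries come first so "van der " wins over "van ".
var PARTICLES = ["van der ", "van den ", "van de ", "van ", "de ", "d'"];

// Expects a name that has already been proper-cased,
// e.g. "Van Der Hayden" or "D'Souza".
function lowerLeadingParticle(name) {
  var lower = name.toLowerCase();
  for (var i = 0; i < PARTICLES.length; i++) {
    var particle = PARTICLES[i];
    // Only change the name if it starts with the particle and something follows it.
    if (lower.indexOf(particle) === 0 && name.length > particle.length) {
      return particle + name.substring(particle.length);
    }
  }
  return name;
}

console.log(lowerLeadingParticle("Van Der Hayden")); // van der Hayden
console.log(lowerLeadingParticle("D'Souza"));        // d'Souza
console.log(lowerLeadingParticle("Vance"));          // Vance
```

Because the particles with a trailing space only match whole words, names such as Vance or Dean are left alone.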

Just one last thought.

Wonder if anyone has written a regEx to do this kind of thing? regEx is very powerful, but cryptic. You may be able to replace the whole thing with one line of code :wink: Try Google.

I never did find a Regex that would do the trick, but this one might be close.

http://livetrix.wiki.ub.rug.nl/index.php/Features/Author_name_normalization

The trick is that some names, like Macey, don’t change case, which a method can be designed to handle.

If you see a way to do this with Regex more efficiently, I’d love to learn more.
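For what it's worth, here is one way the same rules might be expressed with regular expressions. It is not a single regex but a few chained replace calls, written as plain JavaScript rather than with Servoy's utils functions. The names properCaseName, upperFirst and SAMPLE_MAC_EXCEPTIONS are made up for this sketch, the exception list is only a few entries from the method above, and it only handles unaccented a-z letters.

```javascript
// A few entries copied from the macExceptions list in the method above, not the full list.
var SAMPLE_MAC_EXCEPTIONS = ["MACARIO", "MACHUCA", "MACIEL"];

// Hypothetical helper: upper-cases the first character of a string.
function upperFirst(text) {
  return text.charAt(0).toUpperCase() + text.substring(1);
}

function properCaseName(name, macExceptions) {
  var exceptions = macExceptions || SAMPLE_MAC_EXCEPTIONS;
  return name
    .toLowerCase()
    // Capitalise the first letter of the name and of each part after a space or hyphen.
    .replace(/(^|[\s-])([a-z])/g, function (match, sep, ch) {
      return sep + ch.toUpperCase();
    })
    // O'x -> O'X (straight or typographic apostrophe), only at the start of a word.
    .replace(/\bO(['\u2019])([a-z])/g, function (match, apos, ch) {
      return "O" + apos + ch.toUpperCase();
    })
    // Mcx -> McX, only at the start of a word, so Simcoe and Tomczak are left alone.
    .replace(/\bMc([a-z])/g, function (match, ch) {
      return "Mc" + ch.toUpperCase();
    })
    // Macxxx -> MacXxx for whole words longer than 5 letters,
    // except Macch* and anything in the exception list.
    .replace(/\bMac(?!ch)([a-z]{3,})\b/g, function (word, rest) {
      return exceptions.indexOf(word.toUpperCase()) >= 0 ? word : "Mac" + upperFirst(rest);
    });
}

console.log(properCaseName("MAcdowell-o'rielly")); // MacDowell-O'Rielly
console.log(properCaseName("MACEY"));              // Macey
console.log(properCaseName("SIMCOE"));             // Simcoe
```

The exception lookup still has to live outside the pattern, which is probably why a pure one-line regex is hard to find.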