Using Regular Expressions Part 2 - The Cocoa Connection

October 15, 2011    

Last time, in Part 1 of this series, I wrote about the basics of regular expressions, and the phrases I tend to use. Today, I’m going to talk about the mechanics of how I use Regular Expressions in Cocoa.

##But first, an historical diversion

In my opinion, there are two different ways that programming languages implement regular expressions: the Perl/Ruby way, and the Java/C#/Python/Cocoa way.

In Ruby and Perl, regexes are implemented directly on the String type, whereas in the other languages there is a separate object that contains the functionality. Here’s what you need to know to do a regex substitution on a string in Ruby:

myString.sub(/pattern/, 'replacement')

Clean, easy, and immediately usable if you know what pattern you want to use.

Here’s what you need to know to do the same thing in Cocoa:

+[NSRegularExpression regularExpressionWithPattern:(NSString *) pattern 
options:(NSRegularExpressionOptions)options error:(NSError **) error]

-[NSRegularExpression replaceMatchesInString:(NSMutableString *) string 
options:(NSMatchingOptions)options range:(NSRange)range 
withTemplate:(NSString *)template]

which is not clean, not easy, and contains a bunch of stuff you have to go look up before you can get started. What are NSRegularExpressionOptions and NSMatchingOptions? What’s a template? Do I really have to create an NSRange for this? And that leads to the obvious question: Is all this effort really worth it?
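For reference, here is roughly what wiring those two calls together looks like. This is a minimal sketch, not code from any particular project: the function name ReplacePattern is made up for illustration, and it passes 0 for both option parameters to get the default behavior.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces every match of `pattern` in `input`
// with `replacement`, using NSRegularExpression directly.
// Returns nil if the pattern fails to compile.
static NSString *ReplacePattern(NSString *input, NSString *pattern, NSString *replacement) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern %@: %@", pattern, error);
        return nil;
    }

    // replaceMatchesInString: edits a mutable string in place,
    // so work on a mutable copy of the input.
    NSMutableString *result = [NSMutableString stringWithString:input];
    [regex replaceMatchesInString:result
                          options:0
                            range:NSMakeRange(0, [result length])
                     withTemplate:replacement];
    return result;
}
```

That is four things to look up (two option types, a range, and a template) just to do what Ruby does in one short line.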

Now I don’t know about you, but I don’t want to spend any effort remembering any of those option parameters, and I don’t want to take the time to look them up every time I want to use a regular expression. To me, the beauty of Objective-C is that it lets us build most of what we need to know directly into the method signatures.

##Let’s simplify things a little

So that’s what I did. For the rest of this post, I’ll be using the categories on NSString found in a repo I wrote on GitHub called RegexOnNSString.

There are three basic methods I wrote:

-(NSString *) [NSString stringByReplacingRegexPattern:(NSString *)regex 
withString:(NSString *) replacement]

which takes a string, finds the occurrences of the regex pattern and replaces them with the string replacement.

-(NSArray *) [NSString stringsByExtractingGroupsUsingRegexPattern:(NSString *)regex]

which gives you an array of all the pattern groups (things in parentheses) it found in your string, and

-(BOOL) [NSString matchesPatternRegexPattern:(NSString *)regex]

which just tells you whether a pattern is present in your string or not.

There are two additional, optional parameters that you can add: caseInsensitive:(BOOL) ignoreCase and treatAsOneLine:(BOOL) assumeMultiLine.

caseInsensitive is hopefully self-explanatory, and treatAsOneLine just means that you expect that your string has (or might have) newline (\n) characters in it, and you want them to be treated like any other character.
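If you’re curious how flags like these could map onto NSRegularExpressionOptions, here is a minimal sketch. It was written for this post rather than copied from the repo, the function names are made up, and mapping treatAsOneLine to NSRegularExpressionDotMatchesLineSeparators is an assumption based on the description above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: turns the two BOOL flags into regex options.
static NSRegularExpressionOptions SketchOptions(BOOL ignoreCase, BOOL oneLine) {
    NSRegularExpressionOptions options = 0;
    if (ignoreCase) {
        options |= NSRegularExpressionCaseInsensitive;
    }
    if (oneLine) {
        // Assumed mapping: lets "." match newline characters too.
        options |= NSRegularExpressionDotMatchesLineSeparators;
    }
    return options;
}

// Hypothetical helper: YES if `pattern` occurs anywhere in `input`.
static BOOL SketchMatches(NSString *input, NSString *pattern,
                          BOOL ignoreCase, BOOL oneLine) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:SketchOptions(ignoreCase, oneLine)
                                                    error:NULL];
    if (regex == nil) {
        return NO; // An invalid pattern matches nothing.
    }
    NSRange whole = NSMakeRange(0, [input length]);
    return [regex numberOfMatchesInString:input options:0 range:whole] > 0;
}
```

With this sketch, the pattern @"a.b" matches @"a\nb" only when the one-line flag is YES, because by default the dot does not match a newline.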

To get them, you just need to grab the MIT-licensed code from GitHub, include NSString+PDRegex.h and NSString+PDRegex.m in your project, and put

#import "NSString+PDRegex.h"

at the top of your source file.

##How about some examples?

The simplest of these is the one that returns the boolean, like so:

if(![emailAddress matchesPatternRegexPattern:@"@.*\\..*"]) {
    NSLog(@"If the user is going to give us a fake email address" \
          @" they could at least try and make it look like one" \
          @" by making sure it has an at-sign and a dot in it.");
}

I use this a lot for string validation. No sense in trying to send an email if there isn’t an at-sign in it, and no use trying to convert an NSString to an NSURL if the string doesn’t at least contain '://'. (Note that I have to use two backslashes there, because @"@.*\..*" with a single backslash makes Xcode emit a "Lexical or Preprocessor Issue: Unknown escape sequence '\.'" warning.)

The one I use next most often is the one that returns an NSString. I use this one for extracting substrings. For example, in:

<Warning: Shameless Plug> a Mac application I recently released as an Open Beta that helps iOS developers deploy Apps to their test devices without having to use a USB cable

I’m getting a string that is the path of the .app file that the user dragged-and-dropped onto my App (and it’s either a path if they dropped it onto the App Icon in the Dock, or a URL if they dropped it onto the Window). From that string, I need to figure out what their iOS App is named (so I can use that name in the notification). I use the stringByReplacingRegexPattern method for that. I could use [[NSString lastPathComponent] stringByDeletingPathExtension] for that, but by using regexes, I don’t have to go look up the path component methods, like I just did to put them in this post. But an even better example from that app is:

NSString *dSYMPath = [droppedPath stringByReplacingRegexPattern:@"\\.app$" withString:@".dSYM"];

That lets me save off the dSYMs, so the user can get to the symbol data for that build if they need it later.

I also use it so that, in order for my App to get the user’s Dropbox’s public URL, I can let the user drop any Public URL that Dropbox gives them into the preferences panel, and I can use:

NSString *dbPublicRoot=[pastedLink stringByReplacingRegexPattern: @"^(http://dl.dropbox.com/u/[0-9][0-9]*)[^0-9].*$" withString:@"$1" caseInsensitive:NO];

so that I don’t have to rely on the user to correctly truncate the URL at the right place, and the user doesn’t have to think about it.
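The @"$1" there is what NSRegularExpression calls a template: it stands for whatever the first parenthesized group matched. Here is a minimal sketch of the same idea written against NSRegularExpression directly. The function name and the sample link are made up for illustration, and the pattern is a slightly stricter variant (escaped dots, and + for one or more digits), not the exact one from my app.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: trims a Dropbox public link down to its
// "http://dl.dropbox.com/u/<digits>" root. If the link does not match
// the pattern, it is returned unchanged.
static NSString *PublicRootFromLink(NSString *link) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:
            @"^(http://dl\\.dropbox\\.com/u/[0-9]+)[^0-9].*$"
                                                  options:0
                                                    error:NULL];
    // "$1" in the template is replaced by the text captured by group 1.
    return [regex stringByReplacingMatchesInString:link
                                           options:0
                                             range:NSMakeRange(0, [link length])
                                      withTemplate:@"$1"];
}
```

For a made-up link such as @"http://dl.dropbox.com/u/1234567/builds/MyApp.ipa", this returns @"http://dl.dropbox.com/u/1234567".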

The last method, the one that returns the NSArray, I don’t use as often, but when I do, it can save me a lot of effort. For example, recently I was implementing a Tic-Tac-Toe game as a programming exercise as part of an interview process. So when I was shipping turns between the two players, I was actually sending one string:

NSString *stringForThisMove = [NSString stringWithFormat:
      @"Move %@=Player %@ to Square %@\n",
      [move TurnNumber],
      playerThatMoved,
      [move SquarePlayed]];

and then on the receiving end, I used:

NSArray *extractedStrings=[moveString
       stringsByExtractingGroupsUsingRegexPattern:
       @"^ *Move  *([0-9]) *= *Player  *([XO])  *to  *Square  *([0-9]) *$"
       caseInsensitive:YES treatAsOneLine:YES];

from which [extractedStrings objectAtIndex:0] was the move number, [extractedStrings objectAtIndex:1] was the player (@"X" or @"O") and [extractedStrings objectAtIndex:2] was the number of the square they moved to (where the first row of the board was 1-2-3 and the last row was 7-8-9).
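To give a feel for what group extraction involves underneath, here is a minimal sketch using NSRegularExpression and NSTextCheckingResult directly. It is an illustration rather than the code from the repo: the function name ExtractGroups is made up, it always matches case-insensitively, and it only looks at the first match.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text of each capture group from the
// first match of `pattern` in `input`, or an empty array if nothing matches.
static NSArray *ExtractGroups(NSString *input, NSString *pattern) {
    NSMutableArray *groups = [NSMutableArray array];
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:input
                          options:0
                            range:NSMakeRange(0, [input length])];
    if (match == nil) {
        return groups;
    }
    // Range 0 is the whole match; capture groups start at index 1.
    for (NSUInteger i = 1; i < [match numberOfRanges]; i++) {
        NSRange groupRange = [match rangeAtIndex:i];
        // A group that did not take part in the match has location NSNotFound.
        if (groupRange.location != NSNotFound) {
            [groups addObject:[input substringWithRange:groupRange]];
        }
    }
    return groups;
}
```

Called with @"Move 3=Player X to Square 5" and the Tic-Tac-Toe pattern @"^ *Move  *([0-9]) *= *Player  *([XO])  *to  *Square  *([0-9]) *$", the sketch returns @"3", @"X" and @"5", in that order.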

Now, there are many other ways I could have encoded that, but the nice thing about using strings for it was that anyone looking at the intermediate value (in the debugger or logs) could easily tell what move was being talked about at that point, and if I were ever to need to come back to this code later, @"Move 1=Player X to Square 1" will make sense to me (after all, that kind of notation has been of use in the Chess world for hundreds, if not thousands of years).

##But aren’t Regular Expressions slower?

Well, define slow :-).

In the test suite for my RegexOnNSString category, I have a test that does 1000 string replaces:

for (uint i=0; i < 1000; i++) {
    if (lastTimeString) {
        NSString *currentNumberString=[NSString stringWithFormat:@"%u",i];
        NSString *replacementString=[lastTimeString stringByReplacingRegexPattern:@"[0-9][0-9]*" withString:currentNumberString caseInsensitive:NO];
        STAssertEqualObjects(replacementString, currentNumberString, @"regex replace failed");
    }
    lastTimeString=[NSString stringWithFormat:@"%u",i];
}

Now each of those regexes is different (by design), so I can’t compile them, and I’m creating and throwing away my NSRegularExpression object and a temporary NSString every run. So it’s near a worst-case scenario. By way of comparison, I do another loop of 1000 replaces, using [NSString stringByReplacingOccurrencesOfString: withString:] to see how much slower the regex makes the task.

The output from running the test on my 4th-gen iPod touch is:

2011-10-15 17:23:00.748 RegexOnNSStringIOSExample[224:607] 1000 regex replaces took 0.167364 seconds
2011-10-15 17:23:00.776 RegexOnNSStringIOSExample[224:607] 1000 String replaces took 0.025167 seconds
2011-10-15 17:23:00.777 RegexOnNSStringIOSExample[224:607] Simple String substitution 6.650140 times faster

and on my daughter’s 2nd-gen iPod touch, the output is:

2011-10-15 17:33:37.631 RegexOnNSStringIOSExample[183:307] 1000 regex replaces took 0.641442 seconds
2011-10-15 17:33:37.756 RegexOnNSStringIOSExample[183:307] 1000 String replaces took 0.119230 seconds
2011-10-15 17:33:37.768 RegexOnNSStringIOSExample[183:307] Simple String substitution 5.379869 times faster

So yes, it’s slower. Each regex replacement takes about 0.17 milliseconds on a 4th-gen touch and 0.64 milliseconds on a 2nd-gen touch, which is between 5 and 7 times slower than stringByReplacingOccurrencesOfString:withString:. If 0.52 ms per replacement really matters in your code when running on a 2nd-gen touch, then you should use stringByReplacingOccurrencesOfString:withString: instead.

###So, in conclusion,

I hope you found this post useful. If you need to do string manipulation, Regular Expressions are a time-tested way to do that, and I hope the extra methods I’ve talked about here will simplify things if you want to do string manipulation in your Cocoa code.