Technology with opinion

Friday, February 16, 2007

Regex vs string.replace() Python vs PHP

RegEx, or regular expressions, have become popular in nearly every programming language. A regular expression is a special string that describes how to match other strings. It's wonderful for matching against a sequence of strings, especially if the logic for the match is somewhat complex. RegEx can greatly increase the performance of an app, or slow it down & obfuscate it.

Before deciding to use a RegEx, first figure out what you want to do. Let's take two common & basic tasks: input validation of email & replacing a matching string.

First, since the rules for email can be somewhat complicated (for string matching) this would make a good candidate for RegEx, especially if the validation is taking place on the client (who cares about client CPU cycles?).
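To make the email case concrete, here is a minimal sketch of RegEx validation in Python. The pattern and the `is_valid_email` helper are hypothetical and deliberately simplified; this is not the 60 character expression used in the benchmark, and it does not cover every address the email RFCs allow.

```python
import re

# Hypothetical, simplified pattern: a non-empty local part, one "@",
# and a domain with at least one dot. Not RFC-complete.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def is_valid_email(address):
    """Return True if address matches the simplified pattern above."""
    return EMAIL_RE.match(address) is not None


print(is_valid_email("user@example.com"))  # True
print(is_valid_email("not-an-email"))      # False
```

Expressing the same rules with plain string methods would take several chained checks, which is why this task leans toward RegEx.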

Benchmark results of 10K operations (smaller number is better)
Large String Length: 65181 chars
Small String Length: 97 chars
Simple String: 45 chars
Complex RegEx: an email validation RegEx, 60 chars

Python:
Testing 100K Loop
0:00:00.025730

Small simple replacement
string.replace()
0:00:00.032380

RegEx (compiled)
0:00:00.055416

RegEx (uncompiled)
0:00:00.054967

Large simple replacement
string.replace()
0:00:04.781832

RegEx (compiled)
0:00:03.997719

RegEx (uncompiled)
0:00:03.141389

Complicated RegEx
compiled
0:00:00.034440

uncompiled
0:00:00.080525

PHP
Testing 100K Loop:
0.0390350818634

Small simple replacement
string replace()
0.0481541156769

RegEx
0.0470488071442

Large simple replacement
string replace()
1.68269109726

RegEx (uncompiled)
1.76776599884

Complicated RegEx
0.0417048931122

Conclusion
Overall
Comparing language to language, Python is faster in most categories. The exceptions are the small simple replacement with RegEx, where PHP is slightly ahead, and the simple replacement on a large target string, where PHP won by a wide margin. As you can see, there is no one-size-fits-all rule; I would expect every language to have mixed results. In preliminary results with C# (.NET 2.0) the RegEx seemed very inefficient; I hope to post results later.

As to precompiling RegEx, it only makes a difference if your expression is complicated; the more complicated your RegEx is, the bigger the boost in performance you get. As these tests show, precompiling your RegEx is not always faster: on the large simple replacement the uncompiled version was quicker.
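For readers unfamiliar with the distinction, the two styles look like this in Python. This is only a sketch with a made-up sample string, not the benchmark code. One plausible reason the gap is small for simple patterns is that Python's `re` module keeps an internal cache of recently compiled patterns, so the "uncompiled" call does not recompile from scratch on every iteration.

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# Uncompiled: pass the pattern string to the module-level function, which
# compiles it (or fetches it from the re module's internal cache).
uncompiled_result = re.sub(r"\bfox\b", "cat", text)

# Precompiled: compile once up front, then reuse the pattern object.
FOX_RE = re.compile(r"\bfox\b")
compiled_result = FOX_RE.sub("cat", text)

# Both styles produce the same output; only the call overhead differs.
print(uncompiled_result == compiled_result)
```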

Note: PHP doesn't support Unicode strings or precompiled RegEx. Python supports both, but its Unicode RegEx results seemed fairly slow.

In Python
If the target string is small & your matching expression is simple, use string.replace(). If the target string is much larger, use RegEx uncompiled. With complicated RegEx, use RegEx compiled.
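Since results like these depend on the interpreter version and the input, it is worth re-measuring on your own data. Below is a rough sketch of how the three Python options could be timed with the standard `timeit` module. The sample string, the `needle` and `replacement` values, and the loop count are all assumptions; the original test strings are not shown in this post.

```python
import re
import timeit

# Hypothetical inputs, chosen only for illustration.
small = "the quick brown fox jumps over the lazy dog " * 2
needle, replacement = "fox", "cat"
escaped = re.escape(needle)      # treat the needle as literal text
pattern = re.compile(escaped)

candidates = {
    "str.replace": lambda: small.replace(needle, replacement),
    "re.sub (uncompiled)": lambda: re.sub(escaped, replacement, small),
    "re.sub (compiled)": lambda: pattern.sub(replacement, small),
}

# All three must agree before their timings are worth comparing.
results = {name: fn() for name, fn in candidates.items()}
assert len(set(results.values())) == 1

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=10000)
    print("%-20s %.6f s" % (name, seconds))
```

Swapping `small` for a much larger string is enough to check whether the large-string recommendation still holds on your setup.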

In PHP
For small target strings with a simple matching expression, RegEx is slightly faster, though the difference is negligible. For larger target strings with simple matching, string replace() is a bit faster.