Regular expressions (RegEx) are now available in practically every programming language. A regular expression is a special string that describes a pattern for matching other strings. It's wonderful for matching against sequences of strings, especially when the matching logic is somewhat complex. Depending on how it is used, a RegEx can greatly improve the performance of an app, or slow it down and obfuscate the code.
Before deciding to use a RegEx, first figure out what you want to do. Let's take two common, basic tasks: validating an email address as input and replacing a matching string.
First, since the rules for email addresses can be somewhat complicated (for string matching), validation makes a good candidate for RegEx, especially if it takes place on the client (who cares about client CPU cycles?).
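To make the email case concrete, here is a minimal sketch in Python. The pattern and the name is_valid_email are my own assumptions for illustration; this is a simplified expression, not the 60-character one used in the benchmark, and it does not try to cover every address the email RFCs allow.

```python
import re

# Hypothetical, simplified pattern for illustration only. It is NOT the
# 60-character expression from the benchmark and it is not RFC-complete.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def is_valid_email(address):
    """Return True if the whole string looks like an email address."""
    # fullmatch() requires the pattern to cover the entire input,
    # so trailing junk such as "user@example.com<script>" is rejected.
    return EMAIL_RE.fullmatch(address) is not None
```

Writing the same check with plain string methods (split on "@", check for dots, check allowed characters) is possible but quickly gets longer and harder to read than the pattern, which is the argument for RegEx here.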
Benchmark results of 10K operations (smaller number is better)
Large String Length: 65181 chars
Small String Length: 97 chars
Simple String: 45 chars
Complex RegEx: email validation RegEx, 60 chars
Python:
  Testing 100K Loop: 0:00:00.025730
  Small simple replacement
    string.replace(): 0:00:00.032380
    RegEx (compiled): 0:00:00.055416
    RegEx (uncompiled): 0:00:00.054967
  Large simple replacement
    string.replace(): 0:00:04.781832
    RegEx (compiled): 0:00:03.997719
    RegEx (uncompiled): 0:00:03.141389
  Complicated RegEx
    compiled: 0:00:00.034440
    uncompiled: 0:00:00.080525
PHP:
  Testing 100K Loop: 0.0390350818634
  Small simple replacement
    string replace(): 0.0481541156769
    RegEx: 0.0470488071442
  Large simple replacement
    string replace(): 1.68269109726
    RegEx (uncompiled): 1.76776599884
  Complicated RegEx: 0.0417048931122
Conclusion
Overall
Comparing language to language, Python's best result is faster in every category except where the replace expression is simple and the target string is large; there PHP came out well ahead. As you can see, there is no one-size-fits-all rule, and I would expect every language to show mixed results. In preliminary tests with C# (.NET 2.0), RegEx seemed very inefficient; I hope to post results later.
As for precompiling a RegEx, it only makes a difference if your expression is complicated: the more complicated the RegEx, the bigger the performance boost you get. As these tests show, precompiling your RegEx is not always faster.
Note: PHP doesn't support Unicode strings or precompiled RegEx. Python supports both, although Unicode RegEx results seemed fairly slow.
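If you want to check the compiled versus uncompiled difference on your own machine, a rough sketch with Python's timeit module could look like the following. PATTERN and TEXT are hypothetical stand-ins I made up, not the benchmark inputs, and the function names are mine. One thing worth knowing: Python's re module keeps an internal cache of recently used patterns, so the "uncompiled" path mostly pays for a cache lookup rather than a full recompile, which may be part of why the gap is small or even reversed for simple expressions.

```python
import re
import timeit

# Hypothetical stand-ins; the original benchmark strings and the
# 60-character email pattern are not reproduced here.
PATTERN = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
TEXT = "contact: user@example.com, backup: admin@example.org"

# Compile once, up front, and reuse the pattern object.
COMPILED = re.compile(PATTERN)


def search_uncompiled():
    # Module-level function: re looks the pattern string up in its
    # internal cache on every call.
    return re.search(PATTERN, TEXT)


def search_compiled():
    # Method on the precompiled pattern object: no lookup needed.
    return COMPILED.search(TEXT)


def time_both(loops=10_000):
    """Return elapsed seconds for each variant over `loops` calls."""
    return {
        "uncompiled": timeit.timeit(search_uncompiled, number=loops),
        "compiled": timeit.timeit(search_compiled, number=loops),
    }
```

Calling time_both() returns a dict of timings in seconds. The numbers will vary by machine and Python version, so treat them as relative, not absolute.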
In Python
If the target string is small and your matching expression is simple, use string.replace(). If the target string is much larger, use an uncompiled RegEx. With a complicated RegEx, use a compiled one.
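For reference, the three Python options for a simple literal replacement can be sketched like this. The function names are hypothetical, chosen just for this example. All three produce the same output; only the speed differs, so which one wins is something to measure on your own data.

```python
import re


def replace_plain(text, old, new):
    # Plain string method: no pattern machinery involved.
    return text.replace(old, new)


def replace_regex(text, old, new):
    # "Uncompiled" RegEx: re.escape() keeps `old` literal, and the lambda
    # keeps `new` literal (no backslash or group-reference processing).
    return re.sub(re.escape(old), lambda _match: new, text)


def make_compiled_replacer(old):
    # Compiled RegEx: build the pattern object once, reuse it per call.
    pattern = re.compile(re.escape(old))

    def replace(text, new):
        return pattern.sub(lambda _match: new, text)

    return replace
```

Without re.escape(), a search string such as "." would be treated as a pattern matching any character, so the RegEx versions would no longer behave like string.replace().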
In PHP
For small target strings with a simple matching expression, RegEx is slightly faster, but the difference is negligible. For larger target strings with a simple matching expression, string replace() is a bit faster.
Technology with opinion
Friday, February 16, 2007