TheDeveloperBlog.com

Python re.match Performance

Test the performance of the re.match method against a custom def. Perform a benchmark.
Re, performance. With re.match we compare strings with a pattern. Regular expressions usually cause a performance loss. But is this important? I tested a custom method, written with for and if, against re.match.
Example. We introduce two methods: stringmatch and stringmatch_re. Both methods test for strings like "cat." The first letter must be "c" and the last letter must be "t," and there must be one or more letters "a" in the middle.

Stringmatch: This method uses the if-statement to check the length and individual characters. It uses a for-loop to check the middle.

Stringmatch_re: This method is shorter. It uses the pattern "ca+t" to check for valid strings.

Python program that tests strings, uses re

    import re

    def stringmatch(s):
        # Check for "ca+t" with if-statements and loop.
        if len(s) >= 3 and s[0] == 'c' and s[len(s) - 1] == 't':
            # Every character between the first and last must be 'a'.
            for v in range(1, len(s) - 1):
                if s[v] != 'a':
                    return False
            return True
        return False

    def stringmatch_re(s):
        # Check for "ca+t" with re.
        m = re.match(r"ca+t", s)
        if m:
            return True
        return False

    # Test these strings.
    tests = ["ct", "cat", "caaat", "dog", "car"]
    for t in tests:
        print(stringmatch(t), stringmatch_re(t), t)

Output

    False False ct
    True True cat
    True True caaat
    False False dog
    False False car
Method notes. Both methods return the same values for these test strings. In many programs, stringmatch_re is the better choice because it is shorter and easier to understand. But it causes a performance loss.
Performance. I compared the two methods on some test strings. In every Python implementation I tried, stringmatch, with no regular expressions, was faster. In Python 3.3, it takes less than half the time.
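One caveat is worth checking on other inputs: re.match anchors only at the start of the string, so a string with extra trailing characters can still match "ca+t". The following is a minimal sketch, assuming an exact match is wanted. The helper name stringmatch_full is hypothetical and not part of the original program, and re.fullmatch requires Python 3.4 or later.

```python
import re

def stringmatch_full(s):
    # Hypothetical helper: require the whole string to match "ca+t".
    return re.fullmatch(r"ca+t", s) is not None

# re.match accepts a matching prefix; re.fullmatch does not.
print(re.match(r"ca+t", "cats") is not None)  # True
print(stringmatch_full("cats"))               # False
print(stringmatch_full("caaat"))              # True
```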

Note: In this experiment, stringmatch returns early, before the loop, when it finds an invalid length ("ct") or an invalid start character ("dooog").

Note 2: An optimized version of stringmatch_re, where the length and first character are checked before calling re.match, might be possible.
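A rough sketch of the idea in Note 2 might look like this. The name stringmatch_re_fast and the precompiled PATTERN are assumptions for illustration only; whether this version is actually faster would need to be measured.

```python
import re

# Assumption: compile the pattern once, outside the function.
PATTERN = re.compile(r"ca+t")

def stringmatch_re_fast(s):
    # Cheap rejections first: too short, or wrong first character.
    if len(s) < 3 or s[0] != 'c':
        return False
    # Only plausible candidates reach the regular expression.
    return PATTERN.match(s) is not None

# Same test strings as the first program.
for t in ["ct", "cat", "caaat", "dog", "car"]:
    print(stringmatch_re_fast(t), t)
```

The early checks mirror the ones stringmatch already performs, so rejected strings never reach the regular expression engine.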

Python program that times methods

    import re
    import time

    def stringmatch(s):
        # Check for "ca+t" with if-statements and loop.
        if len(s) >= 3 and s[0] == 'c' and s[len(s) - 1] == 't':
            # Every character between the first and last must be 'a'.
            for v in range(1, len(s) - 1):
                if s[v] != 'a':
                    return False
            return True
        return False

    def stringmatch_re(s):
        # Check for "ca+t" with re.
        m = re.match(r"ca+t", s)
        if m:
            return True
        return False

    print(time.time())

    # Version 1: string if, for
    for i in range(0, 10000000):
        result = stringmatch("ct")
        result = stringmatch("caat")
        result = stringmatch("dooog")

    print(time.time())

    # Version 2: re.match
    for i in range(0, 10000000):
        result = stringmatch_re("ct")
        result = stringmatch_re("caat")
        result = stringmatch_re("dooog")

    print(time.time())

Output

    1411309406.96144
    1411309430.354504    stringmatch    = 23.39 s
    1411309480.849815    stringmatch_re = 50.50 s
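The standard timeit module is another way to repeat this measurement without subtracting timestamps by hand. This is only a sketch: the run helper and the iteration count N are assumptions, with N much smaller than the loop above, and no timings are shown because they depend on the machine and Python version.

```python
import re
import timeit

def stringmatch(s):
    # Check for "ca+t" with if-statements and loop.
    if len(s) >= 3 and s[0] == 'c' and s[-1] == 't':
        for v in range(1, len(s) - 1):
            if s[v] != 'a':
                return False
        return True
    return False

def stringmatch_re(s):
    # Check for "ca+t" with re.
    return re.match(r"ca+t", s) is not None

def run(func):
    # Hypothetical helper: call func on the same three inputs as the benchmark.
    func("ct")
    func("caat")
    func("dooog")

N = 100000  # assumption: iteration count chosen to keep the run short

t_loop = timeit.timeit(lambda: run(stringmatch), number=N)
t_re = timeit.timeit(lambda: run(stringmatch_re), number=N)

print("stringmatch    = %.2f s" % t_loop)
print("stringmatch_re = %.2f s" % t_re)
```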
Summary. In nearly all of my experiments, replacing a regular expression with an if-statement and loop was an optimization. In an extremely slow language, the imperative approach may occasionally be slower, but in Python that is not the case.