Nitty-gritty Regular Expression Details
I played the super nerdy and fun Regex Crossword, which sharpened my rusty regex-fu and taught me some nitty-gritty regex details (demonstrated in the IPython REPL):
+ matches one or more repetitions of the preceding RE, not of the content it matched.
This is easy to miss with a group: below, (H|O)+ matches H and then O, because the pattern is repeated rather than the captured text:
In [17]: re.match(r'G(H|O)+', 'GHO')
Out[17]: <_sre.SRE_Match at 0x1049cbeb8>
To match the captured content again, use a backreference, \number:
In [18]: re.match(r'G(H|O)\1', 'GHH')
Out[18]: <_sre.SRE_Match at 0x1049cbf30>
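To make the difference concrete, here is a minimal sketch using only the standard re module. The variable names are mine and purely illustrative; the patterns are the ones from the session above, plus a G(H|O)\1+ variant added for comparison.

```python
import re

# + repeats the pattern (H|O), so each repetition may match a different letter.
plus_match = re.match(r'G(H|O)+', 'GHO')
# After a repeated group, group(1) holds only the last repetition's text.
last_capture = plus_match.group(1)

# \1 must match exactly the text that group 1 captured.
backref_same = re.match(r'G(H|O)\1', 'GHH')   # H followed by H: matches
backref_diff = re.match(r'G(H|O)\1', 'GHO')   # H followed by O: no match

# To repeat the captured text itself, apply + to the backreference.
repeated_capture = re.fullmatch(r'G(H|O)\1+', 'GHHH')

print(last_capture)                  # O
print(backref_same is not None)      # True
print(backref_diff is not None)      # False
print(repeated_capture is not None)  # True
```

Putting + on the backreference rather than on the group is what repeats the matched text instead of the pattern.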
- in [] is interpreted as a literal "-" only if it is escaped (e.g. [a\-z]) or placed as the first or last character.
Anywhere else inside [], - denotes a range, and the range is inclusive and ordered by ASCII or Unicode code point. I was bitten hard by this misconception when sanitizing the names of uploaded files.
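The following sketch shows how such a range can slip into a filename sanitizer. Both patterns and the sample name are hypothetical, made up for illustration; they are not the code that actually bit me. In the buggy class, ".-_" is read as the range from "." (0x2E) to "_" (0x5F), which includes "/" and excludes "-" itself.

```python
import re

# Hypothetical sanitizers: replace every character outside the allowed set with "_".
# Buggy: ".-_" is a range from "." (0x2E) to "_" (0x5F), so "/" (0x2F) is allowed
# while the hyphen (0x2D) falls outside the range and is rejected.
buggy = re.compile(r'[^a-zA-Z0-9.-_]')
# Fixed: "-" is the last character of the class, so it is a literal hyphen.
fixed = re.compile(r'[^a-zA-Z0-9._-]')

name = 'a/b-c.txt'
buggy_result = buggy.sub('_', name)
fixed_result = fixed.sub('_', name)

print(buggy_result)  # a/b_c.txt  (slash kept, hyphen replaced)
print(fixed_result)  # a_b-c.txt  (slash replaced, hyphen kept)
```

Placing the hyphen last (or escaping it) keeps the intent visible and avoids accidental ranges when more characters are added to the class later.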