I played the super nerdy and fun Regex Crossword, honed my rusty regex-fu, and learned some nitty-gritty regex details (using the IPython REPL for the demonstrations):
+ matches one or more repetitions of the preceding RE, not of the content that RE matched.
This is especially easy to miss when the preceding RE is a group:
In : re.match(r'G(H|O)+', 'GHO')
Out: <_sre.SRE_Match at 0x1049cbeb8>
To match the same content again, use a backreference such as \1:
In : re.match(r'G(H|O)\1', 'GHH')
Out: <_sre.SRE_Match at 0x1049cbf30>
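To see the difference side by side, here is a minimal sketch. It assumes Python 3.4+ for re.fullmatch, and the variable names are made up for illustration. It checks the two patterns against the same inputs and also looks at what the repeated group captures.

```python
import re

# `+` repeats the group's pattern (H|O), so each repetition may pick a different letter.
repeat_pattern = re.compile(r'G(H|O)+')
# `\1` refers to the text captured by group 1, so the second letter must equal the first.
backref_pattern = re.compile(r'G(H|O)\1')

repeat_matches_gho = repeat_pattern.fullmatch('GHO') is not None
backref_matches_ghh = backref_pattern.fullmatch('GHH') is not None
backref_matches_gho = backref_pattern.fullmatch('GHO') is not None

# A repeated group only keeps the text of its last repetition.
last_capture = repeat_pattern.fullmatch('GHO').group(1)

print(repeat_matches_gho, backref_matches_ghh, backref_matches_gho, last_capture)
```

With fullmatch the whole string has to be consumed, which makes the contrast stricter than re.match: the repeated group accepts 'GHO', while the backreference accepts 'GHH' and rejects 'GHO'.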
Inside a character class, - is interpreted as a literal "-" only if it is escaped (e.g. [a\-z]) or placed as the first or last character.
Anywhere else it defines a range, which is inclusive at both ends and ordered by ASCII or Unicode code point. I was bitten hard by this misconception when sanitizing the names of uploaded files.
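The following sketch illustrates how the position of the hyphen changes what a character class accepts. The patterns and the candidate list are hypothetical examples, not the ones from my file-name sanitizer. The helper matched_chars is likewise made up for this demonstration, and re.fullmatch assumes Python 3.4+.

```python
import re

def matched_chars(pattern, candidates):
    """Return the candidates that the single-character pattern accepts."""
    return [ch for ch in candidates if re.fullmatch(pattern, ch)]

candidates = ['a', 'm', 'z', '-', '.', '/', '5']

# Escaped hyphen: the class contains exactly 'a', '-' and 'z'.
escaped = matched_chars(r'[a\-z]', candidates)
# Hyphen as the last character: also literal.
trailing = matched_chars(r'[az-]', candidates)
# Hyphen between two characters: an inclusive range from 'a' to 'z'.
ranged = matched_chars(r'[a-z]', candidates)
# Intended as "plus, hyphen, nine", but it is the range '+' (0x2B) to '9' (0x39),
# which also covers ',', '.', '/' and all digits.
accidental = matched_chars(r'[+-9]', candidates)

print(escaped, trailing, ranged, accidental)
```

The last pattern is the kind of slip that hurts in a sanitizer: a class meant to allow a few punctuation characters silently lets '/' and '.' through as well.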