In this blog post you will learn what regular expressions (RegEx) are and how to use Python’s re module to work with them, with the help of examples. A regular expression is a sequence of characters that defines a search pattern.
List comprehension that incorporates a regex search
import re
# Keep every token in text8 (an iterable of strings) that contains an @-mention
[w for w in text8 if re.search(r'@[A-Za-z0-9_]+', w)]
Regex example description for “@[A-Za-z0-9_]+”
- Matches a literal ‘@’ (re.search finds it anywhere in the string, not only at the start)
- Followed by any letter, digit, or underscore
- That character set repeats at least once, with no upper limit
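To make the three points above concrete, here is a minimal sketch of the same list comprehension run on made-up data. The sentence assigned to `text8` is a hypothetical stand-in (the real `text8` is not shown in this post); it is only assumed to be a list of string tokens.

```python
import re

# Hypothetical stand-in for text8: a sentence split into tokens on whitespace.
text8 = ("Thanks @UN and @UN_Women for the update, "
         "write to user@example.com or ping @ later").split()

pattern = r'@[A-Za-z0-9_]+'

# Keep tokens that contain '@' followed by at least one letter, digit, or underscore.
mentions = [w for w in text8 if re.search(pattern, w)]
print(mentions)  # ['@UN', '@UN_Women', 'user@example.com']
```

Note that the lone `@` token is rejected because `+` needs at least one character after it, while `user@example.com` is kept because `re.search` scans the whole token rather than only its start.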
Match something after ‘@’
@[A-Za-z0-9_]+
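If the goal is to pull out the matched text itself rather than filter tokens, `re.findall` can be used with the same pattern. The sketch below uses a made-up `tweet` string and, as an assumption about what "something after ‘@’" means, adds parentheses to capture only the part that follows the ‘@’.

```python
import re

# Made-up example input, not taken from the original data set.
tweet = "Thanks @UN and @UN_Women for the update"

# Whole match: the '@' plus the characters that follow it.
full_matches = re.findall(r'@[A-Za-z0-9_]+', tweet)

# With one capture group, findall returns only the group: the text after '@'.
names_only = re.findall(r'@([A-Za-z0-9_]+)', tweet)

print(full_matches)  # ['@UN', '@UN_Women']
print(names_only)    # ['UN', 'UN_Women']
```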
Regex Meta Characters
. = wildcard, matches any single character except a newline
^ = start of a string
$ = end of a string
[] = matches one of the set of characters within []
[a-z] = matches on the range of a...z
[^abc] = matches a character that is not a,b, or c
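a|b = matches either a or b, where a and b are patterns
() = grouping (scoping) characters, which also capture the text they match
\ = escape character, used for special sequences such as (\t, \n, \b) and to match a metacharacter literally (for example \.)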
\b = matches word boundary
\d = Any digit, equivalent to [0-9]
\D = Any non-digit, equivalent to [^0-9]
\s = Any whitespace, equivalent to [ \t\n\r\f\v]
\S = Any non-whitespace, equivalent to [^ \t\n\r\f\v]
\w = Alphanumeric character or underscore, equivalent to [a-zA-Z0-9_]
\W = Any non alphanumeric character, equivalent to [^a-zA-Z0-9_]
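A short sketch can show several of these metacharacters in action. The `sample` string and the variable names below are invented for illustration; only standard `re` functions are used.

```python
import re

# Made-up sample text for illustration.
sample = "Order 66 shipped to cat, cot and cut on 2021-03-04.\tDone"

digits = re.findall(r'\d+', sample)          # runs of digits
wildcard = re.findall(r'c.t', sample)        # 'c', any one character, 't'
char_set = re.findall(r'c[ao]t', sample)     # middle character must be a or o
negated = re.findall(r'c[^ao]t', sample)     # middle character must NOT be a or o
either = re.findall(r'cat|cut', sample)      # alternation
at_start = bool(re.search(r'^Order', sample))  # anchored to the start of the string
loose_on = re.findall(r'on', sample)         # also finds the 'on' inside 'Done'
whole_on = re.findall(r'\bon\b', sample)     # word boundaries keep only the word 'on'
pieces = re.split(r'\s+', "a b\tc\nd")       # split on any run of whitespace

print(digits)    # ['66', '2021', '03', '04']
print(wildcard)  # ['cat', 'cot', 'cut']
print(whole_on)  # ['on']
```

The patterns are written as raw strings (`r'...'`) so that Python passes `\b` to the regex engine as a word boundary instead of interpreting it as a backspace character.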
Regex Repetitions
* = matches zero or more times
+ = matches one or more times
? = matches zero or one time
{n} = exactly n repetitions
{,n} = at most n repetitions
{m,n} = at least m and at most n repetitions
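The repetition operators are easiest to compare side by side. The following is a minimal sketch on invented strings; the variable names are only for illustration.

```python
import re

# '*' allows zero b's, '+' requires at least one.
star = re.findall(r'ab*', 'a ab abbb')
plus = re.findall(r'ab+', 'a ab abbb')

# '?' makes the preceding character optional.
optional = re.findall(r'colou?r', 'color colour')

numbers = '7 42 2021 123456'
# {n}: exactly four digits between word boundaries.
exactly_four = re.findall(r'\b\d{4}\b', numbers)
# {m,n}: between two and four digits between word boundaries.
two_to_four = re.findall(r'\b\d{2,4}\b', numbers)

# {,n}: at most two digits; fullmatch requires the whole string to fit.
at_most_two_ok = bool(re.fullmatch(r'\d{,2}', '42'))
at_most_two_too_long = bool(re.fullmatch(r'\d{,2}', '123'))

print(star)          # ['a', 'ab', 'abbb']
print(plus)          # ['ab', 'abbb']
print(optional)      # ['color', 'colour']
print(exactly_four)  # ['2021']
print(two_to_four)   # ['42', '2021']
```

The `\b` anchors matter here: without them, `\d{4}` would also match the first four digits of `123456`.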
And as always, thank you for taking the time to read this. If you have any comments, questions, or critiques, please reach out to me on our FREE ML Security Discord Server – HERE
