In this blog post, you will learn about regular expressions (RegEx) and use Python’s re module to work with them, with the help of examples. A regular expression is a sequence of characters that defines a search pattern.
List comprehension that incorporates a regex search
import re
# text8 is assumed to be a list of word tokens defined earlier
[w for w in text8 if re.search(r'@[A-Za-z0-9_]+', w)]
Regex example description for “@[A-Za-z0-9_]+”
- Matches a literal ‘@’ (with re.search it may appear anywhere in the word, not only at the start)
- Followed by any letter, digit, or underscore
- Repeats at least once, but any number of times is accepted
Match something after ‘@’
@[A-Za-z0-9_]+
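To see the pattern in action, here is a minimal sketch. The contents of text8 are not shown in this post, so the token list below is made-up sample data, and the names has_handle and starts_with_handle are used only for illustration.

```python
import re

# Hypothetical sample tokens standing in for text8 (assumed, not the original data).
text8 = ['Hello', '@UN', '@UN_Women', 'world!', '@', 'user@example', '#tag']

# Raw string, so any backslashes would reach the regex engine unchanged.
pattern = r'@[A-Za-z0-9_]+'

# re.search scans the whole token, so the '@' may appear anywhere in it.
has_handle = [w for w in text8 if re.search(pattern, w)]
print(has_handle)

# re.match only tries the pattern at the start of the token.
starts_with_handle = [w for w in text8 if re.match(pattern, w)]
print(starts_with_handle)
```

The bare '@' token is rejected in both cases because + requires at least one character after it. If only tokens that begin with '@' are wanted, re.match (or a leading ^ in the pattern) is the safer choice.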
Regex Meta Characters
. = wildcard, matches any single character except a newline
^ = start of a string
$ = end of a string
[] = matches one of the set of characters within []
[a-z] = matches one character in the range a through z
[^abc] = matches a character that is not a, b, or c
a|b = matches either a or b, where a and b are strings
() = grouping, sets the scope for operators such as | and the repetitions
\ = escapes a special character, or starts a special sequence such as \t, \n, \b
\b = matches word boundary
\d = Any digit, equivalent to [0-9]
\D = Any non-digit, equivalent to [^0-9]
\s = Any whitespace, equivalent to [ \t\n\r\f\v]
\S = Any non-whitespace, equivalent to [^ \t\n\r\f\v]
\w = Alphanumeric character, equivalent to [a-zA-Z0-9_]
\W = Any non-alphanumeric character, equivalent to [^a-zA-Z0-9_]
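The meta characters above can be tried out with re.findall and re.search. This is a small sketch only; the sample string and the variable names are invented for illustration.

```python
import re

# Made-up sample text (an assumption, used only for illustration).
sample = "Call 555-1234 or email bob_smith@example.com\tbefore 9am."

# \d matches a single digit; findall returns every non-overlapping match.
digits = re.findall(r'\d', sample)

# \w+ matches runs of letters, digits, or underscores.
words = re.findall(r'\w+', sample)

# [^abc] matches any one character that is not a, b, or c.
not_abc = re.findall(r'[^abc]', 'aabbccxyz')

# (cat|dog) picks one of two alternatives; \b restricts matches to whole words.
pets = re.findall(r'\b(cat|dog)\b', 'cat dogs dog catalog')

# ^ anchors the pattern to the start of the string.
starts_with_call = bool(re.search(r'^Call', sample))

# \s matches whitespace, including the tab in the sample.
whitespace = re.findall(r'\s', sample)

print(digits, words, not_abc, pets, starts_with_call, whitespace, sep='\n')
```

Writing patterns as raw strings (the r prefix) matters here: without it, Python would turn '\b' into a backspace character before the regex engine ever sees it.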
Regex Repetitions
* = matches zero or more times
+ = matches one or more times
? = matches zero or one time
{n} = exactly n repetitions
{n,} = at least n repetitions
{,n} = at most n repetitions
{m,n} = at least m and at most n repetitions
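A short sketch of how each repetition operator behaves. The test strings and variable names are invented for illustration.

```python
import re

# * : 'g' followed by zero or more 'o'
star = re.findall(r'go*', 'g go goo gooo')

# + : 'g' followed by one or more 'o'
plus = re.findall(r'go+', 'g go goo gooo')

# ? : the 'u' is optional (zero or one time)
optional = re.findall(r'colou?r', 'color colour colouur')

# {n} : exactly four digits, kept to whole words by \b
exactly_four = re.findall(r'\b\d{4}\b', '12 1234 123456')

# {,n} : at most two 'a' characters (zero is allowed)
at_most_two_ok = re.fullmatch(r'a{,2}', 'aa') is not None
at_most_two_fail = re.fullmatch(r'a{,2}', 'aaa') is not None

# {m,n} : between two and three digits
two_to_three = re.findall(r'\b\d{2,3}\b', '1 12 123 1234')

print(star, plus, optional, exactly_four,
      at_most_two_ok, at_most_two_fail, two_to_three, sep='\n')
```

re.fullmatch is used for the {,n} case because the whole string has to fit the pattern; with re.search, 'aaa' would still produce a match on its first two characters.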