Python Regex Cheatsheet
A cheatsheet for regex in Python with the re module.
This is a cheatsheet for regex in Python with the re module. I plan to update this as I go, but the essentials are here.
Basic Patterns
Character Classes
. - Matches any character except newline
\w - Matches word characters (letters, digits, underscores)
\W - Matches non-word characters
\d - Matches digits (0-9; also other Unicode decimal digits unless re.ASCII is set)
\D - Matches non-digits
\s - Matches whitespace (space, tab, newline)
\S - Matches non-whitespace
[abc] - Matches any one character in the set
[^abc] - Matches any one character not in the set
[a-z] - Matches any character in the range (here, any lowercase letter)
[A-Z] - Matches any uppercase letter
[0-9] - Matches any digit
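Here is a short sketch of several of these classes in action with re.findall. The sample text and variable names are made up for illustration:

```python
import re

# Illustrative sample text (not from the original post)
text = "Order #42: 3 apples_x, total $7.50"

# \d matches one digit at a time
digits = re.findall(r"\d", text)        # ['4', '2', '3', '7', '5', '0']
# \w+ matches runs of letters, digits, and underscores
words = re.findall(r"\w+", text)        # ['Order', '42', '3', 'apples_x', 'total', '7', '50']
# [aeiou] is a set: any single lowercase vowel
vowels = re.findall(r"[aeiou]", text)   # ['e', 'a', 'e', 'o', 'a']
# [^\w\s] negates a set: neither a word character nor whitespace
punctuation = re.findall(r"[^\w\s]", text)  # ['#', ':', ',', '$', '.']
```

Note that "_" counts as a word character, so it shows up in words and not in punctuation.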
Anchors
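^ - Start of the string (start of each line with re.MULTILINE)
$ - End of the string, or just before a final newline (end of each line with re.MULTILINE)
\b - Word boundary
\B - Not a word boundary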
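The sketch below shows how the anchors behave with and without re.MULTILINE. The two-line sample string is illustrative:

```python
import re

# Illustrative two-line sample
text = "cat scatter\ncat nap"

# Without re.MULTILINE, ^ anchors only at the very start of the string
starts = re.findall(r"^cat", text)                          # ['cat']
# With re.MULTILINE, ^ also matches right after each newline
line_starts = re.findall(r"^cat", text, flags=re.MULTILINE)  # ['cat', 'cat']
# \b requires a word boundary, so the "cat" inside "scatter" is skipped
whole_words = re.findall(r"\bcat\b", text)                   # ['cat', 'cat']
# \B is the opposite: "cat" with word characters on both sides
inside_words = re.findall(r"\Bcat\B", text)                  # ['cat'] (from "scatter")
# With re.MULTILINE, $ matches at the end of each line
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)     # ['scatter', 'nap']
```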
Quantifiers
* - Match 0 or more times
+ - Match 1 or more times
? - Match 0 or 1 time
{n} - Match exactly n times
{n,} - Match n or more times
{n,m} - Match between n and m times (inclusive)
*? - Match 0 or more times (non-greedy)
+? - Match 1 or more times (non-greedy)
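The difference between greedy and non-greedy quantifiers is easiest to see on a string with repeated delimiters. Here is a small sketch with made-up inputs:

```python
import re

# Illustrative input with several tags
html = "<b>bold</b> and <i>italic</i>"

# Greedy .* grabs as much as possible: from the first "<" to the last ">"
greedy = re.findall(r"<.*>", html)    # ['<b>bold</b> and <i>italic</i>']
# Non-greedy .*? stops at the earliest ">", giving one tag per match
lazy = re.findall(r"<.*?>", html)     # ['<b>', '</b>', '<i>', '</i>']

# {2,3} bounds the repetition; \b keeps it from matching part of "2024"
short_numbers = re.findall(r"\b\d{2,3}\b", "7 42 365 2024")  # ['42', '365']

# ? makes the preceding "u" optional, so both spellings match
spellings = re.findall(r"colou?r", "color colour")  # ['color', 'colour']
```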
Groups and References
(...) - Capturing group
(?:...) - Non-capturing group
(?P<name>...) - Named capturing group
(?P=name) - Backreference to a named group
\1, \2, etc. - Backreference to a numbered group
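The sketch below exercises each kind of group. All sample strings are made up for illustration:

```python
import re

# Named groups can be read back by name
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "2024-03-15")
parts = m.groupdict()  # {'year': '2024', 'month': '03', 'day': '15'}

# (?:...) groups "www." for the ? quantifier but is not returned by findall
hosts = re.findall(r"(?:www\.)?(\w+)\.com", "www.example.com and test.com")
# ['example', 'test']

# \1 must repeat whatever group 1 matched: here, a doubled word
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)  # 'is'

# (?P=name) is the named form of a backreference: the closing quote must match the opening one
quoted = re.search(r"(?P<q>['\"])(.*?)(?P=q)", 'say "hi" now').group(2)  # 'hi'
```

Named groups are also numbered, which is why the text between the quotes is group 2 in the last example.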
re Functions
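import re
# Example inputs (illustrative); substitute your own
pattern = r"\d+"
string = "a1 b22 c333"
repl = "#"
# Search for the first match anywhere in the string
re.search(pattern, string) # Returns a Match object, or None if there is no match
# Find all non-overlapping matches
re.findall(pattern, string) # Returns a list of strings (tuples if the pattern has several groups)
# Split string by pattern
re.split(pattern, string) # Returns a list of strings (text captured by groups is included)
# Replace every match of pattern with repl
re.sub(pattern, repl, string) # Returns a new string; the original is unchanged
# Compile pattern for reuse
regex = re.compile(pattern) # Then call regex.search(string), regex.findall(string), etc.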
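Here is a sketch that puts these functions together on one made-up string. It also shows two behaviors worth knowing: findall returns tuples when the pattern has more than one group, and re.sub (like the compiled pattern's sub method) accepts a function as the replacement, which receives each Match object:

```python
import re

# Illustrative input
text = "apples=3, pears=10, plums=7"

# Compile once, then reuse the pattern object's methods
pair = re.compile(r"(\w+)=(\d+)")

first = pair.search(text)      # Match object for the first pair: 'apples=3'
pairs = pair.findall(text)     # two groups, so a list of tuples
counts = {name: int(n) for name, n in pairs}

# Split on a comma plus any following whitespace
fields = re.split(r",\s*", text)  # ['apples=3', 'pears=10', 'plums=7']

# A function replacement: double every count
doubled = pair.sub(lambda m: f"{m.group(1)}={int(m.group(2)) * 2}", text)
# 'apples=6, pears=20, plums=14'
```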
Modifiers / Function Flags
# At the start of the pattern:
(?i) # Case-insensitive matching
(?m) # Multiline mode (^ and $ match start/end of each line)
(?s) # Dot matches newline
(?x) # Verbose mode (allows comments and whitespace)
# As function flags (combine with |, e.g. re.I | re.M):
re.IGNORECASE # or re.I - Case-insensitive matching
re.MULTILINE # or re.M - ^ and $ match at the start/end of each line
re.DOTALL # or re.S - Dot matches any character, including newline
re.VERBOSE # or re.X - Allow comments and whitespace in the pattern
re.ASCII # or re.A - \w, \d, \s, \b and their negations match ASCII only
re.UNICODE # or re.U - Unicode matching (default for str patterns in Python 3)
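The sketch below shows inline and function flags side by side. The sample strings and the phone-number layout are illustrative assumptions:

```python
import re

# Illustrative two-line sample
text = "Hello\nhello world"

# Inline (?i) and the re.IGNORECASE flag are equivalent
inline = re.findall(r"(?i)hello", text)                    # ['Hello', 'hello']
flagged = re.findall(r"hello", text, flags=re.IGNORECASE)  # ['Hello', 'hello']

# Flags combine with |: case-insensitive, and ^ matches at each line start
line_starts = re.findall(r"^hello", text, flags=re.I | re.M)  # ['Hello', 'hello']

# Without re.DOTALL, "." will not match the newline; with it, it will
no_dotall = re.search(r"Hello.hello", text)                # None
with_dotall = re.search(r"Hello.hello", text, flags=re.S)  # a Match object

# re.VERBOSE ignores unescaped whitespace and allows # comments in the pattern
phone = re.compile(r"""
    (\d{3})   # first three digits
    -
    (\d{4})   # last four digits
""", re.X)
phone_parts = phone.fullmatch("555-1234").groups()  # ('555', '1234')
```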
Examples
- Answer sanitization: Say you’re extracting the final answer to a math problem from a larger text. The text contains “………The final answer is \boxed{123}….”, and you want to extract the 123. Sometimes the answer has a dollar sign, and sometimes it doesn’t.
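import re
# Example input (illustrative)
text = r"...The final answer is \boxed{$123}...."
# regex pattern extracts the answer as a named group. The ?P<answer> is the named group.
# \\boxed matches a literal backslash followed by "boxed"; \{ and \} match literal braces.
# \d+ matches one or more digits; [\.]* optionally consumes trailing periods.
pattern = r"The final answer is \\boxed\{(?P<answer>\d+)\}[\.]*"
# Remove dollar signs (equivalently: text.replace("$", ""))
new_text = re.sub(r"\$", "", text)
# Use the regex to extract the answer; re.search returns None if there is no match
match = re.search(pattern, new_text)
answer = match.group("answer") if match else None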
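An alternative to stripping dollar signs first is to make the dollar sign optional inside the pattern with \$?. This is a swapped-in variation, not the original approach, and it assumes the dollar sign appears only immediately inside the braces; the stripping approach is more forgiving if dollar signs can appear elsewhere (for example around the whole \boxed{...}). The sample inputs are hypothetical:

```python
import re

# Hypothetical inputs: with a dollar sign, without one, and with no answer at all
samples = [
    r"...The final answer is \boxed{123}.",
    r"...The final answer is \boxed{$123}....",
    "No boxed answer here.",
]

# \$? makes a dollar sign right after the opening brace optional
pattern = re.compile(r"The final answer is \\boxed\{\$?(?P<answer>\d+)\}")

answers = []
for text in samples:
    match = pattern.search(text)
    # Guard against None before calling .group()
    answers.append(match.group("answer") if match else None)
# answers == ['123', '123', None]
```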
Notes
- Use raw strings (r"pattern") for regex patterns to avoid escape character issues.
- Use re.compile() for patterns you’ll use multiple times.
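To see why raw strings matter, consider \b: in a normal string literal Python turns "\b" into a backspace character before the regex engine ever sees it. The sketch below uses illustrative sample strings:

```python
import re

# In a normal string, "\b" is the backspace character (\x08), not a word boundary
assert "\b" == "\x08"
# In a raw string, the backslash is kept, so the regex engine receives \b
assert r"\b" == "\\b"

text = "cat concat"
cooked = re.findall("\bcat\b", text)  # searches for literal backspaces: []
raw = re.findall(r"\bcat\b", text)    # word boundaries: ['cat']

# A compiled pattern object can be reused across many inputs
word = re.compile(r"\bcat\b")
hits = [bool(word.search(s)) for s in ["cat", "concat", "the cat sat"]]
# [True, False, True]
```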