Regex Cheat Sheet: A Complete Guide to Regular Expressions
Kordu Team · 2026-03-31
Key Takeaways
- Character classes (\d, \w, \s) and quantifiers (+, *, ?, {n}) cover 80% of real-world regex use cases.
- Lookaheads, lookbehinds, and named groups handle the other 20% -- but you will not need them daily.
- Regex works in every major language, text editor, and CLI tool. Learn it once, use it everywhere.
- Always test against real data before deploying. Edge cases in regex are relentless.
Test Your Patterns
Try any pattern from this guide against your own text. Matches and capture groups highlight in real time.
Character Classes
Character classes match a single character from a defined set. These are the foundation.
| Pattern | Matches | Example |
|---|---|---|
| . | Any character except newline | a.c matches 'abc', 'a1c', 'a-c' |
| \d | Any digit (0-9) | \d\d matches '42', '99' |
| \D | Any non-digit | \D+ matches 'hello' |
| \w | Word character (letter, digit, underscore) | \w+ matches 'hello_123' |
| \W | Non-word character | \W matches '@', ' ', '-' |
| \s | Whitespace (space, tab, newline) | \s+ matches ' ' |
| \S | Non-whitespace | \S+ matches 'hello' |
| [abc] | Any of a, b, or c | [aeiou] matches vowels |
| [^abc] | Any character except a, b, c | [^0-9] matches non-digits |
| [a-z] | Any character in range a to z | [A-Za-z] matches any letter |
The shorthands (\d, \w, \s) and their negations cover most needs. Square-bracket character classes handle everything else.
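As a quick illustration, here is a minimal sketch using Python's built-in re module. The sample strings are made up for the example.

```python
import re

text = "Order #42 shipped to user_7 on 2026-03-31."

# \d+ finds every run of digits
digits = re.findall(r"\d+", text)

# \w+ finds every run of letters, digits, and underscores
words = re.findall(r"\w+", text)

# A bracket class matches one character from the listed set
vowels = re.findall(r"[aeiou]", "regex")

# A negated class matches anything NOT listed; here it strips non-digits
digits_only = re.sub(r"[^0-9]", "", "tel: 555-0100")

print(digits)       # ['42', '7', '2026', '03', '31']
print(vowels)       # ['e', 'e']
print(digits_only)  # 5550100
```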
Quantifiers
Quantifiers control how many times a preceding element must appear.
- *: zero or more. ab*c matches ac, abc, abbc.
- +: one or more. ab+c matches abc, abbc, but not ac.
- ?: zero or one (optional). colou?r matches color and colour.
- {3}: exactly 3 times. \d{3} matches exactly three digits.
- {2,5}: between 2 and 5 times.
- {3,}: 3 or more times.
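The quantifiers above can be checked with a short Python sketch. The helper name full is hypothetical; it wraps re.fullmatch so each pattern has to match the whole candidate string.

```python
import re

def full(pattern, s):
    """Return True when the pattern matches the entire string."""
    return re.fullmatch(pattern, s) is not None

# * allows zero b's, + requires at least one
star = [s for s in ["ac", "abc", "abbc"] if full(r"ab*c", s)]
plus = [s for s in ["ac", "abc", "abbc"] if full(r"ab+c", s)]

# ? makes the u optional, but only once
optional = [s for s in ["color", "colour", "colouur"] if full(r"colou?r", s)]

# {2,5} accepts two to five repetitions, nothing shorter or longer
ranged = [s for s in ["a", "aa", "aaaaa", "aaaaaa"] if full(r"a{2,5}", s)]

print(star)      # ['ac', 'abc', 'abbc']
print(plus)      # ['abc', 'abbc']
print(optional)  # ['color', 'colour']
print(ranged)    # ['aa', 'aaaaa']
```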
Greedy vs lazy
By default, quantifiers are greedy — they match as much as possible. Adding ? makes them lazy (match as little as possible). This matters enormously for parsing quoted strings or HTML.
Greedy: ".*" applied to he said "hello" and "goodbye"
Matches: "hello" and "goodbye" (first quote to last quote)
Lazy: ".*?" applied to he said "hello" and "goodbye"
Matches: "hello" then "goodbye" (shortest possible matches)
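The same comparison, sketched in Python. The third pattern, a negated character class, is an addition for illustration: it is a common alternative to the lazy quantifier when the delimiter is a single character.

```python
import re

text = 'he said "hello" and "goodbye"'

# Greedy: .* runs to the last quote in the string
greedy = re.findall(r'".*"', text)

# Lazy: .*? stops at the nearest closing quote
lazy = re.findall(r'".*?"', text)

# Alternative: match anything that is not a quote
negated = re.findall(r'"[^"]*"', text)

print(greedy)   # ['"hello" and "goodbye"']
print(lazy)     # ['"hello"', '"goodbye"']
print(negated)  # ['"hello"', '"goodbye"']
```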
Anchors
- ^: start of string (or start of line in multiline mode).
- $: end of string (or end of line in multiline mode).
- \b: word boundary. \bcat\b matches cat but not caterpillar or concatenate.
Word boundaries prevent partial matches
Searching for log without boundaries matches blog, catalog, logarithm, and log. Use \blog\b to match only the standalone word. One of the most useful and underused regex features.
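A small Python sketch of the difference, using the four words from the paragraph above as sample text:

```python
import re

text = "blog catalog logarithm log"

# Without boundaries, log is found inside every word
loose = re.findall(r"log", text)

# With \b on both sides, only the standalone word matches
strict = re.findall(r"\blog\b", text)
strict_positions = [m.start() for m in re.finditer(r"\blog\b", text)]

print(len(loose))        # 4
print(strict)            # ['log']
print(strict_positions)  # [23]
```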
Groups and Alternation
Capturing groups
Parentheses () extract matched substrings. (\d{4})-(\d{2})-(\d{2}) applied to 2026-03-31 captures 2026, 03, 31.
Named groups
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2}) lets you reference matches by name instead of index. Far more readable in complex patterns.
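Here is how both styles might look in Python. One assumption to flag: Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the pattern below is adapted accordingly.

```python
import re

# Numbered groups: values come back by position
numbered = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", "2026-03-31")
parts = numbered.groups()

# Named groups (Python syntax): values come back by name
named = re.fullmatch(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "2026-03-31"
)
fields = named.groupdict()

print(parts)          # ('2026', '03', '31')
print(fields)         # {'year': '2026', 'month': '03', 'day': '31'}
print(named["year"])  # 2026
```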
Non-capturing groups
(?:...) groups elements without capturing. Useful when you need grouping for alternation or quantifiers but do not need the match. (?:https?|ftp):// groups protocol options without wasting a capture slot.
Alternation
| means “or”. cat|dog matches either. Use with groups: (cat|dog) food matches cat food or dog food.
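A short Python sketch of why the group type matters. re.findall returns the captured group when a pattern has one, and the whole match when it has none. The URLs are placeholder values.

```python
import re

text = "see http://a.example and ftp://b.example"

# Non-capturing group: findall returns the whole match
whole = re.findall(r"(?:https?|ftp)://\S+", text)

# Capturing group: findall returns only what the group captured
schemes = re.findall(r"(https?|ftp)://\S+", text)

# Alternation inside a group keeps " food" outside the choice
pet = re.search(r"(cat|dog) food", "dry dog food")

# Without the group, the alternatives are "cat" and "dog food"
ungrouped = re.findall(r"cat|dog food", "cat food")

print(whole)         # ['http://a.example', 'ftp://b.example']
print(schemes)       # ['http', 'ftp']
print(pet.group(0))  # dog food
print(ungrouped)     # ['cat']
```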
Lookaheads and Lookbehinds
Zero-width assertions — they check what comes before or after the current position without consuming characters.
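- (?=...): positive lookahead. \d+(?= dollars) matches 100 in 100 dollars but not in 100 euros.
- (?!...): negative lookahead. \d+(?! dollars) matches 100 in 100 euros. In 100 dollars it still matches 10, because the engine backtracks to a shorter run of digits; add a word boundary, \d+\b(?! dollars), to reject the number entirely.
- (?<=...): positive lookbehind. (?<=\$)\d+ matches 50 in $50.
- (?<!...): negative lookbehind. (?<!\$)\d+ matches 50 in 50 items. In $50 it still matches 0, because the 0 is preceded by 5 rather than $; use (?<![$\d])\d+ to skip the whole number.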
Lookbehind compatibility
Lookbehinds work in JavaScript (ES2018+), Python, Java, C#, and most modern engines. Not supported in some older environments. If you need broad compatibility, restructure using lookaheads or capturing groups.
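The four assertions can be tried out with the Python sketch below. It also shows a subtlety worth checking in any engine: a negative lookaround after \d+ can still succeed on part of a number, because the engine backtracks to a shorter run of digits. The tightened patterns (\b and [$\d]) are one possible fix, not the only one.

```python
import re

# Positive lookahead: digits followed by " dollars"
pos_ahead = re.findall(r"\d+(?= dollars)", "100 dollars, 200 euros")

# Negative lookahead alone backtracks to "10" inside "100 dollars"
naive_neg_ahead = re.findall(r"\d+(?! dollars)", "100 dollars")

# A word boundary forces the whole number to be considered
neg_ahead = re.findall(r"\d+\b(?! dollars)", "100 dollars, 200 euros")

# Positive lookbehind: digits preceded by a dollar sign
pos_behind = re.findall(r"(?<=\$)\d+", "$50 for 3 items")

# Negative lookbehind that also rejects a preceding digit
neg_behind = re.findall(r"(?<![$\d])\d+", "$50 for 3 items")

print(pos_ahead)        # ['100']
print(naive_neg_ahead)  # ['10']
print(neg_ahead)        # ['200']
print(pos_behind)       # ['50']
print(neg_behind)       # ['3']
```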
10 Practical Patterns
Paste any of these into the Regex Tester to see them work.
1. Email address (simplified)
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Covers the vast majority of real email addresses. The full RFC 5322 spec is absurdly complex and not worth implementing in regex.
2. URL
https?:\/\/[^\s/$.?#].[^\s]*
Matches HTTP and HTTPS URLs. For strict validation, use your language’s URL parser.
3. UK phone number
^(?:0|\+44)\d{9,10}$
Numbers starting with 0 or +44 followed by 9-10 digits.
4. Date (YYYY-MM-DD)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
Validates format, restricts months to 01-12, days to 01-31. Does not check whether February 30th exists — use a date library for that.
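One way to combine the two checks, sketched in Python. The function name is_real_date is hypothetical, and the standard library's date.fromisoformat stands in for the "date library" step.

```python
import re
from datetime import date

DATE_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_real_date(s):
    """Check the format with regex, then the calendar with datetime."""
    if not DATE_RE.fullmatch(s):
        return False
    try:
        date.fromisoformat(s)  # raises ValueError for dates like Feb 30
    except ValueError:
        return False
    return True

print(is_real_date("2026-03-31"))  # True
print(is_real_date("2026-02-30"))  # False: format is fine, date is not
print(is_real_date("2026-13-01"))  # False: rejected by the regex
```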
5. IPv4 address
^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$
Validates that each octet is 0-255. Leading zeros such as 001 are also accepted.
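A quick Python check of the pattern against a few made-up addresses. Note the last case: octets written with leading zeros pass, which may or may not be what you want.

```python
import re

IPV4_RE = re.compile(
    r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$"
)

def looks_like_ipv4(s):
    """Return True when every octet is in the range 0-255."""
    return IPV4_RE.fullmatch(s) is not None

print(looks_like_ipv4("192.168.1.1"))   # True
print(looks_like_ipv4("256.1.1.1"))     # False: 256 is out of range
print(looks_like_ipv4("1.2.3"))         # False: only three octets
print(looks_like_ipv4("01.02.03.004"))  # True: leading zeros are accepted
```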
6. Hex colour code
^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$
Matches #fff and #1a2b3c.
7. Strong password
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
At least 8 characters with one lowercase, one uppercase, one digit, one special character. Four positive lookaheads check each requirement independently.
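A minimal Python sketch of the pattern in use. The function name and sample passwords are hypothetical. Note that the final character class also acts as an allow-list, so a symbol outside @$!%*?& fails the check.

```python
import re

PASSWORD_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_strong(password):
    """All four lookaheads must pass, then length and allowed characters."""
    return PASSWORD_RE.fullmatch(password) is not None

print(is_strong("Passw0rd!"))   # True
print(is_strong("password1!"))  # False: no uppercase letter
print(is_strong("Passw0rd#"))   # False: # is not in the allowed set
print(is_strong("Pw0!"))        # False: shorter than 8 characters
```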
8. HTML tag
<([a-z][a-z0-9]*)\b[^>]*>(.*?)<\/\1>
Matches opening and closing tags. \1 backreference ensures they match. Fine for quick extraction — do not use regex to parse HTML in production.
9. Trailing whitespace
[ \t]+$
Clean up code files. In multiline mode, matches whitespace at the end of each line.
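In Python, multiline mode is the re.MULTILINE flag. The sketch below, with a made-up two-line source string, shows the difference it makes.

```python
import re

source = "def f():  \n    return 1\t\n"

# With re.MULTILINE, $ matches at the end of every line
cleaned = re.sub(r"[ \t]+$", "", source, flags=re.MULTILINE)

# Without it, $ only matches at the very end (or before a final newline),
# so only the last line is cleaned
last_only = re.sub(r"[ \t]+$", "", source)

print(repr(cleaned))    # 'def f():\n    return 1\n'
print(repr(last_only))  # 'def f():  \n    return 1\n'
```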
10. Duplicate words
\b(\w+)\s+\1\b
Catches repeated words like “the the” or “is is”. The \1 backreference matches whatever the first group captured.
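A short Python sketch that finds the duplicates and then collapses them. The replacement step is an addition for illustration; the sample sentence is made up.

```python
import re

DUP_RE = re.compile(r"\b(\w+)\s+\1\b")

text = "this is is the the test"

# findall returns what group 1 captured for each duplicate
duplicates = DUP_RE.findall(text)

# Replacing each match with \1 keeps a single copy of the word
fixed = DUP_RE.sub(r"\1", text)

print(duplicates)  # ['is', 'the']
print(fixed)       # this is the test
```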
Common Mistakes
Forgetting to escape special characters. . matches any character, not a literal dot. Use \. for a period. Same for (, ), [, ], {, }, +, *, ?, ^, $, |, \.
Greedy when you need lazy. ".*" matches from the first quote to the last quote in the entire string. Use ".*?" for the nearest closing quote.
Anchoring only one end. ^\d+ checks that the string starts with digits but says nothing about what follows. ^\d+$ ensures the entire string is digits.
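The anchoring difference can be seen in a few lines of Python. One Python-specific detail, noted here as an aside: $ also matches just before a trailing newline, so re.fullmatch is the stricter choice when the whole string must match.

```python
import re

# Anchored at the start only: trailing junk is ignored
starts_with_digits = re.match(r"^\d+", "123abc") is not None

# Anchored at both ends: the whole string must be digits
all_digits = re.match(r"^\d+$", "123abc") is not None

# In Python, $ tolerates one trailing newline
newline_slips_through = re.match(r"^\d+$", "123\n") is not None

# fullmatch does not
strict = re.fullmatch(r"\d+", "123\n") is not None

print(starts_with_digits)     # True
print(all_digits)             # False
print(newline_slips_through)  # True
print(strict)                 # False
```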
Over-engineering validation. Regex matches format. It does not validate semantics. Match the pattern with regex, then validate the logic in code.
Catastrophic backtracking. Nested quantifiers like (a+)+ cause exponential backtracking on almost-matching inputs in backtracking engines. This can freeze your application. Avoid nesting quantifiers inside groups that are themselves quantified.
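In many cases the nested form is not needed at all. The Python sketch below checks, on deliberately short inputs, that a flat a+ accepts exactly the same strings as (a+)+. The inputs are kept small on purpose: with a backtracking engine, the work for the nested pattern roughly doubles with each extra character on a near-miss input.

```python
import re

NESTED = re.compile(r"^(a+)+$")  # quantifier inside a quantified group
FLAT = re.compile(r"^a+$")       # same set of strings, no nesting

# Matching inputs ("aaa") and near misses ("aaa!"), kept short on purpose
samples = ["a" * n for n in range(1, 8)]
samples += ["a" * n + "!" for n in range(1, 8)]

# Both patterns give the same verdict on every sample
agree = all(
    bool(NESTED.match(s)) == bool(FLAT.match(s)) for s in samples
)

print(agree)  # True
```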
Keep Going
Start with the Regex Tester and experiment against real data. Build confidence with character classes and quantifiers before tackling lookaheads. And when a pattern grows beyond two lines — stop, and write a proper parser instead.