learn-python/contrib/advanced-python/regular_expressions.md

2.9 KiB

Regular Expressions in Python

Regular expressions (regex) are a powerful tool for pattern matching and text manipulation. Python's re module provides comprehensive support for regular expressions, enabling efficient text processing and validation.

1. Introduction to Regular Expressions

A regular expression is a sequence of characters defining a search pattern. Common use cases include validating input, searching within text, and extracting specific patterns.

2. Basic Syntax

Literal Characters: Match exact characters (e.g., abc matches "abc"). Metacharacters: Special characters like ., *, ?, +, ^, $, [ ], and | used to build patterns.

Common Metacharacters:

  • .: Any character except newline.
  • ^: Start of the string.
  • $: End of the string.
  • *: 0 or more repetitions.
  • +: 1 or more repetitions.
  • ?: 0 or 1 repetition.
  • []: Any one character inside brackets (e.g., [a-z]).
  • |: Either the pattern before or after.

3. Using the re Module

Key functions in the re module:

  • re.match(): Checks for a match at the beginning of the string.
  • re.search(): Searches for a match anywhere in the string.
  • re.findall(): Returns a list of all matches.
  • re.sub(): Replaces matches with a specified string.

Examples:

import re

# Match at the beginning
print(re.match(r'\d+', '123abc').group())  # Output: 123

# Search anywhere
print(re.search(r'\d+', 'abc123').group())  # Output: 123

# Find all matches
print(re.findall(r'\d+', 'abc123def456'))  # Output: ['123', '456']

# Substitute matches
print(re.sub(r'\d+', '#', 'abc123def456'))  # Output: abc#def#

4. Compiling Regular Expressions

Compiling regular expressions improves performance for repeated use.

Example:

import re

pattern = re.compile(r'\d+')
print(pattern.match('123abc').group())  # Output: 123
print(pattern.search('abc123').group())  # Output: 123
print(pattern.findall('abc123def456'))  # Output: ['123', '456']

5. Groups and Capturing

Parentheses () group and capture parts of the match.

Example:

import re

match = re.match(r'(\d{3})-(\d{2})-(\d{4})', '123-45-6789')
if match:
    print(match.group())   # Output: 123-45-6789
    print(match.group(1))  # Output: 123
    print(match.group(2))  # Output: 45
    print(match.group(3))  # Output: 6789

6. Special Sequences

Special sequences are shortcuts for common patterns:

  • \d: Any digit.
  • \D: Any non-digit.
  • \w: Any alphanumeric character.
  • \W: Any non-alphanumeric character.
  • \s: Any whitespace character.
  • \S: Any non-whitespace character. Example:
import re

print(re.search(r'\w+@\w+\.\w+', 'Contact: support@example.com').group())  # Output: support@example.com

Summary

Regular expressions are a versatile tool for text processing in Python. The re module offers powerful functions and metacharacters for pattern matching, searching, and manipulation, making it an essential skill for handling complex text processing tasks.