
Regular Expressions in Python

Regular expressions (regex) are a powerful, versatile tool for pattern matching and text manipulation. Python's built-in re module provides comprehensive support for regular expressions, enabling efficient text searching, processing, and validation.

1. Introduction to Regular Expressions

A regular expression is a sequence of characters defining a search pattern. Common use cases include validating input, searching within text, and extracting specific patterns.
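The three use cases above can be sketched with one short snippet. The 5-digit code format is an illustrative assumption. The snippet also uses re.fullmatch(), a standard re function not listed in this guide, which succeeds only if the entire string matches the pattern.

```python
import re

# Validating input: is the whole string exactly five digits?
# (The 5-digit format is only an illustrative assumption.)
code = "12345"
is_valid = re.fullmatch(r"\d{5}", code) is not None
print(is_valid)  # True

# Searching within text: find the first number in a sentence
found = re.search(r"\d+", "Order 42 shipped on day 7")
print(found.group())  # 42

# Extracting patterns: pull out every number in the sentence
numbers = re.findall(r"\d+", "Order 42 shipped on day 7")
print(numbers)  # ['42', '7']
```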

2. Basic Syntax

Literal characters: Match exact characters (e.g., abc matches "abc").
Metacharacters: Special characters such as ., ^, $, *, +, ?, {}, [], \, |, and () that carry a special meaning and are used to build patterns.
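The difference between literal characters and metacharacters shows up quickly with the dot. This sketch uses a made-up sample string: a plain letter sequence matches only itself, an unescaped . matches any character, and a backslash turns the dot back into a literal.

```python
import re

text = "abc a.c a-c"

# Literal characters match themselves exactly
literal = re.findall(r"abc", text)
print(literal)  # ['abc']

# '.' is a metacharacter: it matches any character except a newline
any_char = re.findall(r"a.c", text)
print(any_char)  # ['abc', 'a.c', 'a-c']

# Escaping the dot with a backslash makes it match a literal '.'
escaped = re.findall(r"a\.c", text)
print(escaped)  # ['a.c']
```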

Common Metacharacters:

.: Any character except newline.
^: Start of the string.
$: End of the string.
*: 0 or more repetitions.
+: 1 or more repetitions.
?: 0 or 1 repetition.
[]: Any one character inside brackets (e.g., [a-z]).
|: Either the pattern before or after.
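\: Escapes the character that follows it, removing its special meaning (e.g., \. matches a literal dot); it also starts special sequences such as \d.
{}: Specifies how many times the preceding element must occur (e.g., a{3} or a{2,4}).
(): Groups part of a pattern and captures the text it matches.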

Examples:

import re
pattern = r'c.t'
text = 'cat cot cut cit'
matches = re.findall(pattern, text)
print(matches)  # Output: ['cat', 'cot', 'cut', 'cit']

pattern = r'^Hello'
text = 'Hello, world!'
match = re.search(pattern, text)
print(match.group() if match else 'No match')  # Output: Hello

pattern = r'world!$'
text = 'Hello, world!'
match = re.search(pattern, text)
print(match.group() if match else 'No match')  # Output: world!

pattern = r'ab*'
text = 'a ab abb abbb'
matches = re.findall(pattern, text)
print(matches)  # Output: ['a', 'ab', 'abb', 'abbb']

pattern = r'ab+'
text = 'a ab abb abbb'
matches = re.findall(pattern, text)
print(matches)  # Output: ['ab', 'abb', 'abbb']

pattern = r'ab?'
text = 'a ab abb abbb'
matches = re.findall(pattern, text)
print(matches)  # Output: ['a', 'ab', 'ab', 'ab']

[] (character class):

pattern = r'[aeiou]'
text = 'hello world'
matches = re.findall(pattern, text)
print(matches)  # Output: ['e', 'o', 'o']

pattern = r'cat|dog'
text = 'I have a cat and a dog.'
matches = re.findall(pattern, text)
print(matches)  # Output: ['cat', 'dog']

pattern = r'\$100'
text = 'The price is $100.'
match = re.search(pattern, text)
print(match.group() if match else 'No match')  # Output: $100

{} (repetition count):

pattern = r'\d{3}'
text = 'My number is 123456'
matches = re.findall(pattern, text)
print(matches)  # Output: ['123', '456']

() (grouping):

pattern = r'(cat|dog)'
text = 'I have a cat and a dog.'
matches = re.findall(pattern, text)
print(matches)  # Output: ['cat', 'dog']
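The {} and () examples above show only the simplest forms. The sketch below, on made-up sample strings, adds the range forms {m,n} and {m,} and shows that a quantifier after () repeats the whole group. It uses re.fullmatch(), a standard re function not covered in this guide, to test whether the entire string matches.

```python
import re

numbers = "7 42 123 98765"

# {m,n}: between m and n repetitions; greedy, so it takes as many as it can
two_to_four = re.findall(r"\d{2,4}", numbers)
print(two_to_four)  # ['42', '123', '9876']

# {m,}: at least m repetitions, with no upper limit
three_plus = re.findall(r"\d{3,}", numbers)
print(three_plus)  # ['123', '98765']

# () makes a quantifier apply to the whole group, not just the last character
group_repeat = re.fullmatch(r"(ab)+", "ababab") is not None
char_repeat = re.fullmatch(r"ab+", "ababab") is not None
print(group_repeat, char_repeat)  # True False
```

Under {2,4}, 98765 yields only 9876: the leftover 5 is a single digit and cannot satisfy the minimum of two.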

3. Using the re Module

Key functions in the re module:

re.match(): Checks for a match at the beginning of the string.
re.search(): Searches for a match anywhere in the string.
re.findall(): Returns a list of all matches.
re.sub(): Replaces matches with a specified string.
re.split(): Returns a list where the string has been split at each match.
re.escape(): Escapes special characters in a string so they are matched literally.

Examples:

import re

# Match at the beginning
print(re.match(r'\d+', '123abc').group())  # Output: 123

# Search anywhere
print(re.search(r'\d+', 'abc123').group())  # Output: 123

# Find all matches
print(re.findall(r'\d+', 'abc123def456'))  # Output: ['123', '456']

# Substitute matches
print(re.sub(r'\d+', '#', 'abc123def456'))  # Output: abc#def#

# Split the string at each whitespace character
txt = "The Donkey in the Town"
print(re.split(r"\s", txt))  # Output: ['The', 'Donkey', 'in', 'the', 'Town']

# Escape special characters
print(re.escape("We are good to go"))  # Output: We\ are\ good\ to\ go
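The examples above always find a match, so calling .group() is safe. In practice re.match() and re.search() return None when nothing matches, and calling .group() on None raises AttributeError. This minimal sketch shows the difference between the two functions on a string that starts with letters.

```python
import re

text = "abc123"

# re.match() only tries the start of the string; "a" is not a digit,
# so there is no match and the result is None
m = re.match(r"\d+", text)
print(m)  # None

# re.search() scans the whole string and finds the digits
s = re.search(r"\d+", text)
print(s.group() if s else "No match")  # 123

# Check for None before calling .group() to avoid an AttributeError
if m is None:
    print("re.match found nothing at the start")
```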

4. Compiling Regular Expressions

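Compiling a pattern with re.compile() produces a reusable pattern object whose methods (match(), search(), findall(), sub(), split()) mirror the module-level functions. This is convenient when the same pattern is used repeatedly and can save some per-call overhead, although the re module also caches recently used patterns internally, so the speed-up is often modest.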

Example:

import re

pattern = re.compile(r'\d+')
print(pattern.match('123abc').group())  # Output: 123
print(pattern.search('abc123').group())  # Output: 123
print(pattern.findall('abc123def456'))  # Output: ['123', '456']
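A typical reason to compile is reusing one pattern across many strings. The sketch below assumes a hypothetical list of log-like entries and a YYYY-MM-DD date format chosen purely for illustration. It searches each entry with the same compiled object and then calls sub() on that object.

```python
import re

# Compile once, reuse many times (the date format is only an example)
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

# Hypothetical sample entries
entries = ["2024-01-15 deploy", "no date here", "rollback on 2023-12-31"]

dates = []
for entry in entries:
    found = date_pattern.search(entry)
    dates.append(found.group() if found else "No date")
print(dates)  # ['2024-01-15', 'No date', '2023-12-31']

# Compiled patterns also provide sub() (and split())
masked = date_pattern.sub("<date>", entries[0])
print(masked)  # <date> deploy
```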

5. Groups and Capturing

Parentheses () group and capture parts of the match.

Example:

import re

match = re.match(r'(\d{3})-(\d{2})-(\d{4})', '123-45-6789')
if match:
    print(match.group())   # Output: 123-45-6789
    print(match.group(1))  # Output: 123
    print(match.group(2))  # Output: 45
    print(match.group(3))  # Output: 6789
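Captured groups can also be retrieved all at once. match.groups() returns every group as a tuple. When a pattern has more than one group, re.findall() returns a list of tuples instead of a list of strings. The key=value input below is a made-up sample.

```python
import re

match = re.match(r'(\d{3})-(\d{2})-(\d{4})', '123-45-6789')
if match:
    # groups() returns all captured groups as a tuple
    print(match.groups())  # ('123', '45', '6789')

# With more than one group, findall() returns one tuple per match
pairs = re.findall(r'([a-z]+)=(\d+)', 'x=1 y=22 z=333')
print(pairs)  # [('x', '1'), ('y', '22'), ('z', '333')]
```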

6. Special Sequences

Special sequences are shortcuts for common patterns:

\A: Returns a match if the specified characters are at the beginning of the string.
\b: Returns a match at a word boundary, i.e. where the specified characters are at the beginning or at the end of a word.
\B: Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word.
\d: Any digit.
\D: Any non-digit.
\w: Any word character (letters, digits, and the underscore).
\W: Any non-word character.
\s: Any whitespace character.
\S: Any non-whitespace character.
\Z: Returns a match if the specified characters are at the end of the string.

Example:

import re

print(re.search(r'\w+@\w+\.\w+', 'Contact: support@example.com').group())  # Output: support@example.com
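The example above uses only \w. The following sketch, on a made-up sample string, exercises the anchoring sequences \b, \B, \A, and \Z, plus \S for splitting on mixed whitespace.

```python
import re

text = "cat concat category"

# \b: match "cat" only as a whole word
whole_words = re.findall(r"\bcat\b", text)
print(whole_words)  # ['cat']

# \B: match "cat" only where it is NOT at the start of a word
inner = re.search(r"\Bcat", text)
print(inner.start())  # 7 (the "cat" inside "concat")

# \A and \Z anchor to the very start and very end of the string
print(bool(re.search(r"\Acat", text)))     # True
print(bool(re.search(r"\Aconcat", text)))  # False
print(bool(re.search(r"ory\Z", text)))     # True

# \S+ grabs runs of non-whitespace, so mixed spaces and tabs don't matter
tokens = re.findall(r"\S+", "one  two\tthree")
print(tokens)  # ['one', 'two', 'three']
```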

7. Sets

A set is a list of characters inside a pair of square brackets []; it matches a single character, as described below:

[arn] : Returns a match where one of the specified characters (a, r, or n) is present.
[a-n] : Returns a match for any lower case character, alphabetically between a and n.
[^arn] : Returns a match for any character EXCEPT a, r, and n.
[0123] : Returns a match where any of the specified digits (0, 1, 2, or 3) are present.
[0-9] : Returns a match for any digit between 0 and 9.
[0-5][0-9] : Returns a match for any two-digit number from 00 to 59.
[a-zA-Z] : Returns a match for any character alphabetically between a and z, lower case OR upper case.
[+] : Inside a set, +, *, ., |, (), $, and {} have no special meaning, so [+] returns a match for any + character in the string.
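Each set in the list above can be tried directly with re.findall(). The input strings here are made-up samples chosen so that every set produces a visible result.

```python
import re

one_of = re.findall(r"[arn]", "barn owl")          # any of a, r, n
negated = re.findall(r"[^arn]", "barn")            # anything except a, r, n
a_to_n = re.findall(r"[a-n]", "zebra")             # lower case a through n
minutes = re.findall(r"[0-5][0-9]", "07 59 60 99") # 00 through 59
letters = re.findall(r"[a-zA-Z]", "Hi 5!")         # any ASCII letter
plus = re.findall(r"[+]", "1+2+3")                 # a literal '+'

print(one_of)   # ['a', 'r', 'n']
print(negated)  # ['b']
print(a_to_n)   # ['e', 'b', 'a']
print(minutes)  # ['07', '59']
print(letters)  # ['H', 'i']
print(plus)     # ['+', '+']
```

Sets match single characters without regard to word boundaries, so [0-5][0-9] would also find "15" inside "159". Combine it with anchors or re.fullmatch() when the whole value must be in range.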

Summary

Regular expressions (regex) are a powerful tool for text processing in Python, offering a flexible way to match, search, and manipulate text patterns. The re module provides a comprehensive set of functions and metacharacters for complex text processing tasks. With regex, you can:

1. Match patterns: Use metacharacters such as ., *, ?, and {} to describe specific patterns in text.
2. Search text: Use functions such as re.search(), re.match(), and re.findall() to find occurrences of patterns in text.
3. Manipulate text: Use functions such as re.sub() and re.split() to replace or split text at matching patterns.
