pull/1300/head
Damini2004 2024-06-26 14:31:54 +05:30
parent 417e3c81a3
commit b2ef489f44
1 changed file with 88 additions and 43 deletions


@ -1,112 +1,147 @@
## Regular Expressions in Python

Regular expressions (regex) are a powerful, versatile tool for matching patterns in strings and manipulating text.
Python's `re` module provides comprehensive support for regular expressions, enabling efficient text processing and validation.

## 1. Introduction to Regular Expressions

A regular expression is a sequence of characters defining a search pattern. Common use cases include validating input, searching within text, and extracting specific patterns.
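
The three use cases named above can be sketched briefly. This is a minimal illustration, not part of the original commit; the sample text and the patterns are assumptions chosen only for the example.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Order 66 shipped on 2024-06-26 to alice@example.com"

# Validating: does the input consist of exactly five digits?
is_valid = re.match(r"\d{5}$", "12345") is not None

# Searching: find the first run of digits anywhere in the text.
first_number = re.search(r"\d+", text).group()

# Extracting: pull out every date written as YYYY-MM-DD.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

print(is_valid)      # True
print(first_number)  # 66
print(dates)         # ['2024-06-26']
```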
## 2. Basic Syntax

Literal characters: Match exact characters (e.g., `abc` matches "abc").

Metacharacters: Special characters such as `.`, `*`, `?`, `+`, `^`, `$`, `[ ]`, and `|` used to build patterns.

**Common Metacharacters:**

- `.`: Any character except a newline.
- `^`: Start of the string.
- `$`: End of the string.
- `*`: 0 or more repetitions.
- `+`: 1 or more repetitions.
- `?`: 0 or 1 repetition.
- `[]`: Any one character inside the brackets (e.g., `[a-z]`).
- `|`: Either the pattern before or the pattern after.
- `\`: Drops the special meaning of the character that follows it.
- `{}`: Number of occurrences of the preceding pattern to match.
- `()`: Encloses a group within the pattern.
Examples:

1. `.`

```python
import re
pattern = r'c.t'
text = 'cat cot cut cit'
matches = re.findall(pattern, text)
print(matches)  # Output: ['cat', 'cot', 'cut', 'cit']
```
2. `^`

```python
pattern = r'^Hello'
text = 'Hello, world!'
match = re.search(pattern, text)
print(match.group() if match else 'No match')  # Output: 'Hello'
```
3. `$`

```python
pattern = r'world!$'
text = 'Hello, world!'
match = re.search(pattern, text)
print(match.group() if match else 'No match')  # Output: 'world!'
```
4. `*`

```python
pattern = r'ab*'
text = 'a ab abb abbb'
matches = re.findall(pattern, text)
print(matches)  # Output: ['a', 'ab', 'abb', 'abbb']
```
5. `+`

```python
pattern = r'ab+'
text = 'a ab abb abbb'
matches = re.findall(pattern, text)
print(matches)  # Output: ['ab', 'abb', 'abbb']
```
6. `?`

```python
pattern = r'ab?'
text = 'a ab abb abbb'
matches = re.findall(pattern, text)
print(matches)  # Output: ['a', 'ab', 'ab', 'ab']
```
7. `[]`

```python
pattern = r'[aeiou]'
text = 'hello world'
matches = re.findall(pattern, text)
print(matches)  # Output: ['e', 'o', 'o']
```
8. `|`

```python
pattern = r'cat|dog'
text = 'I have a cat and a dog.'
matches = re.findall(pattern, text)
print(matches)  # Output: ['cat', 'dog']
```
9. `\`

```python
pattern = r'\$100'
text = 'The price is $100.'
match = re.search(pattern, text)
print(match.group() if match else 'No match')  # Output: '$100'
```
10. `{}`

```python
pattern = r'\d{3}'
text = 'My number is 123456'
matches = re.findall(pattern, text)
print(matches)  # Output: ['123', '456']
```
11. `()`

```python
pattern = r'(cat|dog)'
text = 'I have a cat and a dog.'
matches = re.findall(pattern, text)
print(matches)  # Output: ['cat', 'dog']
```
## 3. Using the re Module

**Key functions in the re module:**

- `re.match()`: Checks for a match at the beginning of the string.
- `re.search()`: Searches for the first match anywhere in the string.
- `re.findall()`: Returns a list of all non-overlapping matches.
- `re.sub()`: Replaces matches with a specified string.
- `re.split()`: Returns a list where the string has been split at each match.
- `re.escape()`: Escapes special characters in a string so that it can be matched literally.
Examples:

```python
import re
@ -122,7 +157,7 @@ print(re.findall(r'\d+', 'abc123def456')) # Output: ['123', '456']
# Substitute matches
print(re.sub(r'\d+', '#', 'abc123def456'))  # Output: abc#def#
# Split the string at each match
print(re.split(r"\s", txt))  # Output: ['The', 'Donkey', 'in', 'the', 'Town']
# Escape special characters
@ -130,9 +165,11 @@ print(re.escape("We are good to go")) #We\ are\ good\ to\ go
```
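
The difference between `re.match()` and `re.search()` is easy to overlook, so here is a small sketch of it, reusing the `'abc123def456'` string from the examples above. The variable names are introduced only for this illustration.

```python
import re

text = "abc123def456"

# re.match() only succeeds if the pattern matches at position 0.
at_start_digits = re.match(r"\d+", text)           # None: the string starts with letters
at_start_letters = re.match(r"[a-z]+", text).group()

# re.search() scans forward and returns the first match anywhere.
first_digits = re.search(r"\d+", text).group()

# re.split() cuts the string at every match; a match at the very end
# leaves an empty string as the last element.
parts = re.split(r"\d+", text)

print(at_start_digits)   # None
print(at_start_letters)  # abc
print(first_digits)      # 123
print(parts)             # ['abc', 'def', '']
```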
## 4. Compiling Regular Expressions

Compiling a pattern once with `re.compile()` and reusing the resulting pattern object can improve performance when the same pattern is applied repeatedly.

Example:

```python
import re
@ -144,9 +181,11 @@ print(pattern.findall('abc123def456')) # Output: ['123', '456']
```
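
As a further sketch of reuse, a compiled pattern object offers the same operations as the module-level functions, so one object can be applied to many inputs. The name `digits` and the input list are assumptions made for this illustration.

```python
import re

# Compile once; the pattern object exposes findall(), sub(), search(), etc.
digits = re.compile(r"\d+")

lines = ["abc123def456", "no digits here", "7 up"]  # illustrative inputs

# Apply the same compiled pattern to every line.
found = [digits.findall(line) for line in lines]
replaced = digits.sub("#", "abc123def456")
span = digits.search("7 up").span()

print(found)     # [['123', '456'], [], ['7']]
print(replaced)  # abc#def#
print(span)      # (0, 1)
```

The `re` module also caches recently used patterns internally, so the gain from compiling explicitly is usually most visible in tight loops or when many different patterns are in play.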
## 5. Groups and Capturing

Parentheses `()` group and capture parts of the match.

Example:

```python
import re
@ -159,40 +198,46 @@ if match:
```
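
A minimal sketch of how captured groups can be read back, assuming a made-up year-month pattern and sample sentence that do not appear in the original file:

```python
import re

# Illustrative pattern: two capturing groups separated by a hyphen.
pattern = r"(\d{4})-(\d{2})"
text = "Released 2024-06, updated 2025-01"

match = re.search(pattern, text)
if match:
    whole = match.group(0)     # the entire match
    year = match.group(1)      # first group
    month = match.group(2)     # second group
    both = match.groups()      # all groups as a tuple
    print(whole, year, month, both)  # 2024-06 2024 06 ('2024', '06')

# With more than one group, findall() returns a list of tuples.
pairs = re.findall(pattern, text)
print(pairs)  # [('2024', '06'), ('2025', '01')]
```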
## 6. Special Sequences

Special sequences are shortcuts for common patterns:

- `\A`: Matches only at the beginning of the string.
- `\b`: Matches at a word boundary, that is, at the beginning or end of a word.
- `\B`: Matches only where there is NOT a word boundary, that is, inside a word.
- `\d`: Any digit.
- `\D`: Any non-digit.
- `\w`: Any word character (letter, digit, or underscore).
- `\W`: Any non-word character.
- `\s`: Any whitespace character.
- `\S`: Any non-whitespace character.
- `\Z`: Matches only at the end of the string.

Example:

```python
import re
print(re.search(r'\w+@\w+\.\w+', 'Contact: support@example.com').group())  # Output: support@example.com
```
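
The anchors and boundary sequences in the list above are easier to see side by side. The following is a small sketch; the sample strings are assumptions chosen for illustration.

```python
import re

text = "cat concat catalog"

word_start = re.findall(r"\bcat", text)     # "cat" at the start of a word
whole_word = re.findall(r"\bcat\b", text)   # "cat" as a complete word
inside = re.findall(r"\Bcat", text)         # "cat" NOT at the start of a word
starts = bool(re.search(r"\Acat", text))    # string begins with "cat"
ends = bool(re.search(r"catalog\Z", text))  # string ends with "catalog"

print(word_start)  # ['cat', 'cat']
print(whole_word)  # ['cat']
print(inside)      # ['cat']
print(starts)      # True
print(ends)        # True

# Character-class shortcuts.
digits = re.findall(r"\d", "a1 b22")
words = re.findall(r"\w+", "a_1 b-2")   # underscore counts as a word character
fields = re.split(r"\s+", "a  b\tc")    # split on runs of whitespace

print(digits)  # ['1', '2', '2']
print(words)   # ['a_1', 'b', '2']
print(fields)  # ['a', 'b', 'c']
```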
## 7. Sets

A set is a group of characters inside a pair of square brackets `[]` with a special meaning:

- `[arn]`: Matches any one of the specified characters (a, r, or n).
- `[a-n]`: Matches any lowercase character alphabetically between a and n.
- `[^arn]`: Matches any character EXCEPT a, r, and n.
- `[0123]`: Matches any one of the specified digits (0, 1, 2, or 3).
- `[0-9]`: Matches any digit between 0 and 9.
- `[0-5][0-9]`: Matches any two-digit number from 00 to 59.
- `[a-zA-Z]`: Matches any character alphabetically between a and z, lowercase OR uppercase.
- `[+]`: Inside a set, `+`, `*`, `.`, `|`, `()`, `$`, and `{}` have no special meaning, so `[+]` matches any literal `+` character in the string.
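
The set forms listed above can be tried out with `re.findall()`. This is a minimal sketch; the input strings are assumptions picked so that each set matches some characters and skips others.

```python
import re

any_of = re.findall(r"[arn]", "brain")               # one of a, r, n
in_range = re.findall(r"[a-n]", "python")            # lowercase a through n
negated = re.findall(r"[^arn]", "brain")             # anything except a, r, n
some_digits = re.findall(r"[0123]", "2024")          # only 0, 1, 2, 3
minutes = re.findall(r"[0-5][0-9]", "07:45 and 99")  # two digits, 00 to 59
letters = re.findall(r"[a-zA-Z]+", "Hello World 42") # runs of letters
plus_signs = re.findall(r"[+]", "1 + 2 + 3")         # literal + inside a set

print(any_of)       # ['r', 'a', 'n']
print(in_range)     # ['h', 'n']
print(negated)      # ['b', 'i']
print(some_digits)  # ['2', '0', '2']
print(minutes)      # ['07', '45']
print(letters)      # ['Hello', 'World']
print(plus_signs)   # ['+', '+']
```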
## Summary

Regular expressions (regex) are a powerful tool for text processing in Python, offering a flexible way to match, search, and manipulate text patterns. The `re` module provides a comprehensive set of functions and metacharacters to tackle complex text processing tasks.

With regex, you can:

1. Match patterns: Use metacharacters like `.`, `*`, `?`, and `{}` to match specific patterns in text.
2. Search text: Use functions like `re.search()` and `re.match()` to find occurrences of patterns in text.
3. Manipulate text: Use functions like `re.sub()` to replace patterns with new text.