Merge pull request #1300 from Damini2004/RegEX

RegEX Documentation added.
pull/1323/head^2
Ashita Prasad 2024-07-04 22:51:53 +05:30 committed by GitHub
commit ca663e1e20
No key found in the database for this signature
GPG key ID: B5690EEEBB952194
1 changed file with 169 additions and 25 deletions


@ -1,36 +1,144 @@
## Regular Expressions in Python
Regular expressions (regex) are a versatile tool for matching patterns in strings and manipulating text.
Python's `re` module provides comprehensive support for regular expressions, enabling efficient text processing and validation.
## 1. Introduction to Regular Expressions
A regular expression is a sequence of characters defining a search pattern. Common use cases include validating input, searching within text, and extracting specific patterns.
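The three use cases named above can be sketched with the `re` module. This is a minimal illustration, not part of the original documentation: the sample text and the simple username and date patterns are assumptions chosen only to show each call.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Order 66 shipped on 2024-07-04 to alice"

# Validating input: fullmatch() succeeds only if the whole string fits the pattern.
is_valid_username = re.fullmatch(r"[a-z]+", "alice") is not None

# Searching within text: search() returns the first match, or None.
first_number = re.search(r"\d+", text)

# Extracting specific patterns: findall() returns every non-overlapping match.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

print(is_valid_username)      # True
print(first_number.group())   # 66
print(dates)                  # ['2024-07-04']
```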
## 2. Basic Syntax
Literal Characters: Match exact characters (e.g., abc matches "abc").
Metacharacters: Special characters like ., \*, ?, +, ^, $, [ ], and | used to build patterns.
**Common Metacharacters:**
- .: Any character except newline.
- ^: Start of the string.
- $: End of the string.
- \*: 0 or more repetitions of the preceding element.
- +: 1 or more repetitions of the preceding element.
- ?: 0 or 1 repetition of the preceding element.
- []: Any one character inside brackets (e.g., [a-z]).
- |: Either the pattern before or the pattern after.
- \ : Removes the special meaning of the character that follows it.
- {} : Specifies how many occurrences of the preceding pattern to match.
- () : Encloses a group within the pattern.
Examples:
1. `.`
```python
import re
pattern = r'c.t'
text = 'cat cot cut cit'
matches = re.findall(pattern, text)
print(matches) # Output: ['cat', 'cot', 'cut', 'cit']
```
2. `^`
```python
pattern = r'^Hello'
text = 'Hello, world!'
match = re.search(pattern, text)
print(match.group() if match else 'No match') # Output: 'Hello'
```
3. `$`
```python
pattern = r'world!$'
text = 'Hello, world!'
match = re.search(pattern, text)
print(match.group() if match else 'No match') # Output: 'world!'
```
4. `*`
```python
pattern = r'ab*'
text = 'a ab abb abbb'
matches = re.findall(pattern, text)
print(matches) # Output: ['a', 'ab', 'abb', 'abbb']
```
5. `+`
```python
pattern = r'ab+'
text = 'a ab abb abbb'
matches = re.findall(pattern, text)
print(matches) # Output: ['ab', 'abb', 'abbb']
```
6. `?`
```python
pattern = r'ab?'
text = 'a ab abb abbb'
matches = re.findall(pattern, text)
print(matches) # Output: ['a', 'ab', 'ab', 'ab']
```
7. `[]`
```python
pattern = r'[aeiou]'
text = 'hello world'
matches = re.findall(pattern, text)
print(matches) # Output: ['e', 'o', 'o']
```
8. `|`
```python
pattern = r'cat|dog'
text = 'I have a cat and a dog.'
matches = re.findall(pattern, text)
print(matches) # Output: ['cat', 'dog']
```
9. `\`
```python
pattern = r'\$100'
text = 'The price is $100.'
match = re.search(pattern, text)
print(match.group() if match else 'No match') # Output: '$100'
```
10. `{}`
```python
pattern = r'\d{3}'
text = 'My number is 123456'
matches = re.findall(pattern, text)
print(matches) # Output: ['123', '456']
```
11. `()`
```python
pattern = r'(cat|dog)'
text = 'I have a cat and a dog.'
matches = re.findall(pattern, text)
print(matches) # Output: ['cat', 'dog']
```
## 3. Using the re Module
**Key functions in the re module:**
- re.match(): Checks for a match at the beginning of the string.
- re.search(): Searches for a match anywhere in the string.
- re.findall(): Returns a list of all matches.
- re.sub(): Replaces matches with a specified string.
- re.split(): Returns a list where the string has been split at each match.
- re.escape(): Escapes the special characters in a string.
Examples:
```python
import re

# ... (lines not shown in the diff view) ...
print(re.findall(r'\d+', 'abc123def456'))  # Output: ['123', '456']

# Substitute matches
print(re.sub(r'\d+', '#', 'abc123def456'))  # Output: abc#def#

# Split the string at each match
txt = "The Donkey in the Town"  # sample text assumed from the output shown
print(re.split(r"\s", txt))  # Output: ['The', 'Donkey', 'in', 'the', 'Town']

# Escape special characters
print(re.escape("We are good to go"))  # Output: We\ are\ good\ to\ go
```
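The difference between `re.match()` and `re.search()` described in the list above is easy to miss, so here is a small sketch of it. The input string is an illustrative assumption, reusing the sample from the other examples.

```python
import re

text = "abc123def456"  # illustrative input

# match() only looks at the start of the string, so it finds nothing here.
at_start = re.match(r"\d+", text)

# search() scans the whole string and returns the first match.
anywhere = re.search(r"\d+", text)

print(at_start)           # None
print(anywhere.group())   # 123
print(anywhere.span())    # (3, 6)

# Both functions return None on failure, so check before calling .group().
result = anywhere.group() if anywhere else "No match"
print(result)             # 123
```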
## 4. Compiling Regular Expressions
Compiling a pattern once with re.compile() returns a reusable pattern object, so the pattern does not have to be parsed again on each call. The re module also caches recently used patterns, so the speed gain is often modest; the main benefit is cleaner code when the same pattern is used many times.
Example:
```python
import re

pattern = re.compile(r'\d+')
print(pattern.match('123abc').group())  # Output: 123
print(pattern.search('abc123').group())  # Output: 123
print(pattern.findall('abc123def456'))  # Output: ['123', '456']
```
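A typical reason to compile is reuse across many inputs. The sketch below assumes a hypothetical list of input lines and shows one pattern object applied to each of them, plus a flag passed at compile time.

```python
import re

# Compile once, then reuse the pattern object for every input.
number = re.compile(r"\d+")

lines = ["abc123", "no digits here", "x7y88"]  # hypothetical inputs
found = [number.findall(line) for line in lines]
print(found)  # [['123'], [], ['7', '88']]

# Flags are given at compile time, e.g. case-insensitive matching.
hello = re.compile(r"hello", re.IGNORECASE)
is_greeting = bool(hello.search("Say HELLO"))
print(is_greeting)  # True
```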
## 5. Groups and Capturing
Parentheses () group and capture parts of the match.
Example:
```python
import re

# ... (lines elided in the diff view, including an `if match:` block) ...
```
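As a rough sketch of how captured groups are read back, the example below uses a hypothetical pattern and input string (neither is taken from the original file). `group(0)` is the whole match, and numbered groups follow the order of the opening parentheses.

```python
import re

# Hypothetical pattern: a word and a number separated by a hyphen.
match = re.search(r"(\w+)-(\d+)", "item: widget-42")
if match:
    print(match.group(0))   # widget-42 (the whole match)
    print(match.group(1))   # widget
    print(match.group(2))   # 42
    print(match.groups())   # ('widget', '42')

# Named groups, written (?P<name>...), can make the intent clearer.
named = re.search(r"(?P<name>\w+)-(?P<num>\d+)", "item: widget-42")
print(named.group("name"), named.group("num"))  # widget 42
```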
## 6. Special Sequences
Special sequences are shortcuts for common patterns:
- \A: Matches only at the beginning of the string.
- \b: Matches at a word boundary, that is, at the beginning or the end of a word.
- \B: Matches at any position that is NOT a word boundary.
- \d: Any digit.
- \D: Any non-digit.
- \w: Any word character (letters, digits, and the underscore).
- \W: Any non-word character.
- \s: Any whitespace character.
- \S: Any non-whitespace character.
- \Z: Matches only at the end of the string.
Example:
```python
import re

print(re.search(r'\w+@\w+\.\w+', 'Contact: support@example.com').group())  # Output: support@example.com
```
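The boundary and anchor sequences are easier to see side by side. The following sketch uses an assumed sample string in which "cat" appears as a whole word, inside another word, and as a prefix.

```python
import re

text = "cat concat cats 42"  # illustrative input

whole_word = re.findall(r"\bcat\b", text)   # only the standalone word
inside_word = re.findall(r"\Bcat", text)    # only the "cat" inside "concat"
digits = re.findall(r"\d", text)
tokens = re.findall(r"\S+", text)

print(whole_word)    # ['cat']
print(inside_word)   # ['cat']
print(digits)        # ['4', '2']
print(tokens)        # ['cat', 'concat', 'cats', '42']

starts_with_cat = bool(re.search(r"\Acat", text))
ends_with_digits = bool(re.search(r"\d+\Z", text))
print(starts_with_cat, ends_with_digits)  # True True
```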
## 7. Sets
A set is a group of characters inside a pair of square brackets [] with a special meaning:
- [arn] : Returns a match where one of the specified characters (a, r, or n) is present.
- [a-n] : Returns a match for any lower case character, alphabetically between a and n.
- [^arn] : Returns a match for any character EXCEPT a, r, and n.
- [0123] : Returns a match where any of the specified digits (0, 1, 2, or 3) are present.
- [0-9] : Returns a match for any digit between 0 and 9.
- [0-5][0-9] : Returns a match for any two-digit number from 00 to 59.
- [a-zA-Z] : Returns a match for any character alphabetically between a and z, lower case OR upper case.
- [+] : In sets, +, \*, ., |, (), $, and {} have no special meaning, so [+] returns a match for any + character in the string.
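A few of the sets listed above can be tried directly with `re.findall()`. The input strings below are assumptions picked to make each result short.

```python
import re

any_of = re.findall(r"[arn]", "rain")
none_of = re.findall(r"[^arn]", "rain")
minutes = re.findall(r"[0-5][0-9]", "07 42 75 59")
letters = re.findall(r"[a-zA-Z]", "Py3")
plus_signs = re.findall(r"[+]", "1+1=2 and 2+2=4")

print(any_of)      # ['r', 'a', 'n']
print(none_of)     # ['i']
print(minutes)     # ['07', '42', '59']
print(letters)     # ['P', 'y']
print(plus_signs)  # ['+', '+']
```

Note that "75" yields no match because 7 is outside [0-5], and the + inside brackets is matched literally without escaping.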
## Summary
Regular expressions (regex) are a powerful tool for text processing in Python, offering a flexible way to match, search, and manipulate text patterns. The re module provides a comprehensive set of functions and metacharacters to tackle complex text processing tasks.
With regex, you can:
1. Match patterns: Use metacharacters like ., \*, ?, and {} to match specific patterns in text.
2. Search text: Use functions like re.search() and re.match() to find occurrences of patterns in text.
3. Manipulate text: Use functions like re.sub() to replace patterns with new text.