Merge pull request #1300 from Damini2004/RegEX

RegEX Documentation added.
2024-07-04 22:51:53 +05:30 · ca663e1e20
commit ca663e1e20
--- a/contrib/advanced-python/regular_expressions.md
+++ b/contrib/advanced-python/regular_expressions.md
@@ -1,36 +1,144 @@
 ## Regular Expressions in Python
-Regular expressions (regex) are a powerful tool for pattern matching and text manipulation. 
+Regular expressions (regex) are a powerful tool for pattern matching and text manipulation.
 Python's re module provides comprehensive support for regular expressions, enabling efficient text processing and validation.

 ## 1. Introduction to Regular Expressions
-A regular expression is a sequence of characters defining a search pattern. Common use cases include validating input, searching within text, and extracting 
+A regular expression is a sequence of characters defining a search pattern. Common use cases include validating input, searching within text, and extracting
 specific patterns.
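+
+As a minimal sketch of these three use cases, the `re` module can be used as follows. The sample strings and the date-like pattern are illustrative assumptions, not a real input format:
+
+```python
+import re
+
+# Validate input: re.fullmatch() succeeds only if the whole string fits the pattern
+is_valid = re.fullmatch(r'\d{4}-\d{2}-\d{2}', '2024-07-04') is not None
+print(is_valid)  # Output: True
+
+# Search within text: re.search() finds the first match anywhere in the string
+match = re.search(r'error', 'log: no error found')
+print(match.start())  # Output: 8
+
+# Extract specific patterns: re.findall() returns every non-overlapping match
+numbers = re.findall(r'\d+', 'Order 66 shipped 3 items')
+print(numbers)  # Output: ['66', '3']
+```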

 ## 2. Basic Syntax
 Literal Characters: Match exact characters (e.g., abc matches "abc").
-Metacharacters: Special characters like ., *, ?, +, ^, $, [ ], and | used to build patterns.
+Metacharacters: Special characters such as `.`, `*`, `?`, `+`, `^`, `$`, `[ ]`, and `|` that are used to build patterns.

 **Common Metacharacters:**

-* .: Any character except newline.
-* ^: Start of the string.
-* $: End of the string.
-* *: 0 or more repetitions.
-* +: 1 or more repetitions.
-* ?: 0 or 1 repetition.
-* []: Any one character inside brackets (e.g., [a-z]).
-* |: Either the pattern before or after.
-  
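+- `.`: Any character except a newline.
+- `^`: Start of the string.
+- `$`: End of the string.
+- `*`: 0 or more repetitions of the preceding element.
+- `+`: 1 or more repetitions of the preceding element.
+- `?`: 0 or 1 repetition of the preceding element.
+- `[]`: Any one character inside the brackets (e.g., `[a-z]`).
+- `|`: Either the pattern before or the pattern after it.
+- `\`: Removes the special meaning of the character that follows it (e.g., `\$` matches a literal `$`).
+- `{}`: The number of occurrences of the preceding element to match (e.g., `\d{3}` matches exactly three digits).
+- `()`: Groups part of a pattern and captures the text it matches.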
+
+Examples:
+
+1. `.`
+
+```python
+import re
+pattern = r'c.t'
+text = 'cat cot cut cit'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['cat', 'cot', 'cut', 'cit']
+```
+
+2. `^`
+
+```python
+import re
+pattern = r'^Hello'
+text = 'Hello, world!'
+match = re.search(pattern, text)
+print(match.group() if match else 'No match')  # Output: Hello
+```
+
+3. `$`
+
+```python
+import re
+pattern = r'world!$'
+text = 'Hello, world!'
+match = re.search(pattern, text)
+print(match.group() if match else 'No match')  # Output: world!
+```
+
+4. `*`
+
+```python
+import re
+pattern = r'ab*'
+text = 'a ab abb abbb'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['a', 'ab', 'abb', 'abbb']
+```
+
+5. `+`
+
+```python
+import re
+pattern = r'ab+'
+text = 'a ab abb abbb'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['ab', 'abb', 'abbb']
+```
+
+6. `?`
+
+```python
+import re
+pattern = r'ab?'
+text = 'a ab abb abbb'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['a', 'ab', 'ab', 'ab']
+```
+
+7. `[]`
+
+```python
+import re
+pattern = r'[aeiou]'
+text = 'hello world'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['e', 'o', 'o']
+```
+
+8. `|`
+
+```python
+import re
+pattern = r'cat|dog'
+text = 'I have a cat and a dog.'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['cat', 'dog']
+```
+
+9. `\`
+
+```python
+import re
+pattern = r'\$100'
+text = 'The price is $100.'
+match = re.search(pattern, text)
+print(match.group() if match else 'No match')  # Output: $100
+```
+
+10. `{}`
+
+```python
+import re
+pattern = r'\d{3}'
+text = 'My number is 123456'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['123', '456']
+```
+
+11. `()`
+
+```python
+import re
+pattern = r'(cat|dog)'
+text = 'I have a cat and a dog.'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['cat', 'dog']
+```
+
 ## 3. Using the re Module

 **Key functions in the re module:**

-* re.match(): Checks for a match at the beginning of the string.
-* re.search(): Searches for a match anywhere in the string.
-* re.findall(): Returns a list of all matches.
-* re.sub(): Replaces matches with a specified string.
+- re.match(): Checks for a match at the beginning of the string.
+- re.search(): Searches for a match anywhere in the string.
+- re.findall(): Returns a list of all matches.
+- re.sub(): Replaces matches with a specified string.
+- re.split(): Returns a list where the string has been split at each match.
+- re.escape(): Escapes special characters in a string so that they are matched literally.
+
+Examples:

-Examples:
 ```python
 import re

@@ -45,12 +153,20 @@ print(re.findall(r'\d+', 'abc123def456'))  # Output: ['123', '456']

 # Substitute matches
 print(re.sub(r'\d+', '#', 'abc123def456'))  # Output: abc#def#
+
+# Split the string at every whitespace character
+txt = 'The Donkey in the Town'
+print(re.split(r'\s', txt))  # Output: ['The', 'Donkey', 'in', 'the', 'Town']
+
+# Escape special characters
+print(re.escape("We are good to go"))  # Output: We\ are\ good\ to\ go
 ```
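+
+A short sketch of the difference between `re.match()` and `re.search()`, and of why `re.escape()` helps when literal text (for example, user input) is placed into a pattern. The sample strings are illustrative:
+
+```python
+import re
+
+text = 'Total: $5.00 (tax included)'
+
+# re.match() only looks at the start of the string, so this finds nothing
+print(re.match(r'\$\d', text))  # Output: None
+
+# re.search() scans the whole string and returns the first match
+print(re.search(r'\$\d', text).group())  # Output: $5
+
+# Without escaping, '$', '.', '(' and ')' would act as metacharacters
+literal = '$5.00 (tax included)'
+pattern = re.escape(literal)
+print(re.search(pattern, text) is not None)  # Output: True
+```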

 ## 4. Compiling Regular Expressions
+
 Compiling a pattern with `re.compile()` creates a reusable pattern object, which is convenient and can be faster when the same pattern is applied many times.

 Example:
+
 ```python
 import re

@@ -58,12 +174,15 @@ pattern = re.compile(r'\d+')
 print(pattern.match('123abc').group())  # Output: 123
 print(pattern.search('abc123').group())  # Output: 123
 print(pattern.findall('abc123def456'))  # Output: ['123', '456']
+
 ```
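+
+A compiled pattern can also carry flags such as `re.IGNORECASE`, and the same object can be reused across many inputs. A minimal sketch with made-up sample lines:
+
+```python
+import re
+
+# Compile once; re.IGNORECASE makes the match case-insensitive
+word = re.compile(r'python', re.IGNORECASE)
+
+lines = ['I like Python', 'PYTHON 3.12 released', 'No snakes here']
+
+# Reuse the same compiled object for every line
+hits = [line for line in lines if word.search(line)]
+print(hits)  # Output: ['I like Python', 'PYTHON 3.12 released']
+
+# Pattern objects also provide sub() and split()
+print(word.sub('Py', 'python and Python'))  # Output: Py and Py
+```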

 ## 5. Groups and Capturing
+
 Parentheses () group and capture parts of the match.

 Example:
+
 ```python
 import re

@@ -76,21 +195,46 @@ if match:
 ```
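+
+When a pattern contains several groups, `match.groups()` returns all captured parts at once, and `re.findall()` returns one tuple per match. The `key=value` data below is a made-up example:
+
+```python
+import re
+
+# Two capturing groups: a key made of word characters and a numeric value
+pair = r'(\w+)=(\d+)'
+
+match = re.search(pair, 'width=80 height=24')
+print(match.groups())  # Output: ('width', '80')
+print(match.group(2))  # Output: 80
+
+# With more than one group, findall() returns a list of tuples
+print(re.findall(pair, 'width=80 height=24'))  # Output: [('width', '80'), ('height', '24')]
+```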

 ## 6. Special Sequences
+
 Special sequences are shortcuts for common patterns:

-* \d: Any digit.
-* \D: Any non-digit.
-* \w: Any alphanumeric character.
-* \W: Any non-alphanumeric character.
-* \s: Any whitespace character.
-* \S: Any non-whitespace character.
+- `\A`: Matches only at the beginning of the string.
+- `\b`: Matches at the beginning or end of a word (a word boundary).
+- `\B`: Matches only where there is NOT a word boundary, i.e., not at the beginning or end of a word.
+- `\d`: Any digit.
+- `\D`: Any non-digit.
+- `\w`: Any word character (letters, digits, and underscore).
+- `\W`: Any non-word character.
+- `\s`: Any whitespace character.
+- `\S`: Any non-whitespace character.
+- `\Z`: Matches only at the end of the string.
+
 Example:
+
 ```python
 import re

 print(re.search(r'\w+@\w+\.\w+', 'Contact: support@example.com').group())  # Output: support@example.com
 ```
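+
+The anchors `\A`, `\Z`, `\b`, and `\B` are easiest to understand by comparing which occurrences of `cat` each one accepts. A minimal sketch on an illustrative string:
+
+```python
+import re
+
+text = 'cat concatenate cat'
+
+# \A and \Z anchor to the very start and the very end of the string
+print(re.findall(r'\Acat', text))  # Output: ['cat']
+print(re.findall(r'cat\Z', text))  # Output: ['cat']
+
+# \b requires word boundaries, so only the two standalone words match
+print(re.findall(r'\bcat\b', text))  # Output: ['cat', 'cat']
+
+# \B requires non-boundaries, so only the 'cat' inside 'concatenate' matches
+print(re.findall(r'\Bcat\B', text))  # Output: ['cat']
+```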

+## 7. Sets
+
+A set is a group of characters inside a pair of square brackets [] that matches a single character, as follows:
+
+- [arn] : Returns a match where one of the specified characters (a, r, or n) is present.
+- [a-n] : Returns a match for any lower case character, alphabetically between a and n.
+- [^arn] : Returns a match for any character EXCEPT a, r, and n.
+- [0123] : Returns a match where any of the specified digits (0, 1, 2, or 3) are present.
+- [0-9] : Returns a match for any digit between 0 and 9.
+- [0-5][0-9] : Returns a match for any two-digit number from 00 to 59.
+- [a-zA-Z] : Returns a match for any character alphabetically between a and z, lower case OR upper case.
+- [+] : Inside a set, +, \*, ., |, (), $, and {} have no special meaning, so [+] returns a match for any + character in the string.
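+
+The set rules can be checked directly with `re.findall()`. A minimal sketch on illustrative strings:
+
+```python
+import re
+
+text = 'Meet at 09:45 or 23:70, bring 2+2 apples'
+
+# [a-n]: any lowercase letter from a to n ('r' is outside the range)
+print(re.findall(r'[a-n]', 'brand'))  # Output: ['b', 'a', 'n', 'd']
+
+# [^arn]: any character except a, r, and n
+print(re.findall(r'[^arn]', 'brand'))  # Output: ['b', 'd']
+
+# [0-5][0-9]: two-digit values from 00 to 59, here captured after a colon
+print(re.findall(r':([0-5][0-9])', text))  # Output: ['45']
+
+# [+]: inside a set, + is an ordinary character
+print(re.findall(r'[+]', text))  # Output: ['+']
+```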
+
 ## Summary
-Regular expressions are a versatile tool for text processing in Python. The re module offers powerful functions and metacharacters for pattern matching, 
-searching, and manipulation, making it an essential skill for handling complex text processing tasks.
+
+Regular expressions (regex) are a powerful tool for text processing in Python, offering a flexible way to match, search, and manipulate text patterns. The re module provides a comprehensive set of functions and metacharacters to tackle complex text processing tasks.
+With regex, you can:
+
+1. Match patterns: Use metacharacters like ., \*, ?, and {} to match specific patterns in text.
+2. Search text: Use functions like re.search() and re.match() to find occurrences of patterns in text.
+3. Manipulate text: Use functions like re.sub() to replace patterns with new text.