EnggQasim · SyedMeasum5 · Aug 11, 2024
diff --git a/04-Regex(docker compose)/RegEx/readme.md b/04-Regex(docker compose)/RegEx/readme.md
@@ -0,0 +1,305 @@
+# What is RegEx?
+Regular expressions, commonly known as RegEx or regexp, are sequences of characters that define search patterns. These patterns are used for string-searching algorithms, "find" or "find and replace" operations on strings, or for input validation³.
+
+
+### Practical Applications:
+- **Form Input Validation**: Ensuring email addresses, phone numbers, and other inputs are in the correct format.
+- **Web Scraping**: Extracting specific information from web pages.
+- **Search and Replace**: Finding and replacing text in documents or code.
+- **Log Analysis**: Extracting useful information from log files.
+
+### Example:
+To match an email address, you might use a pattern like:
+```regex
+[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
+```
+
+This pattern matches typical email addresses like `[email protected]`.
+
+RegEx can seem complex at first, but with practice, it becomes a powerful tool for text processing and data extraction¹².
+
+
+¹: [Learn Regex: A Beginner's Guide — SitePoint](https://www.sitepoint.com/learn-regex/)
+
+²: [RegexOne - Learn Regular Expressions](https://www.regexone.com/)
+
+³: [Regular expression - Wikipedia](https://en.wikipedia.org/wiki/Regular_expression)
+
+(1) Regular expression - Wikipedia. https://en.wikipedia.org/wiki/Regular_expression.
+
+(2) Learn Regex: A Beginner's Guide — SitePoint. https://www.sitepoint.com/learn-regex/.
+
+(3) RegexOne - Learn Regular Expressions - Lesson 1: An Introduction, and .... https://www.regexone.com/.
+
+(4) en.wikipedia.org. https://en.wikipedia.org/wiki/Regular_expression.
+
+
+# RegEx Charachters
+
+### 1. **Literal Characters**
+- **Pattern**: `hello`
+- **Matches**: `hello` in "hello world"
+- **Explanation**: Matches the exact sequence of characters.
+
+### 2. **Dot (.)**
+- **Pattern**: `h.llo`
+- **Matches**: `hello`, `hallo`, `hxllo`
+- **Explanation**: Matches any single character except newline.
+
+### 3. **Caret (^)**
+- **Pattern**: `^hello`
+- **Matches**: `hello` at the start of a string
+- **Explanation**: Asserts position at the start of a line.
+
+### 4. **Dollar ($)**
+- **Pattern**: `world$`
+- **Matches**: `world` at the end of a string
+- **Explanation**: Asserts position at the end of a line.
+
+### 5. **Asterisk (*)**
+- **Pattern**: `he*llo`
+- **Matches**: `hllo`, `hello`, `heello`
+- **Explanation**: Matches 0 or more of the preceding element.
+
+### 6. **Plus (+)**
+- **Pattern**: `he+llo`
+- **Matches**: `hello`, `heello`
+- **Explanation**: Matches 1 or more of the preceding element.
+
+### 7. **Question Mark (?)**
+- **Pattern**: `he?llo`
+- **Matches**: `hllo`, `hello`
+- **Explanation**: Matches 0 or 1 of the preceding element.
+
+### 8. **Braces ({})**
+- **Pattern**: `he{2}llo`
+- **Matches**: `heello`
+- **Explanation**: Matches exactly `n` occurrences of the preceding element.
+
+### 9. **Square Brackets ([])**
+- **Pattern**: `[aeiou]`
+- **Matches**: `a`, `e`, `i`, `o`, `u`
+- **Explanation**: Matches any one of the enclosed characters.
+
+### 10. **Negated Character Class ([^])**
+- **Pattern**: `[^aeiou]`
+- **Matches**: Any character except `a`, `e`, `i`, `o`, `u`
+- **Explanation**: Matches any character not in the brackets.
+
+### 11. **Parentheses (())**
+- **Pattern**: `(hello)`
+- **Matches**: `hello`
+- **Explanation**: Groups multiple tokens together.
+
+### 12. **Pipe (|)**
+- **Pattern**: `hello|world`
+- **Matches**: `hello`, `world`
+- **Explanation**: Matches either the pattern before or after the `|`.
+
+### 13. **Backslash (\)**
+- **Pattern**: `\d`
+- **Matches**: Any digit (0-9)
+- **Explanation**: Escapes a special character or denotes a special sequence.
+
+### 14. **Special Sequences**
+- **\d**: Matches any digit (0-9)
+  - **Pattern**: `\d`
+  - **Matches**: `1`, `2`, `3`, etc.
+- **\D**: Matches any non-digit
+  - **Pattern**: `\D`
+  - **Matches**: `a`, `b`, `!`, etc.
+- **\w**: Matches any word character (alphanumeric + underscore)
+  - **Pattern**: `\w`
+  - **Matches**: `a`, `1`, `_`, etc.
+- **\W**: Matches any non-word character
+  - **Pattern**: `\W`
+  - **Matches**: `!`, `@`, `#`, etc.
+- **\s**: Matches any whitespace character
+  - **Pattern**: `\s`
+  - **Matches**: ` `, `\t`, `\n`, etc.
+- **\S**: Matches any non-whitespace character
+  - **Pattern**: `\S`
+  - **Matches**: `a`, `1`, `!`, etc.
+
+### Examples in Context
+Let's use these characters in a more complex pattern:
+
+**Pattern**: `^\d{3}-\d{2}-\d{4}$`
+- **Matches**: `123-45-6789`
+- **Explanation**: Matches a string that starts (`^`) with exactly three digits (`\d{3}`), followed by a hyphen, two digits (`\d{2}`), another hyphen, and four digits (`\d{4}`), and ends (`$`).
+
+(1) Regular expression syntax cheat sheet - JavaScript | MDN. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Cheatsheet.
+(2) A Practical Guide to Regular Expressions – Learn RegEx with Real Life .... https://www.freecodecamp.org/news/practical-regex-guide-with-real-life-examples/.
+(3) Regex Tutorial - A Cheatsheet with Examples - Regextutorial.org. https://regextutorial.org/.
+
+
+# Best practices for RegEx
+
+### 1. **Keep It Simple**
+- **Explanation**: Write the simplest pattern that accomplishes your goal. Complex patterns are harder to read and maintain.
+- **Example**: Instead of `\b\d{1,3}(,\d{3})*(\.\d{2})?\b` to match a number with commas and decimals, use `\d+` if you only need to match digits.
+
+### 2. **Use Comments and Verbose Mode**
+- **Explanation**: Add comments and use the verbose mode to make your RegEx more readable.
+- **Example**:
+  ```python
+  pattern = re.compile(r"""
+      \b          # Word boundary
+      \d{1,3}     # 1 to 3 digits
+      (,\d{3})*   # Optional groups of 3 digits, separated by commas
+      (\.\d{2})?  # Optional decimal point followed by 2 digits
+      \b          # Word boundary
+  """, re.VERBOSE)
+  ```
+
+### 3. **Test Thoroughly**
+- **Explanation**: Test your RegEx with various inputs to ensure it works as expected.
+- **Example**: Use tools like [regex101](https://regex101.com/) to test and debug your patterns.
+
+### 4. **Avoid Overusing Wildcards**
+- **Explanation**: The dot (`.`) matches any character except newline, which can lead to unexpected matches.
+- **Example**: Instead of `.*`, use a more specific pattern like `\w+` to match word characters.
+
+### 5. **Use Non-Capturing Groups When Possible**
+- **Explanation**: Use `(?:...)` for groups you don't need to capture, which can improve performance.
+- **Example**: Instead of `(abc|def)`, use `(?:abc|def)` if you don't need to capture the group.
+
+### 6. **Anchor Your Patterns**
+- **Explanation**: Use `^` and `$` to match the start and end of a string, respectively, to avoid partial matches.
+- **Example**: Use `^hello$` to match the exact string "hello".
+
+### 7. **Escape Special Characters**
+- **Explanation**: Use backslashes (`\`) to escape special characters when you need to match them literally.
+- **Example**: Use `\.` to match a period instead of `.`.
+
+### 8. **Use Character Classes**
+- **Explanation**: Use `[ ]` to define a set of characters to match, which can simplify your pattern.
+- **Example**: Use `[aeiou]` to match any vowel.
+
+### 9. **Optimize for Performance**
+- **Explanation**: Avoid patterns that can cause excessive backtracking, which can slow down matching.
+- **Example**: Use `\d{4}` instead of `\d\d\d\d` to match four digits.
+
+### 10. **Document Your Patterns**
+- **Explanation**: Add comments or documentation to explain complex patterns for future reference.
+- **Example**: Include a comment like `# Matches a US phone number` above your pattern.
+
+By following these best practices, you can write more efficient, readable, and maintainable regular expressions. 
+
+(1) Regex Learn - Step by step, from zero to advanced.. https://regexlearn.com/.
+(2) A Practical Guide to Regular Expressions – Learn RegEx with Real Life .... https://www.freecodecamp.org/news/practical-regex-guide-with-real-life-examples/.
+(3) Best Practices for Regular Expressions in .NET - .NET. https://learn.microsoft.com/en-us/dotnet/standard/base-types/best-practices-regex.
+
+
+
+
+# SOME REAL LIFE APPLICATION OF REGEX
+
+### 1. **Extracting Email Addresses from Text**
+**Problem**: You have a large text document containing various information, and you need to extract all email addresses.
+
+**RegEx Pattern**: `[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`
+
+**Example**:
+```python
+import re
+
+text = "Contact us at [email protected] or [email protected] for more information."
+emails = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text)
+print(emails)
+```
+
+**Output**:
+```
+['[email protected]', '[email protected]']
+```
+
+### 2. **Extracting Dates from a Document**
+**Problem**: You need to extract all dates in the format `MM/DD/YYYY` from a text document.
+
+**RegEx Pattern**: `\b\d{2}/\d{2}/\d{4}\b`
+
+**Example**:
+```python
+import re
+
+text = "The event is scheduled for 12/25/2024 and the deadline for registration is 11/30/2024."
+dates = re.findall(r'\b\d{2}/\d{2}/\d{4}\b', text)
+print(dates)
+```
+
+**Output**:
+```
+['12/25/2024', '11/30/2024']
+```
+
+### 3. **Extracting Phone Numbers from Text**
+**Problem**: You have a text file with various contact information, and you need to extract all phone numbers in the format `(XXX) XXX-XXXX`.
+
+**RegEx Pattern**: `\(\d{3}\) \d{3}-\d{4}`
+
+**Example**:
+```python
+import re
+
+text = "You can reach us at (555) 123-4567 or (555) 987-6543."
+phone_numbers = re.findall(r'\(\d{3}\) \d{3}-\d{4}', text)
+print(phone_numbers)
+```
+
+**Output**:
+```
+['(555) 123-4567', '(555) 987-6543']
+```
+
+### 4. **Extracting URLs from a Web Page**
+**Problem**: You need to extract all URLs from the HTML content of a web page.
+
+**RegEx Pattern**: `https?://[^\s]+`
+
+**Example**:
+```python
+import re
+
+html_content = """
+<a href="https://example.com">Example</a>
+<a href="http://example.org">Example Org</a>
+"""
+urls = re.findall(r'https?://[^\s]+', html_content)
+print(urls)
+```
+
+**Output**:
+```
+['https://example.com', 'http://example.org']
+```
+
+### 5. **Extracting IP Addresses from Log Files**
+**Problem**: You have server log files, and you need to extract all IP addresses.
+
+**RegEx Pattern**: `\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b`
+
+**Example**:
+```python
+import re
+
+log_data = """
+192.168.1.1 - - [10/Oct/2024:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326
+172.16.0.2 - - [10/Oct/2024:13:55:36 -0700] "POST /form HTTP/1.1" 404 721
+"""
+ip_addresses = re.findall(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', log_data)
+print(ip_addresses)
+```
+
+**Output**:
+```
+['192.168.1.1', '172.16.0.2']
+```
+
+These examples demonstrate how RegEx can be used to extract specific types of data from raw text, making it a powerful tool for data processing and analysis.
+
+(1) Five Practical Use Cases for Regular Expressions. https://blog.openreplay.com/five-practical-use-cases-for-regular-expressions/.
+
+(2) A Practical Guide to Regular Expressions – Learn RegEx with Real Life .... https://www.freecodecamp.org/news/practical-regex-guide-with-real-life-examples/.
+
+(3) What have you used Regular Expressions for? - Stack Overflow. https://stackoverflow.com/questions/404430/what-have-you-used-regular-expressions-for.