From 2dbc0146a1a3f9bba63f7b30e9a449440a71b2fc Mon Sep 17 00:00:00 2001 From: SyedMeasum5 Date: Sun, 11 Aug 2024 21:43:06 +0500 Subject: [PATCH] Create readme.md --- 04-Regex(docker compose)/RegEx/readme.md | 305 +++++++++++++++++++++++ 1 file changed, 305 insertions(+) create mode 100644 04-Regex(docker compose)/RegEx/readme.md diff --git a/04-Regex(docker compose)/RegEx/readme.md b/04-Regex(docker compose)/RegEx/readme.md new file mode 100644 index 0000000..328a955 --- /dev/null +++ b/04-Regex(docker compose)/RegEx/readme.md @@ -0,0 +1,305 @@ +# What is RegEx? +Regular expressions, commonly known as RegEx or regexp, are sequences of characters that define search patterns. These patterns are used for string-searching algorithms, "find" or "find and replace" operations on strings, or for input validation³. + + +### Practical Applications: +- **Form Input Validation**: Ensuring email addresses, phone numbers, and other inputs are in the correct format. +- **Web Scraping**: Extracting specific information from web pages. +- **Search and Replace**: Finding and replacing text in documents or code. +- **Log Analysis**: Extracting useful information from log files. + +### Example: +To match an email address, you might use a pattern like: +```regex +[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} +``` + +This pattern matches typical email addresses like `example@example.com`. + +RegEx can seem complex at first, but with practice, it becomes a powerful tool for text processing and data extraction¹². + + +¹: [Learn Regex: A Beginner's Guide — SitePoint](https://www.sitepoint.com/learn-regex/) + +²: [RegexOne - Learn Regular Expressions](https://www.regexone.com/) + +³: [Regular expression - Wikipedia](https://en.wikipedia.org/wiki/Regular_expression) + +(1) Regular expression - Wikipedia. https://en.wikipedia.org/wiki/Regular_expression. + +(2) Learn Regex: A Beginner's Guide — SitePoint. https://www.sitepoint.com/learn-regex/. + +(3) RegexOne - Learn Regular Expressions - Lesson 1: An Introduction, and .... https://www.regexone.com/. + +(4) en.wikipedia.org. https://en.wikipedia.org/wiki/Regular_expression. + + +# RegEx Charachters + +### 1. **Literal Characters** +- **Pattern**: `hello` +- **Matches**: `hello` in "hello world" +- **Explanation**: Matches the exact sequence of characters. + +### 2. **Dot (.)** +- **Pattern**: `h.llo` +- **Matches**: `hello`, `hallo`, `hxllo` +- **Explanation**: Matches any single character except newline. + +### 3. **Caret (^)** +- **Pattern**: `^hello` +- **Matches**: `hello` at the start of a string +- **Explanation**: Asserts position at the start of a line. + +### 4. **Dollar ($)** +- **Pattern**: `world$` +- **Matches**: `world` at the end of a string +- **Explanation**: Asserts position at the end of a line. + +### 5. **Asterisk (*)** +- **Pattern**: `he*llo` +- **Matches**: `hllo`, `hello`, `heello` +- **Explanation**: Matches 0 or more of the preceding element. + +### 6. **Plus (+)** +- **Pattern**: `he+llo` +- **Matches**: `hello`, `heello` +- **Explanation**: Matches 1 or more of the preceding element. + +### 7. **Question Mark (?)** +- **Pattern**: `he?llo` +- **Matches**: `hllo`, `hello` +- **Explanation**: Matches 0 or 1 of the preceding element. + +### 8. **Braces ({})** +- **Pattern**: `he{2}llo` +- **Matches**: `heello` +- **Explanation**: Matches exactly `n` occurrences of the preceding element. + +### 9. **Square Brackets ([])** +- **Pattern**: `[aeiou]` +- **Matches**: `a`, `e`, `i`, `o`, `u` +- **Explanation**: Matches any one of the enclosed characters. + +### 10. **Negated Character Class ([^])** +- **Pattern**: `[^aeiou]` +- **Matches**: Any character except `a`, `e`, `i`, `o`, `u` +- **Explanation**: Matches any character not in the brackets. + +### 11. **Parentheses (())** +- **Pattern**: `(hello)` +- **Matches**: `hello` +- **Explanation**: Groups multiple tokens together. + +### 12. **Pipe (|)** +- **Pattern**: `hello|world` +- **Matches**: `hello`, `world` +- **Explanation**: Matches either the pattern before or after the `|`. + +### 13. **Backslash (\)** +- **Pattern**: `\d` +- **Matches**: Any digit (0-9) +- **Explanation**: Escapes a special character or denotes a special sequence. + +### 14. **Special Sequences** +- **\d**: Matches any digit (0-9) + - **Pattern**: `\d` + - **Matches**: `1`, `2`, `3`, etc. +- **\D**: Matches any non-digit + - **Pattern**: `\D` + - **Matches**: `a`, `b`, `!`, etc. +- **\w**: Matches any word character (alphanumeric + underscore) + - **Pattern**: `\w` + - **Matches**: `a`, `1`, `_`, etc. +- **\W**: Matches any non-word character + - **Pattern**: `\W` + - **Matches**: `!`, `@`, `#`, etc. +- **\s**: Matches any whitespace character + - **Pattern**: `\s` + - **Matches**: ` `, `\t`, `\n`, etc. +- **\S**: Matches any non-whitespace character + - **Pattern**: `\S` + - **Matches**: `a`, `1`, `!`, etc. + +### Examples in Context +Let's use these characters in a more complex pattern: + +**Pattern**: `^\d{3}-\d{2}-\d{4}$` +- **Matches**: `123-45-6789` +- **Explanation**: Matches a string that starts (`^`) with exactly three digits (`\d{3}`), followed by a hyphen, two digits (`\d{2}`), another hyphen, and four digits (`\d{4}`), and ends (`$`). + +(1) Regular expression syntax cheat sheet - JavaScript | MDN. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Cheatsheet. +(2) A Practical Guide to Regular Expressions – Learn RegEx with Real Life .... https://www.freecodecamp.org/news/practical-regex-guide-with-real-life-examples/. +(3) Regex Tutorial - A Cheatsheet with Examples - Regextutorial.org. https://regextutorial.org/. + + +# Best practices for RegEx + +### 1. **Keep It Simple** +- **Explanation**: Write the simplest pattern that accomplishes your goal. Complex patterns are harder to read and maintain. +- **Example**: Instead of `\b\d{1,3}(,\d{3})*(\.\d{2})?\b` to match a number with commas and decimals, use `\d+` if you only need to match digits. + +### 2. **Use Comments and Verbose Mode** +- **Explanation**: Add comments and use the verbose mode to make your RegEx more readable. +- **Example**: + ```python + pattern = re.compile(r""" + \b # Word boundary + \d{1,3} # 1 to 3 digits + (,\d{3})* # Optional groups of 3 digits, separated by commas + (\.\d{2})? # Optional decimal point followed by 2 digits + \b # Word boundary + """, re.VERBOSE) + ``` + +### 3. **Test Thoroughly** +- **Explanation**: Test your RegEx with various inputs to ensure it works as expected. +- **Example**: Use tools like [regex101](https://regex101.com/) to test and debug your patterns. + +### 4. **Avoid Overusing Wildcards** +- **Explanation**: The dot (`.`) matches any character except newline, which can lead to unexpected matches. +- **Example**: Instead of `.*`, use a more specific pattern like `\w+` to match word characters. + +### 5. **Use Non-Capturing Groups When Possible** +- **Explanation**: Use `(?:...)` for groups you don't need to capture, which can improve performance. +- **Example**: Instead of `(abc|def)`, use `(?:abc|def)` if you don't need to capture the group. + +### 6. **Anchor Your Patterns** +- **Explanation**: Use `^` and `$` to match the start and end of a string, respectively, to avoid partial matches. +- **Example**: Use `^hello$` to match the exact string "hello". + +### 7. **Escape Special Characters** +- **Explanation**: Use backslashes (`\`) to escape special characters when you need to match them literally. +- **Example**: Use `\.` to match a period instead of `.`. + +### 8. **Use Character Classes** +- **Explanation**: Use `[ ]` to define a set of characters to match, which can simplify your pattern. +- **Example**: Use `[aeiou]` to match any vowel. + +### 9. **Optimize for Performance** +- **Explanation**: Avoid patterns that can cause excessive backtracking, which can slow down matching. +- **Example**: Use `\d{4}` instead of `\d\d\d\d` to match four digits. + +### 10. **Document Your Patterns** +- **Explanation**: Add comments or documentation to explain complex patterns for future reference. +- **Example**: Include a comment like `# Matches a US phone number` above your pattern. + +By following these best practices, you can write more efficient, readable, and maintainable regular expressions. + +(1) Regex Learn - Step by step, from zero to advanced.. https://regexlearn.com/. +(2) A Practical Guide to Regular Expressions – Learn RegEx with Real Life .... https://www.freecodecamp.org/news/practical-regex-guide-with-real-life-examples/. +(3) Best Practices for Regular Expressions in .NET - .NET. https://learn.microsoft.com/en-us/dotnet/standard/base-types/best-practices-regex. + + + + +# SOME REAL LIFE APPLICATION OF REGEX + +### 1. **Extracting Email Addresses from Text** +**Problem**: You have a large text document containing various information, and you need to extract all email addresses. + +**RegEx Pattern**: `[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}` + +**Example**: +```python +import re + +text = "Contact us at support@example.com or sales@example.org for more information." +emails = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text) +print(emails) +``` + +**Output**: +``` +['support@example.com', 'sales@example.org'] +``` + +### 2. **Extracting Dates from a Document** +**Problem**: You need to extract all dates in the format `MM/DD/YYYY` from a text document. + +**RegEx Pattern**: `\b\d{2}/\d{2}/\d{4}\b` + +**Example**: +```python +import re + +text = "The event is scheduled for 12/25/2024 and the deadline for registration is 11/30/2024." +dates = re.findall(r'\b\d{2}/\d{2}/\d{4}\b', text) +print(dates) +``` + +**Output**: +``` +['12/25/2024', '11/30/2024'] +``` + +### 3. **Extracting Phone Numbers from Text** +**Problem**: You have a text file with various contact information, and you need to extract all phone numbers in the format `(XXX) XXX-XXXX`. + +**RegEx Pattern**: `\(\d{3}\) \d{3}-\d{4}` + +**Example**: +```python +import re + +text = "You can reach us at (555) 123-4567 or (555) 987-6543." +phone_numbers = re.findall(r'\(\d{3}\) \d{3}-\d{4}', text) +print(phone_numbers) +``` + +**Output**: +``` +['(555) 123-4567', '(555) 987-6543'] +``` + +### 4. **Extracting URLs from a Web Page** +**Problem**: You need to extract all URLs from the HTML content of a web page. + +**RegEx Pattern**: `https?://[^\s]+` + +**Example**: +```python +import re + +html_content = """ +Example +Example Org +""" +urls = re.findall(r'https?://[^\s]+', html_content) +print(urls) +``` + +**Output**: +``` +['https://example.com', 'http://example.org'] +``` + +### 5. **Extracting IP Addresses from Log Files** +**Problem**: You have server log files, and you need to extract all IP addresses. + +**RegEx Pattern**: `\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b` + +**Example**: +```python +import re + +log_data = """ +192.168.1.1 - - [10/Oct/2024:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 +172.16.0.2 - - [10/Oct/2024:13:55:36 -0700] "POST /form HTTP/1.1" 404 721 +""" +ip_addresses = re.findall(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', log_data) +print(ip_addresses) +``` + +**Output**: +``` +['192.168.1.1', '172.16.0.2'] +``` + +These examples demonstrate how RegEx can be used to extract specific types of data from raw text, making it a powerful tool for data processing and analysis. + +(1) Five Practical Use Cases for Regular Expressions. https://blog.openreplay.com/five-practical-use-cases-for-regular-expressions/. + +(2) A Practical Guide to Regular Expressions – Learn RegEx with Real Life .... https://www.freecodecamp.org/news/practical-regex-guide-with-real-life-examples/. + +(3) What have you used Regular Expressions for? - Stack Overflow. https://stackoverflow.com/questions/404430/what-have-you-used-regular-expressions-for. \ No newline at end of file