Python re.match() and re.sub() Explained

Python’s re.match() and re.sub() are two functions in the re module. re.sub() substitutes occurrences of a pattern in a string, while re.match() checks for a match at the beginning of a string. Here’s how to use each one, with examples.

Python Regular Expressions re.match() vs. re.sub() Defined

re.match(): Python’s re.match() function is used to check for a match only at the beginning of the string. If a match is found, it returns the match object. Otherwise, it returns None.
re.sub(): Python’s re.sub() function substitutes occurrences of a pattern in a string.

Python re.match() Explained

The re.match() function checks for a match only at the beginning of the string. If the match is found at the start of the string, it returns a match object. Otherwise, it returns None.

Syntax

re.match(pattern, string, flags=0)

pattern: The regex pattern to match.
string: The input string to be searched.
flags: (Optional) Allows modification of matching behavior.
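
As a minimal sketch of how the flags argument and the None return value behave, the snippet below uses re.IGNORECASE. The sample string and variable names are illustrative assumptions, not part of the original example.

```python
import re

# Sample input chosen for illustration only.
text = "HELLO world"

# Without flags, the lowercase pattern does not match the uppercase text.
case_sensitive = re.match(r"hello", text)

# re.IGNORECASE makes the comparison case-insensitive.
case_insensitive = re.match(r"hello", text, flags=re.IGNORECASE)

# "world" occurs in the string, but not at the start, so match() returns None.
not_at_start = re.match(r"world", text)

print(case_sensitive)            # None
print(case_insensitive.group())  # HELLO
print(not_at_start)              # None
```

Because re.match() can return None, it is safest to test the result before calling .group() on it.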

Python re.match() Example

Let’s check if a string starts with a word followed by numbers.

import re

text = "Price123 is the total cost."
match = re.match(r'\w+\d+', text)

if match:
    print(f"Matched: {match.group()}")
else:
    print("No match found")

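Here, \w+ matches one or more word characters (letters, digits, and underscores), and \d+ matches one or more digits. Because \w also matches digits, the greedy \w+ backtracks by one character so that \d+ can match the final digit. Since the string starts with "Price123", the match succeeds and the code prints Matched: Price123.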
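
To see how the pattern divides the text between its two parts, the sketch below wraps each part in a capturing group. The groups and variable names are additions for illustration, not part of the original example.

```python
import re

text = "Price123 is the total cost."

# Capture each half of the pattern to see how the match is divided.
match = re.match(r"(\w+)(\d+)", text)

whole = match.group(0)                  # the full match
word_part, digit_part = match.groups()  # the two captured pieces

print(whole)       # Price123
print(word_part)   # Price12
print(digit_part)  # 3
```

If the intent is "letters followed by digits", a stricter pattern such as [A-Za-z]+\d+ may express it more directly.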

Python re.sub() Explained

The re.sub() function is used for substituting occurrences of a pattern in a string. It takes three main arguments:

The pattern you want to replace, a regular expression.
The replacement string, i.e. what you want to replace it with.
The original string in which you want to replace the occurrences of the pattern.

Syntax

re.sub(pattern, replacement, string, count=0, flags=0)

pattern: The regex pattern to search for.
replacement: The string (or function) that replaces each match. In the re module itself, this parameter is named repl.
string: The input string where the replacement will occur.
count: (Optional) Limits the number of replacements. By default, all occurrences are replaced.
flags: (Optional) Allows modification of matching behavior, like case-insensitivity.
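
The optional arguments can be sketched as follows. The sample sentence is made up for illustration, and count and flags are passed by keyword so that a flag is never mistaken for a count.

```python
import re

# Sample input chosen for illustration only.
text = "Cat, cat, CAT and one more cat."

# count=2 stops after the first two replacements, working left to right.
first_two = re.sub(r"cat", "dog", text, count=2, flags=re.IGNORECASE)

# With the default count=0, every match is replaced.
all_of_them = re.sub(r"cat", "dog", text, flags=re.IGNORECASE)

print(first_two)    # dog, dog, CAT and one more cat.
print(all_of_them)  # dog, dog, dog and one more dog.
```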

Python re.sub() Example

Let’s replace all the digits in a string with the word NUM.

import re

text = "The price is 123 dollars and 45 cents."
new_text = re.sub(r'\d+', 'NUM', text)

print(new_text)

Output:

The price is NUM dollars and NUM cents.

Here, \d+ is the regex pattern that matches one or more digits. The re.sub() function replaces all occurrences of this pattern with the string 'NUM'.

Differences Between Python re.sub() and re.match()

re.sub() is used for substitution and applies to the whole string.
re.match() only checks whether the pattern matches at the start of the string. It does not scan the rest of the string for later matches.
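
A small side-by-side sketch of this difference, using a made-up sentence and the same \d+ pattern for both functions:

```python
import re

# Sample input chosen for illustration only.
text = "Order 66 shipped on day 12."

# re.match() only tests the start of the string, and "Order" is not a digit.
start_match = re.match(r"\d+", text)

# re.sub() scans the whole string and replaces every match it finds.
replaced = re.sub(r"\d+", "#", text)

print(start_match)  # None
print(replaced)     # Order # shipped on day #.
```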

Let’s dive deeper into re.sub() and re.match() with more advanced examples and explanations of regular expression patterns.

Python re.match() Advanced Example

Let’s now look at how we can use re.match() with more complex patterns. Assume we want to check whether a given string starts with something that looks like an email address, without validating the rest of the string.

Example

import re

email = "[email protected] sent you a message."

# Basic email pattern matching the start of a string
pattern = r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+'

match = re.match(pattern, email)

if match:
    print(f"Valid email found: {match.group()}")
else:
    print("No valid email at the start")

Explanation

^[a-zA-Z0-9_.+-]+: This part matches one or more alphanumeric characters, dots (.), underscores (_), plus signs (+), or hyphens (-). The ^ anchors the match to the beginning of the string. It is redundant with re.match(), which already starts at the beginning, but it is harmless.
@[a-zA-Z0-9-]+: This matches the @ symbol followed by one or more alphanumeric characters or hyphens (the first part of the domain name).
\.[a-zA-Z0-9-.]+: Matches a dot (.) followed by alphanumeric characters, hyphens, or additional dots (the rest of the domain, including the top-level domain).

This simplified pattern matches common email address formats at the beginning of the string. It is not a complete email validator.

Output:

Valid email found: [email protected]
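
When the goal is to validate that the whole string is an email address, rather than only its start, re.fullmatch() is one option. The sketch below reuses the simplified pattern without the leading ^; the addresses are placeholders chosen for illustration and do not come from the original example.

```python
import re

# Same simplified pattern as above, without the redundant leading ^.
pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"

# Placeholder inputs for illustration only.
candidates = [
    "user@example.com",
    "user@example.com sent you a message.",
    "not-an-email",
]

results = {}
for candidate in candidates:
    starts_with_email = re.match(pattern, candidate) is not None
    is_only_email = re.fullmatch(pattern, candidate) is not None
    results[candidate] = (starts_with_email, is_only_email)
    print(f"{candidate!r}: starts_with={starts_with_email}, whole_string={is_only_email}")
```

re.match() accepts the second candidate because it only looks at the start, while re.fullmatch() rejects it because the trailing words are not part of an address.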

Python re.sub() Advanced Example

Suppose we want to reformat phone numbers. We have phone numbers like 123-456-7890, and we want to rewrite them as (123) 456-7890.

Example

import re

text = "Contact me at 123-456-7890 or 987-654-3210."
formatted_text = re.sub(r'(\d{3})-(\d{3})-(\d{4})', r'(\1) \2-\3', text)

print(formatted_text)

Explanation

\d{3}: This matches exactly three digits.
(\d{3}): Parentheses () are used for capturing groups. In this case, we’re capturing the first three digits as one group.
r'(\1) \2-\3': This is the replacement string. It uses \1, \2, and \3 to refer to the captured groups (the area code, the next three digits, and the last four digits, respectively).
So, this example finds phone numbers in the 123-456-7890 format and converts them to (123) 456-7890.

Output:

Contact me at (123) 456-7890 or (987) 654-3210.
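
Numbered references can become hard to read as patterns grow. As an alternative sketch, the same substitution can be written with named groups; the group names area, prefix, and line are illustrative choices, not part of the original example.

```python
import re

text = "Contact me at 123-456-7890 or 987-654-3210."

# (?P<name>...) defines a named group; the names here are hypothetical.
pattern = r"(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{4})"

# \g<name> refers to a named group inside the replacement string.
formatted_text = re.sub(pattern, r"(\g<area>) \g<prefix>-\g<line>", text)

print(formatted_text)  # Contact me at (123) 456-7890 or (987) 654-3210.
```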

Common Python Regular Expression Patterns

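\d: Matches any decimal digit. For str patterns this includes Unicode digits; with the re.ASCII flag it is equivalent to [0-9].
\w: Matches any word character (letters, digits, and underscore). For str patterns this includes Unicode letters and digits; with the re.ASCII flag it is equivalent to [a-zA-Z0-9_].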
+: Matches one or more occurrences of the preceding character or group.
*: Matches zero or more occurrences of the preceding character or group.
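.: Matches any character except a newline (with re.DOTALL, it matches newlines too).
^: Anchors the pattern to the start of the string (with re.MULTILINE, also to the start of each line).
$: Anchors the pattern to the end of the string or just before a trailing newline (with re.MULTILINE, also to the end of each line).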
{m,n}: Matches between m and n occurrences of the preceding character or group.
[ ]: Used to define a character set. For example, [a-z] matches any lowercase letter.
(): Used for capturing groups, allowing us to extract parts of the match and reference them later (like in re.sub()).
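
The patterns above can be tried out with a short sketch like the following. The sample strings and variable names are made up for illustration.

```python
import re

# Two-line sample input chosen for illustration only.
sample = "Room 42, floor 7\nBadge A19"

digits = re.findall(r"\d+", sample)           # runs of digits
words = re.findall(r"\w+", sample)            # runs of word characters
lowercase = re.findall(r"[a-z]+", sample)     # runs of lowercase letters
first_line = re.match(r".*", sample).group()  # . stops at the newline
starts_with_room = re.search(r"^Room", sample) is not None
ends_with_19 = re.search(r"19$", sample) is not None
two_or_three = re.findall(r"\d{2,3}", "7 42 123 4567")  # 2 to 3 digits at a time

print(digits)            # ['42', '7', '19']
print(words)             # ['Room', '42', 'floor', '7', 'Badge', 'A19']
print(lowercase)         # ['oom', 'floor', 'adge']
print(first_line)        # Room 42, floor 7
print(starts_with_room)  # True
print(ends_with_19)      # True
print(two_or_three)      # ['42', '123', '456']
```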

Combining re.sub() With Functions

You can also use a function as the replacement in re.sub() if you want more dynamic behavior. Let’s see how.

Example: Capitalize every word in a sentence.

import re

text = "this is a test sentence."

def capitalize(match):
    return match.group(0).capitalize()

new_text = re.sub(r'\b\w+\b', capitalize, text)

print(new_text)

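Explanation

\b: Word boundary.
\w+: Matches one or more word characters.
The capitalize() function is called once for each match. It receives the match object and returns the matched word with its first letter uppercased. Note that str.capitalize() also lowercases the remaining letters of the word.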

Output:

This Is A Test Sentence.
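
A replacement function can also compute a new value from each match. The sketch below doubles every number in a string; the helper name double and the sample text are hypothetical, chosen only for illustration.

```python
import re

# Sample input chosen for illustration only.
text = "2 apples and 15 oranges"

def double(match):
    # match.group(0) is the matched text; convert it, double it,
    # and return a string, since re.sub() expects a string back.
    return str(int(match.group(0)) * 2)

doubled = re.sub(r"\d+", double, text)
print(doubled)  # 4 apples and 30 oranges

# str.capitalize() uppercases the first letter and lowercases the rest.
mixed_case = "hELLO".capitalize()
print(mixed_case)  # Hello
```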

re.match() vs. re.search()

If you want to search for a pattern anywhere in the string (not just at the beginning), you should use re.search() instead of re.match().

Example Using re.search()

import re

text = "This is my email [email protected]"

# Search for an email pattern anywhere in the string
pattern = r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+'

search = re.search(pattern, text)

if search:
    print(f"Email found: {search.group()}")
else:
    print("No email found")

Output:

Email found: [email protected]

Here, re.search() looks for the pattern anywhere in the string, unlike re.match(), which only checks the start.
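
The same contrast can be sketched with a single pattern applied both ways. The sample string is made up for illustration.

```python
import re

# Sample input chosen for illustration only.
text = "Total: 250 units"

at_start = re.match(r"\d+", text)    # None: the string starts with "Total"
anywhere = re.search(r"\d+", text)   # finds the first run of digits
anchored = re.search(r"^\d+", text)  # with ^, search() behaves like match() here

print(at_start)          # None
print(anywhere.group())  # 250
print(anywhere.span())   # (7, 10)
print(anchored)          # None
```

The span() method reports where the match was found, which confirms that re.search() looked past the start of the string.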

Understanding Python re.sub() and re.match()

re.sub(): Replaces matches of a pattern within a string. Can use captured groups for dynamic replacements or even a function.
re.match(): Checks for a match at the beginning of a string. Useful for validation or checking the start of a string.
re.search(): Searches for a pattern anywhere in the string, not limited to the start.

These examples should give you a more comprehensive understanding of how regex works in Python!

Frequently Asked Questions

What is re.match() in Python?

re.match() is a function in Python that searches for a match only at the beginning of the string. If the match is found at the start of the string, it returns a match object. Otherwise, it returns None. It follows the syntax:

re.match(pattern, string, flags=0)

pattern: The regex pattern to match.
string: The input string to be searched.
flags: (Optional) Allows modification of matching behavior.

What is the difference between Python re.match() vs. re.search()?

Python re.match() checks for a pattern only at the beginning of a string, while re.search() scans the whole string and returns the first match it finds.

What is Python re.sub()?

Python re.sub() is a function that’s used for substituting occurrences of a pattern in a string. It takes three main arguments:

The pattern you want to replace, a regular expression.
The replacement string, i.e. what you want to replace it with.
The original string in which you want to replace the occurrences of the pattern.

It follows this syntax:

re.sub(pattern, replacement, string, count=0, flags=0)
