Introduction
In this article, you will learn about Python Regex, the meaning of various Python Regex terms, and how to check whether a given email address is valid or invalid using Python.
Python
Python is an interpreted, high-level, general-purpose programming language created by Guido van Rossum and first released in 1991. Van Rossum started Python as a hobby project to keep himself occupied during the week around Christmas, and named it after the British comedy troupe Monty Python. It is used for:
- Software Development
- Web Development
- System Scripting
- Mathematics
To learn how to program in Python, visit Python Basics.
Python Regex
Regular expressions (regex) are supported by most programming languages, including Java and JavaScript. A regular expression is a sequence of characters that defines a search pattern. String-searching algorithms typically use these patterns for "find" or "find and replace" operations on strings, or for input validation. The concept comes from formal language theory and theoretical computer science.
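The three typical uses mentioned above correspond to functions in Python's built-in re module:

- re.search finds a match.
- re.sub performs find and replace.
- re.fullmatch validates a whole input.

The following is a minimal sketch. The sample text and the helper is_six_digit_code are hypothetical and exist only for illustration.

```python
import re

text = "Order 66 shipped on 2021-03-15, order 67 is pending."

# "Find": re.search returns the first match object, or None if nothing matches.
first_number = re.search(r"\d+", text)
print(first_number.group())  # 66

# "Find and replace": re.sub replaces every match of the pattern.
masked = re.sub(r"\d", "#", text)
print(masked)  # Order ## shipped on ####-##-##, order ## is pending.

# "Input validation": re.fullmatch succeeds only if the WHOLE string matches.
def is_six_digit_code(value: str) -> bool:
    """Hypothetical validation rule for illustration: exactly six digits."""
    return re.fullmatch(r"\d{6}", value) is not None

print(is_six_digit_code("110001"), is_six_digit_code("11000A"))  # True False
```

Using re.fullmatch for validation means you do not have to remember to anchor the pattern with ^ and $.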
You can refer to the following table when writing or reading regular expressions:
Regular expression | Description
foo.* | Matches foo followed by any number of characters (for example, any string starting with foo)
\d* | Matches zero or more decimal digits
[a-zA-Z]+ | Matches a sequence of one or more letters
text | Matches the literal text
. | Matches any character except a newline
^ | Matches the start of the string
$ | Matches the end of the string
* | Matches 0 or more repetitions of the preceding element
+ | Matches 1 or more repetitions
? | Matches 0 or 1 repetition
+? | Matches 1 or more repetitions, as few as possible
*? | Matches 0 or more repetitions, as few as possible
{m,n} | Matches m to n repetitions
{m,n}? | Matches m to n repetitions, as few as possible
[...] | Matches any one character in the set
[^...] | Matches any one character not in the set
A|B | Matches either A or B
(...) | Matches the regex inside the parentheses as a group
\number | Matches the same text that the group with that number matched (for example, \1)
\A | Matches only at the start of the string
\b | Matches the empty string at the beginning or end of a word
\B | Matches the empty string when it is not at the beginning or end of a word
\d | Matches any decimal digit
\D | Matches any non-digit character
\s | Matches any whitespace character
\S | Matches any non-whitespace character
\w | Matches any word character: letters, digits and the underscore
\W | Matches any character not matched by \w
\Z | Matches only at the end of the string
\\ | Matches a literal backslash
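Some rows of the table are easier to understand when you see them run. The sketch below uses re.search and re.findall from Python's standard re module to demonstrate the following rows. The sample strings are invented for illustration.

- Greedy versus lazy repetition.
- {m,n}.
- \b.
- A \number backreference.
- The fact that \w includes the underscore.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy .* grabs as much as possible; lazy .*? stops at the first chance.
greedy = re.search(r"<.*>", html).group()
lazy = re.search(r"<.*?>", html).group()
print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>

# {m,n}: runs of 2 to 3 digits; a long run is split greedily into 3 + 2.
digits_2_to_3 = re.findall(r"\d{2,3}", "7 42 123 98765")
print(digits_2_to_3)  # ['42', '123', '987', '65']

# \b: word boundaries, so the "cat" inside "concatenate" is not matched.
whole_word_cats = re.findall(r"\bcat\b", "cat concatenate cat.")
print(whole_word_cats)  # ['cat', 'cat']

# \1 (a \number reference): repeats the text group 1 matched, e.g. a doubled word.
doubled_word = re.search(r"\b(\w+) \1\b", "this is is a typo").group()
print(doubled_word)  # is is

# \w matches the underscore too, but not the hyphen.
words = re.findall(r"\w+", "first_name last-name")
print(words)  # ['first_name', 'last', 'name']
```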
Validate an Email Address in Python
Validating an email address means checking whether the value the user entered in the email address field follows the format we expect. Suppose that we, as programmers, set the email format to "first_name.last_name@company_name.com" and the user enters "[email protected]". This input violates our condition. Some readers may wonder how we decide which part is the "first_name" and which is the "last_name". This is decided from the first name and last name the user entered. In this example, I assume that the user entered "rohit" as first_name and "gupta" as last_name.
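The first_name.last_name@company_name.com rule above can be expressed as an exact-match pattern built from the names the user entered. This is a minimal sketch that assumes the company name is known in advance. The function matches_company_format and the company name examplecorp are hypothetical and exist only for illustration. re.escape keeps any special characters in the names from being read as regex syntax.

```python
import re

def matches_company_format(email: str, first_name: str, last_name: str, company: str) -> bool:
    """Hypothetical helper: check email against first_name.last_name@company_name.com."""
    # re.escape treats the user's input literally; \. is a literal dot.
    pattern = rf"{re.escape(first_name)}\.{re.escape(last_name)}@{re.escape(company)}\.com"
    # fullmatch: the whole address must match, not just part of it.
    return re.fullmatch(pattern, email) is not None

print(matches_company_format("rohit.gupta@examplecorp.com", "rohit", "gupta", "examplecorp"))
print(matches_company_format("gupta.rohit@examplecorp.com", "rohit", "gupta", "examplecorp"))
```

The second call is rejected because the first and last names are in the wrong order.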
Email address validation is most commonly found in mail servers: when you enter your email address, it is checked against the predefined format of that particular mail server.
Now let us see how we achieve this using Python:
Method 1: Using the "re" module
import re

# A raw string (r'...') passes the backslashes to the regex engine unchanged.
regex = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'

def check(email):
    # re.search returns a match object if the pattern matches, otherwise None.
    if re.search(regex, email):
        print("Valid Email")
    else:
        print("Invalid Email")

if __name__ == '__main__':
    email = "[email protected]"
    check(email)
    email = "[email protected]"
    check(email)
    email = "[email protected]"
    check(email)
To understand the code, you need to understand what the pattern "^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$" means.
- ^ matches the start of the string. Here it tells the regex engine that the pattern following ^ must match from the very first character of the email address.
- [...] matches a single character from a set, so [a-z0-9] matches one lowercase letter from 'a' to 'z' or one digit from 0 to 9. + means 1 or more repetitions. Hence, [a-z0-9]+ matches a run of one or more lowercase letters and digits.
- [\._] matches a single '.' (dot) or '_' (underscore), and ? means 0 or 1 repetition. So [\._]? matches zero or one dot or underscore. Because it appears only once in the pattern, the part before @ can contain at most one dot or underscore, which also rules out consecutive dots.
- The second [a-z0-9]+ matches another run of one or more lowercase letters and digits, which may or may not follow a dot or underscore. Since both runs need at least one character, the part before @ must be at least two characters long.
- [@] matches the @ sign, and \w matches any word character (a letter, a digit or the underscore). So [@]\w+ matches @ followed by one or more word characters.
- [.]\w{2,3} matches a dot followed by 2 or 3 word characters. This is meant for domain endings of length 2 or 3, such as .in or .com. If you want to accept longer endings, you can replace this with \w+.
- $ matches the end of the string, so nothing may follow the domain ending.
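The bullet list above reads the pattern piece by piece. Python's re.VERBOSE flag lets you keep that explanation inside the pattern itself. In this mode, whitespace is ignored and # starts a comment outside [...]. This is a sketch: the names EMAIL_PATTERN, COMPACT and is_valid_email, and the sample addresses, are made up for illustration.

```python
import re

# The Method 1 pattern, spread over several lines with re.VERBOSE.
EMAIL_PATTERN = re.compile(r"""
    ^            # start of the string
    [a-z0-9]+    # one or more lowercase letters or digits
    [\._]?       # optionally a single dot or underscore
    [a-z0-9]+    # one or more lowercase letters or digits
    [@]          # the @ sign
    \w+          # one or more word characters: letters, digits or underscore
    [.]          # a literal dot
    \w{2,3}      # two or three word characters, such as in or com
    $            # end of the string
""", re.VERBOSE)

# The same pattern in its one-line form, for comparison.
COMPACT = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'

def is_valid_email(email: str) -> bool:
    """Return True if email matches the verbose pattern."""
    return EMAIL_PATTERN.search(email) is not None

samples = ["rohit.gupta@example.com", "rohit_gupta@example.in", "r@example.com",
           "rohit.k.gupta@example.com", "rohit.gupta@example.info", "Rohit.Gupta@example.com"]
for email in samples:
    # Both forms of the pattern should give the same answer.
    same = (re.search(COMPACT, email) is not None) == is_valid_email(email)
    print(f"{email:28} valid={is_valid_email(email)} same_as_compact={same}")
```

The rejected samples each break one rule from the list:

- "r@example.com" has a part before @ that is too short.
- "rohit.k.gupta@example.com" contains two dots before @.
- "rohit.gupta@example.info" has a four-letter ending.
- "Rohit.Gupta@example.com" contains uppercase letters.

In Python, $ also matches just before a single trailing newline. Calling re.fullmatch on a pattern without ^ and $ is therefore slightly stricter.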
When the program runs, "[email protected]" is reported as invalid, while "[email protected]" and "[email protected]" are reported as valid.
"[email protected]" is considered invalid because of the "-" after "@". \w does not match a hyphen, so [@]\w+ stops at it and [.] then fails.
If you want such email addresses to be considered valid, change the regex to "^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[-]?\w+[.]\w{2,3}$". The only change is the [-]?\w+ added after [@]\w+, which allows one optional hyphen inside the domain name.
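To see the effect of the extra [-]?\w+, the sketch below runs the original pattern and the modified one against a few made-up addresses. BASIC, WITH_HYPHEN and is_valid are illustrative names, not from the article.

```python
import re

# The Method 1 pattern and the variant that allows one hyphen in the domain.
BASIC = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'
WITH_HYPHEN = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[-]?\w+[.]\w{2,3}$'

def is_valid(pattern: str, email: str) -> bool:
    """Return True if email matches the given pattern."""
    return re.search(pattern, email) is not None

# Hypothetical sample addresses, made up for this comparison.
samples = [
    "rohit.gupta@example.com",
    "rohit.gupta@my-company.com",
    "rohit.gupta@my--company.com",
]

for email in samples:
    print(f"{email:30} basic={is_valid(BASIC, email)} "
          f"with_hyphen={is_valid(WITH_HYPHEN, email)}")
```

Only the modified pattern accepts "my-company". Neither pattern accepts "my--company", because [-]? allows at most one hyphen between the two \w+ runs.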
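You can also build your own patterns, such as "^[a-z]([w-]*[a-z]|[w-.]*[a-z]{2,}|[a-z])*@[a-z]([w-]*[a-z]|[w-.]*[a-z]{2,}|[a-z]){4,}?.[a-z]{2,}$". I will not explain this regex; it is your challenge to decode it and work out what it means. Note that, as written, Python's re module rejects [w-.] with a "bad character range" error. A hyphen between two characters denotes a range, and w comes after . in character order. To use a literal hyphen in Python, put it at the start or end of the set.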
Other Methods
Various Python packages and APIs can do this work for you, so you can validate a given email address in just a couple of lines of code.
Below are some of the Email Validation Python packages:
- email-validator 1.0.5
- pyIsEmail 1.3.2
- py3-validate-email
Given below are some of the Email Validation APIs:
- Mailboxlayer
- Isitrealemail
- SendGrid’s Email Validation API
Many other Python packages and APIs are available, both free and paid.
If you don't want to use an existing package or API, you can write your own. You can also help improve the existing ones by contributing to their source repositories, for example on GitHub.
Conclusion
In this article, we discussed what Python regex is and how to build different regular expressions. We then looked at different ways of validating an email address in Python. Please try these methods out and share your views in the comments on how useful you found this article.
Visit C# Corner to find answers to more such questions.
If you have any questions regarding any other technology do have a look at the C# Corner Technology Page.