Basics of Regular Expressions (Regex)

A regular expression is a string used for pattern matching. Regular expressions can be used to search for strings that match a certain pattern, and sometimes to manipulate those strings.
Many UNIX System commands (including grep, vi, emacs, sed, and awk) use regular expressions for searching and for text manipulation.
The re module in Python gives you many powerful ways to use regular expressions in your scripts.
Only some of the features of re will be covered here.

Pattern Matching

In Python, a regular expression object is created with re.compile(). Regular expression objects have
many methods for working with strings, including search(), match(), findall(), split(), and sub().
Here's an example of using a pattern to match a string:

import re
maillist = ["fcukthe@code.com", "black@clovercode.com", "Demon@Slayer.com"]
emailre = re.compile(r"code")
for email in maillist :
    if emailre.search(email) :
        print(email, "is a match.")

# OUTPUT:
#       fcukthe@code.com is a match.
#       black@clovercode.com is a match.

This example will print the addresses fcukthe@code.com and black@clovercode.com, but not Demon@Slayer.com.
It uses re.compile(r"code") to create an object that can search for the string code. (The r is used in front of a regular expression string to prevent Python from interpreting any escape sequences it might contain.)
This script then uses emailre.search(email) to search each e-mail address for code, and prints the ones that match.

You can also use the regular expression functions without first creating a regular expression object. For example, the call re.search(r"code", email) could be used in the if statement in the preceding example, in place of emailre.search(email).

In short scripts it may be convenient to skip the extra step of calling re.compile(), but a compiled regular expression object (emailre, in this example) is generally more efficient when the same pattern is used many times.
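
The two styles described above can be compared side by side. This is a minimal sketch that reuses the address list from the earlier example; the variable names direct and compiled are only illustrative.

```python
import re

# Same address list as in the example above.
maillist = ["fcukthe@code.com", "black@clovercode.com", "Demon@Slayer.com"]

# Module-level function: the pattern string is passed on every call.
direct = [email for email in maillist if re.search(r"code", email)]

# Compiled object: the pattern is compiled once and then reused.
emailre = re.compile(r"code")
compiled = [email for email in maillist if emailre.search(email)]

print(direct)              # the two addresses that contain "code"
print(direct == compiled)  # both styles find the same matches
```

Both forms give the same result, so the choice is mostly about readability and how often the pattern is reused.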

The method match() is just like search(), except that it only looks for the pattern at the beginning of
the string. For example,

import re
regex = re.compile(r'bo', re.I)
for element in ["Anime", "boku no hero academia", "Boku no pico"] :
    if regex.match(element) :
        print(regex.match(element).group())

will find strings that start with "bo". The re.I option in re.compile(r'bo', re.I) causes the match to
ignore case (lower/upper case), so this example will also find strings starting with "Bo", "bO", or "BO". The method group() returns the part of the string that matched. The output from this example would look like
bo
Bo
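
To make the difference between match() and search() concrete, here is a small sketch using the same pattern. The sample string "a good book" is made up for illustration.

```python
import re

regex = re.compile(r"bo", re.I)
text = "a good book"  # hypothetical sample string

# match() only tries the pattern at the very start of the string.
at_start = regex.match(text)
# search() scans the whole string for the first place the pattern fits.
anywhere = regex.search(text)

print(at_start)          # None, because the string does not start with "bo"
print(anywhere.group())  # bo
```

Because match() returns None when the pattern is not at the beginning, check the result before calling group() on it.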

Constructing Patterns

As you have seen, a string by itself is a regular expression. It matches any string that contains it. For
example, venture matches “Adventures”. However, you can create far more interesting regular
expressions.
Certain characters have special meanings in regular expressions. The following table lists these characters,
with examples of how they might be used.

Char    Definition                                                              Example         Matches
.       Matches any single character.                                           th.nk           think, thank, thunk, etc.
\       Quotes the following character.                                         script\.py      script.py
*       Previous item may occur zero or more times in a row.                    .*              any string, including the empty string
+       Previous item occurs at least once, and maybe more.                     \*+             *, *****, etc.
?       Previous item may or may not occur.                                     web\.html?      web.htm, web.html
{n,m}   Previous item must occur at least n times but no more than m times.     \*{3,5}         ***, ****, *****
( )     Groups a portion of the pattern.                                        script(\.pl)?   script, script.pl
|       Matches either the value before or after the |.                         (R|r)af         Raf, raf
[ ]     Matches any one of the characters inside. Frequently used with ranges.  [QqXx]          Q, q, X, or x
[^]     Matches any character not inside the brackets.                          [^A-Za-z]       any nonalphabetic character, such as 2
\n      Matches whatever was in the nth set of parentheses.                     (croquet)\1     croquetcroquet
\s      Matches any whitespace character.                                       \s              space, tab, newline
\S      Matches any non-whitespace character.                                   the\S           then, they, etc. (but not the)
\d      Matches any digit.                                                      \d*             0110, 27, 9876, etc.
\D      Matches anything that's not a digit.                                    \D+             same as [^0-9]+
\w      Matches any letter, digit, or underscore.                               \w+             t, AL1c3, Q_of_H, etc.
\W      Matches anything that \w doesn't match.                                 \W+             &#*$%, etc.
\b      Matches the beginning or end of a word.                                 \bcat\b         cat, but not catenary or concatenate
^       Anchors the pattern to the beginning of a string.                       ^If             any string beginning with If
$       Anchors the pattern to the end of the string.                           \.$             any string ending in a period
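
A few of the patterns from the table can be tried directly with re.search(). The sketch below is only a quick check; the list name checks and the sample strings that are expected not to match ("**" and "concatenate" against a whole-word pattern) are chosen here for illustration.

```python
import re

# (pattern, text, should it match?) triples based on rows of the table.
checks = [
    (r"th.nk", "thank", True),
    (r"web\.html?", "web.htm", True),
    (r"\*{3,5}", "**", False),             # needs at least three asterisks
    (r"script(\.pl)?", "script.pl", True),
    (r"(croquet)\1", "croquetcroquet", True),
    (r"\bcat\b", "concatenate", False),    # "cat" is not a whole word here
    (r"\.$", "The end.", True),
]

results = []
for pattern, text, expected in checks:
    found = re.search(pattern, text) is not None
    results.append(found == expected)
    print(f"{pattern!r:20} {text!r:20} match={found}")
```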

Remember that it is usually a good idea to add the character r in front of a regular expression string.
Otherwise, Python may interpret backslash sequences such as \b or \n as escape characters, which changes the expression before the re module ever sees it.
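
The effect of the r prefix is easy to see with \b. In an ordinary Python string "\b" is a single backspace character, while in a raw string it stays as a backslash followed by b, which re reads as a word boundary. A small sketch, with a made-up sample sentence:

```python
import re

text = "the cat sat"  # hypothetical sample string

# Without r, Python turns "\b" into a backspace character before re sees it.
plain = re.search("\bcat\b", text)
# With r, the backslashes reach re, which treats \b as a word boundary.
raw = re.search(r"\bcat\b", text)

print(plain)                  # None
print(raw.group())            # cat
print(len("\b"), len(r"\b"))  # 1 2
```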

Saving Matches

One use of regular expressions is to parse strings by saving the portions of the string that match your
pattern. For example, suppose you have an e-mail address, and you want to get just the username
part of the address:

import re
wemail = 'lb101@fcukthecode.com'
parsemail = re.compile(r"(.*)@(.*)")
(username, domain)=parsemail.search(wemail).groups()
print("Username:", username, "Domain:", domain)

#OUTPUT:  Username: lb101 Domain: fcukthecode.com

This example uses the regular expression pattern "(.*)@(.*)" to match the e-mail address. The pattern
contains two groups enclosed in parentheses. One group is the set of characters before the @, and
the other is the set of characters following the @. The method groups() returns a tuple of the strings that
match each of these groups. In this example, those strings are lb101 and fcukthecode.com.
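
Note that search() returns None when the pattern does not match, so calling groups() directly fails on input without an @. One way to guard against that is sketched below; the helper name split_address is hypothetical and not part of the re module.

```python
import re

parsemail = re.compile(r"(.*)@(.*)")

def split_address(address):
    """Return (username, domain), or None if the pattern does not match."""
    m = parsemail.search(address)
    if m is None:
        return None
    # group(1) and group(2) are the first and second parenthesized groups.
    return m.group(1), m.group(2)

parsed = split_address("lb101@fcukthecode.com")
missing = split_address("not-an-address")
print(parsed)   # ('lb101', 'fcukthecode.com')
print(missing)  # None
```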

Finding a List of Matches

In some cases, you may want to find and save a list of all the matches for an expression. For example,

import re
inputline = "I like ale and apple pie"  # sample input, for illustration only
regexp = re.compile(r"ap*le")
matchlist = regexp.findall(inputline)
print(matchlist)

searches for all the substrings of inputline that match the expression “ap*le”. This includes strings like
ale or apple. If you also want to match capitalized words like Apple, you could use the regular expression

import re
regexp = re.compile(r"ap*le", re.I)

instead.
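
The effect of re.I on findall() can be checked with a short sketch. The sample sentence is made up for illustration.

```python
import re

inputline = "Apple pie, ale, and a ripe apple"  # hypothetical sample input

# Case-sensitive: the capitalized "Apple" is skipped.
sensitive = re.findall(r"ap*le", inputline)
# Case-insensitive: re.I lets "a" match "A" as well.
insensitive = re.findall(r"ap*le", inputline, re.I)

print(sensitive)    # ['ale', 'apple']
print(insensitive)  # ['Apple', 'ale', 'apple']
```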

One common use of findall() is to divide a line into sections. For example, the sample program in the
earlier section “Variable Scope” used

import re
splitline = re.findall(r"\w+", line.lower())

to get a list of all the words in line.lower().
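
As a self-contained illustration of this word-splitting idiom, here is a sketch with a made-up value for line:

```python
import re

line = "The quick brown fox, the lazy dog."  # hypothetical sample input

# \w+ matches runs of letters, digits, and underscores, so punctuation
# and spaces act as separators and are dropped from the result.
splitline = re.findall(r"\w+", line.lower())
print(splitline)
# ['the', 'quick', 'brown', 'fox', 'the', 'lazy', 'dog']
```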

Other useful things to do using regular expressions in Python:

Substitutions and splitting: see "Substitutions and Splitting using Regular Expressions Regex in Python".

Matching: for the first part on using regular expressions and matching a string, see "Matching the beginning of a string (Regex) Regular Expressions in python".

Searching: the re.search() function takes a regular expression pattern and a string and searches for that pattern within the string. If the search is successful, search() returns a match object; otherwise it returns None. For the second part on searching a string, see "Searching – Regular Expressions (Regex) in Python".

Replacing: replacements can be made on strings using re.sub(), including replacements that use group references when the pattern has a small number of groups. See "Replacing Strings using Regular Expression Regex in Python".
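
As a brief taste of the substitution and splitting topics mentioned above, here is a minimal sketch using re.sub() and re.split(). The sample strings and variable names are made up for illustration.

```python
import re

# Replace every run of digits with a single "#".
masked = re.sub(r"\d+", "#", "room 12, floor 3")

# Group references: \1 and \2 in the replacement refer to the first and
# second parenthesized groups of the pattern.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Doe, Jane")

# Split on commas, ignoring any whitespace around them.
parts = re.split(r"\s*,\s*", "a , b,c ,  d")

print(masked)   # room #, floor #
print(swapped)  # Jane Doe
print(parts)    # ['a', 'b', 'c', 'd']
```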

