
Matching an expression only in specific places using regular expressions

For the basics of regular expressions in Python, check the link below 👇 👇
Regular Expressions Regex basic in Python 👈 👈

It also includes a table of contents covering regex symbols and their usage.

Often you want to match an expression only in specific places, leaving it untouched everywhere else. Consider the following sentence:
An apple a day keeps the doctor away (I eat an apple everyday).

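Here, “apple” occurs twice, once outside the parentheses and once inside them. Matching only the occurrence outside can be done with so-called backtracking control verbs, which are supported by the newer regex module (a third-party package, not the built-in re module). The idea is: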

forget_this | or this | and this as well | (but keep this)

With our apple example, this would be:

import regex as re    # third-party module, install with: pip3 install regex
string = "An apple a day keeps the doctor away (I eat an apple everyday)."
rx = re.compile(r''' \([^()]*\) (*SKIP)(*FAIL) | apple''', re.VERBOSE)  
# match anything in parentheses and "throw it away"
# or
# match an apple

apples = rx.findall(string)
print(apples)
#Output: ['apple']  

This matches “apple” only when it can be found outside of the parentheses. Here’s how it works:

  • Scanning from left to right, the regex engine first tries to match a parenthesised chunk with \([^()]*\). Once that is consumed, it passes (*SKIP), which acts as an “always-true assertion”, and then reaches (*FAIL), which always fails and forces the engine to backtrack.
  • While backtracking (from right to left), the engine arrives at (*SKIP) again and is forbidden to go any further to the left. Instead, it throws away everything matched so far and starts the next match attempt at the position where (*SKIP) was reached, so the text inside the parentheses is never offered to the “apple” alternative.
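
If installing the regex module is not an option, the same “forget this | (but keep this)” idea can be approximated with the built-in re module. The following is only a sketch of an alternative technique (a capturing group instead of (*SKIP)(*FAIL)), not the method shown above: the first alternative swallows the parenthesised text, and only matches where the capturing group took part are kept. The variable names text and outside are chosen here for illustration.

```python
import re

text = "An apple a day keeps the doctor away (I eat an apple everyday)."

# Alternative 1 consumes a whole parenthesised chunk (and captures nothing).
# Alternative 2 captures "apple" in group 1.
rx = re.compile(r"\([^()]*\)|(apple)")

# Keep a match only if group 1 actually participated in it.
outside = [
    (m.group(1), m.start(1))
    for m in rx.finditer(text)
    if m.group(1) is not None
]
print(outside)
# Output: [('apple', 3)]
```

The “apple” inside the parentheses is consumed by the first alternative, so it never reaches the capturing group.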

ILLUSTRATION USING PYTHON3 CLI

Iterating over matches using re.finditer

You can use re.finditer to iterate over all matches in a string. Compared to re.findall, which returns only the matched strings, it yields match objects that carry extra information, such as the location of each match in the string (its start and end indexes):

import re
text = 'You can try to find an ant in this string'
pattern = r'an\w?' # find 'an' either with or without a following word character
for match in re.finditer(pattern, text):
    # Start index of match (integer)
    sStart = match.start()
    # Final index of match (integer)
    sEnd = match.end()
    # Complete match (string)
    sGroup = match.group()
    # Print match
    print('Match "{}" found at: [{},{}]'.format(sGroup, sStart,sEnd))

OUTPUT:
Match "an" found at: [5,7]
Match "an" found at: [20,22]
Match "ant" found at: [23,26]
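
To make the difference between the two functions concrete, here is a small sketch that runs both on the same text. It assumes the pattern an\w?, which follows the description “'an' with or without a following word character”; the names found and spans are illustrative only.

```python
import re

text = 'You can try to find an ant in this string'
pattern = re.compile(r'an\w?')

# findall returns just the matched substrings.
found = pattern.findall(text)

# finditer yields match objects; span() gives (start, end) for each one.
spans = [m.span() for m in pattern.finditer(text)]

print(found)
# Output: ['an', 'an', 'ant']
print(spans)
# Output: [(5, 7), (20, 22), (23, 26)]
```

The end index is exclusive, so text[5:7] gives back the first match, 'an'.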


