
Extracting Calling Codes Using Python's Regular Expressions Library

Introduction


We've all gotten calls like this before...

But how exactly does our phone know where these calls are coming from?


Turns out, different countries have different calling codes: the US and Canada use +1, for example, and Panama uses +507.


Today, we'll look at how to use Python's regular expressions library to identify which country a "call" is coming from.


Importing Regular Expressions


As always, we start by importing our library.

import re

def main():
	...

main()

Now, for this tutorial, we'll assume each phone number is formatted like this:


+(code) xxx-xxx-xxxx


Here, the calling code can be anywhere from 1 to 3 digits long, while the groups that follow are 3 digits, 3 digits, and 4 digits, respectively.


Using the regular expressions library, we can give Python a blueprint or pattern so that when we enter a phone number as input, it knows exactly what to expect.

import re

def main():
	blueprint = r"\+\d{1,3} \d{3}-\d{3}-\d{4}"

main()

Things to note:

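  • r"" marks a raw string: backslashes are kept as-is instead of starting escape sequences (e.g. \n stays a backslash followed by n rather than becoming a newline), so sequences like \d and \+ reach the regex engine untouched

  • "\+" tells the regex engine to match a literal plus sign, since an unescaped + is a quantifier meaning "one or more of the previous item"

  • \d matches a single digit, and {number} after it tells Python how many digits to expect; it can also be a range, so \d{1,3} means one to three digits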
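To see these three rules in action, here's a small sketch you can paste into a Python shell. It compares the length of a raw string with a regular one, and shows that \+ matches the plus sign itself while an unescaped + acts as a quantifier.

```python
import re

# A raw string keeps the backslash: r"\n" is two characters, "\n" is one (a newline).
print(len(r"\n"), len("\n"))  # 2 1

# Escaped: \+ matches a literal plus sign at the start of the string.
print(re.match(r"\+\d", "+1") is not None)  # True

# Unescaped: + means "one or more of the previous item", so \d+ grabs every digit in a row.
print(re.match(r"\d+", "507").group())  # 507

# {1,3} allows one to three digits; matching stops after three even if more follow.
print(re.match(r"\d{1,3}", "50712").group())  # 507
```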


Okay, now let's ask the user for input and compare it to our blueprint:

import re

def main():
	blueprint = r"\+\d{1,3} \d{3}-\d{3}-\d{4}"
	num = input("Enter a number: ")
	
	match = re.search(blueprint, num)
	if match:
		print("Found")
	else:
		print("No match found.")

main()

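The re module's search( ) function takes two parameters: a pattern (our blueprint) and a string. It scans the string for the first place where the pattern matches and returns a Match object if it finds one, or None if it doesn't, which is why "if match:" works as a check.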
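One detail worth knowing: search( ) looks for the pattern anywhere in the string, so extra text around a valid number still counts as a match. If you want the whole input to be a phone number and nothing else, re.fullmatch( ) is a stricter alternative. The quick comparison below uses the same blueprint; the sample strings are made up for illustration.

```python
import re

blueprint = r"\+\d{1,3} \d{3}-\d{3}-\d{4}"
text = "Call me at +1 123-456-7890 tonight"
too_long = "+1 123-456-78901"

# search() returns the first match found anywhere in the string, or None.
print(re.search(blueprint, text).group())  # +1 123-456-7890
# fullmatch() only succeeds if the entire string fits the pattern.
print(re.fullmatch(blueprint, text))  # None

# One digit too many: search() still matches the first 15 characters...
print(re.search(blueprint, too_long) is not None)  # True
# ...but fullmatch() rejects the string as a whole.
print(re.fullmatch(blueprint, too_long) is not None)  # False
```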


Capture Groups


So, if I were to enter +1 123-456-7890, this code would print out "Found".


But we can take this one step further, printing out the code itself.


Unfortunately, since we don't know exactly how many digits the calling code will take up, we can't simply slice the string at a fixed position.
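To make the problem concrete: a fixed slice that works for a one-digit code breaks as soon as the code is longer. This sketch slices the first two characters of two numbers written in this tutorial's format; the digits after the calling codes are placeholders.

```python
us = "+1 123-456-7890"
panama = "+507 123-456-7890"

# A fixed slice assumes every calling code is exactly one digit long.
print(us[:2])      # +1  (correct)
print(panama[:2])  # +5  (wrong: the code is +507)
```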


Luckily, we can specify a capture group (a specific part of the match we want to pull out) using parentheses, like so:

import re

def main():
	blueprint = r"(\+\d{1,3}) \d{3}-\d{3}-\d{4}"
	...

...

Since we want to "capture" the calling code, the parentheses go around that part of the pattern.
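If the pattern starts to feel cryptic, Python's re.VERBOSE flag lets you spread it over several lines with a comment on each piece. This is an optional restyling of the same blueprint, not something the tutorial's code requires. In verbose mode, unescaped whitespace is ignored, so the literal space is written as [ ].

```python
import re

# Same blueprint as above, written with re.VERBOSE so each part can be commented.
blueprint = re.compile(r"""
    (\+\d{1,3})   # group 1: a plus sign followed by a 1- to 3-digit calling code
    [ ]           # a single literal space; bare spaces are ignored in verbose mode
    \d{3}         # first group of three digits
    -             # literal hyphen
    \d{3}         # second group of three digits
    -             # literal hyphen
    \d{4}         # final group of four digits
""", re.VERBOSE)

match = blueprint.search("+507 123-456-7890")
print(match.group(1))  # +507
```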


To print it out, we'll need to call the group( ) method on match:

import re

def main():
	blueprint = r"(\+\d{1,3}) \d{3}-\d{3}-\d{4}"
	num = input("Enter a number: ")
	
	match = re.search(blueprint, num)
	if match:
		print("Found: " + match.group(1))
	else:
		print("No match found.")

main()

Note that group( ) takes one argument, which specifies exactly which capture group we want to store or print.


Calling group( ) with no arguments defaults to group 0, which is the entire match (here, the whole phone number), and that's not what we want. Therefore, we pass 1 to get the first capture group.
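Because main( ) reads from input( ), it's awkward to test. Here's a minimal sketch that pulls the same logic into a function, so group( ) and group(1) can be compared side by side. The function name get_calling_code is our own, not part of the re module.

```python
import re

BLUEPRINT = r"(\+\d{1,3}) \d{3}-\d{3}-\d{4}"

def get_calling_code(num):
    """Return the calling code (e.g. '+1') from a phone number, or None if it doesn't match."""
    match = re.search(BLUEPRINT, num)
    if match:
        return match.group(1)  # group 1 is the part inside the parentheses
    return None

match = re.search(BLUEPRINT, "+507 123-456-7890")
print(match.group())   # +507 123-456-7890  (same as group(0): the whole match)
print(match.group(1))  # +507  (just the calling code)

print(get_calling_code("+1 123-456-7890"))  # +1
print(get_calling_code("123-456-7890"))     # None
```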


Final Thoughts


You can imagine how powerful these capture groups can be, especially when paired with some sort of dictionary, matching each calling code to a country.
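Here's a minimal sketch of that idea. The COUNTRY_CODES dictionary and the identify_country function are hypothetical, written for illustration; the table covers only the two codes mentioned in the introduction and is nowhere near a complete list.

```python
import re

BLUEPRINT = r"(\+\d{1,3}) \d{3}-\d{3}-\d{4}"

# Hypothetical, deliberately incomplete lookup table using the codes from the introduction.
COUNTRY_CODES = {
    "+1": "US/Canada",
    "+507": "Panama",
}

def identify_country(num):
    """Return the country for a phone number's calling code, or a fallback message."""
    match = re.search(BLUEPRINT, num)
    if not match:
        return "No match found."
    code = match.group(1)
    # dict.get() returns the second argument when the code isn't in the table.
    return COUNTRY_CODES.get(code, "Unknown code " + code)

print(identify_country("+507 123-456-7890"))  # Panama
print(identify_country("+1 123-456-7890"))    # US/Canada
print(identify_country("+44 123-456-7890"))   # Unknown code +44
```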


Thanks for reading!
