Remove punctuation from text using Python

Text cleaning as part of preprocessing for Text Analytics

Common punctuation marks seen in text. (Source: BookBaby Blog)

Removing punctuation is a common step in cleaning text data before performing text analytics. Python offers several ways to deal with punctuation. Below is a simple implementation using the ‘re’ and ‘string’ modules.

import re
import string

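The punctuation attribute of the ‘string’ module is a string containing every ASCII punctuation character, and it serves as the reference list of punctuation to look for in the text data. Wrapping it in square brackets turns it into a regular-expression character class, and the sub (substitute) function from ‘re’ then replaces every match in the target string with an empty string.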

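s = "A@p,p!!le#"
# Escape the punctuation so that '\', ']' and '^' stay literal inside the character class
punctuation = '[' + re.escape(string.punctuation) + ']'
re.sub(punctuation, '', s)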

The output of the last line above is:

'Apple'
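
For repeated use, the same idea can be wrapped in small helper functions. The sketch below is one possible arrangement, not the only way to do it: the names PUNCT_PATTERN, PUNCT_TABLE, remove_punctuation_regex and remove_punctuation_translate are made up for illustration. It precompiles the regular expression once, and also shows str.translate as an alternative from the standard library that needs no regular expression at all.

```python
import re
import string

# Compile once; re.escape keeps '\', ']' and '^' literal inside the class.
PUNCT_PATTERN = re.compile('[' + re.escape(string.punctuation) + ']')

# Translation table that maps every punctuation character to None (deletion).
PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def remove_punctuation_regex(text):
    """Delete ASCII punctuation using the precompiled regular expression."""
    return PUNCT_PATTERN.sub('', text)


def remove_punctuation_translate(text):
    """Delete ASCII punctuation using str.translate (no regex involved)."""
    return text.translate(PUNCT_TABLE)


sample = "A@p,p!!le#"
print(remove_punctuation_regex(sample))      # Apple
print(remove_punctuation_translate(sample))  # Apple
```

Both helpers only know about the ASCII characters listed in string.punctuation, so typographic marks such as curly quotes are left in place and would need to be handled separately.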

Data Science Professional | 6+ years of experience in analytics across various domains — retail, insurance, finance and digital advertising
