Text cleaning as part of preprocessing for Text Analytics
Removing punctuation is a common step in cleaning text data before performing text analytics. Python offers several ways to deal with punctuation. Below is a simple implementation using the ‘re’ and ‘string’ modules.
The punctuation attribute of the ‘string’ module is used as the reference list of punctuation characters to look for in the text data; note that it covers ASCII punctuation only. Then, the sub function from ‘re’ is used to remove every punctuation character from the target string by replacing it with an empty string.
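import re, string
s = "A@p,p!!le#"
punctuation = '[' + re.escape(string.punctuation) + ']'  # escape so ], \ and - are treated literally
re.sub(punctuation, '', s)
The output of the last line above is:
'Apple'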
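Since Python offers more than one way to strip punctuation, here is a minimal sketch of an alternative that avoids regular expressions by using str.translate with a table built by str.maketrans. The helper name remove_punctuation and the constant _PUNCT_TABLE are hypothetical, chosen only for illustration; they are not part of the original implementation.

```python
import string

# Translation table that deletes every character in string.punctuation.
# (ASCII punctuation only; characters such as curly quotes are left alone.)
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def remove_punctuation(text: str) -> str:
    """Return `text` with ASCII punctuation removed (hypothetical helper)."""
    return text.translate(_PUNCT_TABLE)


print(remove_punctuation("A@p,p!!le#"))  # prints: Apple
```

Building the table once and reusing it may be convenient when the same cleanup is applied to many strings, for example every row of a text column.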
Thank you! Stay tuned for more interesting things you can do with Python!