Logic of hypothesis testing and types of errors

Photo by Kevin Ku on Unsplash

You might have come across innumerable claims and statements involving numbers, especially in marketing campaigns and ads. “9 out of 10 doctors recommend Colgate toothpaste” and “Dettol kills 99.9% of bacteria” are classic examples of numerical claims. The statistical validity of such claims about a population parameter can be tested by collecting sample data and performing the appropriate calculations on it. This is the foundation of inferential statistics using hypothesis testing.
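As a rough illustration, the “9 out of 10 doctors” claim could be checked with a one-sided binomial test. The sketch below assumes a hypothetical survey of 200 doctors and uses scipy.stats.binomtest; the sample numbers are made up purely for demonstration.

from scipy.stats import binomtest

# Hypothetical sample: 160 out of 200 surveyed doctors recommend the brand
k, n = 160, 200

# H0: the true proportion is 0.9 (the advertised claim)
# H1: the true proportion is lower than 0.9
result = binomtest(k, n, p=0.9, alternative='less')
print(result.pvalue)  # reject H0 at the 5% significance level if the p-value < 0.05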


Generate word cloud from top results for a Google search query

Photo by Thimo Pedersen on Unsplash

When you search for something on Google, millions of results get thrown at you, of which you are likely to go through only the top few relevant ones. What if you could get a snapshot of what has been written in the top results for a search query, in the form of a word-cloud? It would be an interesting way to see the content most commonly written about the item you are searching for, without perusing each result. Let’s see how to implement this in Python.

Creating Word-cloud from top results of a Google Search Query

Here’s a simple flow-chart of the algorithm to create a word-cloud from the top 10 results of a Google search query. …
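A minimal sketch of the idea follows. It assumes the top result URLs have already been collected (for example, with a search client of your choice); the example URLs below are placeholders, and the ‘requests’, ‘beautifulsoup4’, ‘wordcloud’ and ‘matplotlib’ packages are used to fetch the pages, extract the text and render the cloud.

import requests
from bs4 import BeautifulSoup
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Placeholder list of top result URLs; in practice these would come from
# a Google search client or API of your choice
urls = ["https://example.com/article-1", "https://example.com/article-2"]

# Fetch each page and extract its visible text
text = ""
for url in urls:
    html = requests.get(url, timeout=10).text
    text += BeautifulSoup(html, "html.parser").get_text(separator=" ")

# Build and display the word-cloud
wc = WordCloud(width=800, height=400, background_color="white").generate(text)
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()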


Basics to create useful visuals in python using 'matplotlib' and 'seaborn'

Data Visualization in Python (Source: Simplified Python)

Visualizing data is key to exploratory analysis. It is not just for aesthetic purposes; it is essential for uncovering insights into data distributions and feature interactions.

In this article, you will be introduced to the basics of creating some useful and common data visualizations using the ‘matplotlib’ and ‘seaborn’ modules in Python. The built-in dataset ‘iris’ from the sklearn module is used for the demonstration. …
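For a taste of the kind of plots covered, here is a minimal sketch on the iris data; it assumes a recent sklearn (as_frame=True requires version 0.23 or later) and seaborn 0.11+ for histplot.

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Load the built-in iris data as a pandas DataFrame (features plus a 'target' column)
df = load_iris(as_frame=True).frame

# Scatter plot of two features, coloured by species label
sns.scatterplot(data=df, x="sepal length (cm)", y="petal length (cm)", hue="target")
plt.title("Iris: sepal length vs petal length")
plt.show()

# Distribution of a single feature
sns.histplot(df["petal width (cm)"], bins=20)
plt.show()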


Python modules containing built-in datasets and ways to access them

IRIS types (Source: DataCamp)

Built-in datasets prove to be very useful when you are practicing ML algorithms and need some random yet sensible data to apply the techniques and get your hands dirty. Many Python modules ship with common datasets, such as the popular ‘Iris’ data. In this article, we will see the datasets available within the ‘sklearn’ and ‘statsmodels’ modules, and ways to access the data and related info. …
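For a flavour of the access patterns involved, here is a minimal sketch; the specific datasets used (‘iris’ in sklearn, ‘longley’ in statsmodels) are just examples.

from sklearn.datasets import load_iris
import statsmodels.api as sm

# sklearn loaders return a Bunch object with the data, target and metadata
iris = load_iris()
print(iris.feature_names)    # column names
print(iris.data.shape)       # (150, 4)
print(iris.DESCR[:200])      # first part of the dataset description

# statsmodels built-in datasets expose a load_pandas() helper
longley = sm.datasets.longley.load_pandas()
print(longley.data.head())   # the data as a pandas DataFrame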


Text cleaning as part of preprocessing for Text Analytics

Common punctuation marks seen in text. (Source: BookBaby Blog)

Removal of punctuation is a necessary step in cleaning text data before performing text analytics. Python offers numerous ways to deal with punctuation. Given below is a simple implementation using the ‘re’ and ‘string’ modules.

import re
import string

The punctuation attribute of the ‘string’ module is used as the reference list of all possible punctuation characters in the text data. Then, the sub() function from ‘re’ is used to replace every punctuation character in the target string with an empty string.

s = "A@p,p!!le#"
# regex character class matching any single punctuation mark
# (re.escape(string.punctuation) is a more defensive way to build it)
punctuation = '[' + string.punctuation + ']'
re.sub(punctuation, '', s)  # replace every match with the empty string

The output of the last line above is:

'Apple'
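Another common approach, for reference, achieves the same result without regular expressions by using str.translate:

import string

s = "A@p,p!!le#"
# the third argument of str.maketrans lists the characters to delete
s.translate(str.maketrans('', '', string.punctuation))  # 'Apple'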

Thank you! Stay tuned for more interesting things you can do with Python!

About

Anjana K V

Data Science Professional | 6+ years of experience in analytics across various domains — retail, insurance, finance and digital advertising
