In this article, we are going to make similar plots using Python’s Seaborn library and R’s ggplot2. Seaborn is built on top of Matplotlib, but its syntax is much simpler than Matplotlib’s.
Visualizing data in Python
Seaborn is one of the richest data science libraries; it provides a high-level interface for drawing statistical graphics.
To start, let’s first import our libraries.
import seaborn as sns
import matplotlib.pyplot as plt
Now that we have imported our libraries, let’s go through some functions that will help you give your graphs a personal touch. 🙂
Description of various functions which we will be using in this tutorial:
sns.set_style() sets the background theme of the plot. “ticks” is the closest to the plot made in R.
sns.set_context() applies predefined formatting to the plot to fit the context in which the visualization will be used.
font_scale=1 sets the scale of the font size for all the text in the graph.
plt.figure() creates the Matplotlib figure and controls aspects of it such as its size (as stated before, seaborn graphs are just Matplotlib plots under the hood).
sizes=(800,1000) controls the minimum and maximum size of the scatter points on the plot.
plt.title() gives the plot its main title. If you are an experienced Matplotlib user or have used plt.suptitle() before, you know the confusion that arises when using the two together. The arguments are self-explanatory.
plt.xlabel() formats the x-axis label. I use set_.. to access the class to include aesthetic properties. This can get cluttered at times, but there are many ways to format a seaborn/matplotlib plot. It is useful after the plot has been created: the plot was already made, so now we need to override the default formats in this manner.
plt.ylabel() works in exactly the same way, just for the y-axis.
sns.pairplot() plots pairwise relationships in a dataset. By default, this function will create a grid of Axes such that each variable will be shared in the y-axis across a single row and in the x-axis across a single column. The diagonal Axes are treated differently, drawing a plot to show the univariate distribution of the data for the variable in that column.
data: DataFrame – Tidy (long-form) data frame where each column is a variable and each row is an observation.
hue: String (variable name), optional. Variable in data to map plot aspects to different colours.
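The styling calls described above can be combined roughly as follows. This is only a minimal sketch: the toy data frame, its column names, the figure size, and the title strings are made up for illustration, and the "Agg" backend is an assumption chosen so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, made up purely for illustration
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 1.5, 3.5, 3.0],
    "weight": [10, 20, 30, 40],
})

sns.set_style("ticks")                 # background theme
sns.set_context("talk", font_scale=1)  # predefined sizing, default font scale

fig = plt.figure(figsize=(8, 6))       # the Matplotlib figure the plot is drawn on
ax = sns.scatterplot(
    data=df, x="x", y="y",
    size="weight", sizes=(800, 1000),  # smallest and largest marker sizes
    legend=False,
)

plt.suptitle("Figure-level title")     # title for the whole figure
plt.title("Axes-level title")          # title for the current axes
plt.xlabel("x value")                  # x-axis label
plt.ylabel("y value")                  # y-axis label

# In an interactive session, plt.show() or fig.savefig(...) would display or save the result.
```

Using both plt.suptitle() and plt.title() in one sketch makes the difference visible: the first belongs to the figure, the second to the axes inside it.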
Now that we are done with the formalities, let’s jump to the coding part.
We will be using the Iris data set for this tutorial. You can download the Iris data set from here.
Importing required libraries and dataset
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv(
    'iris.data',
    header=None,
    names=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Species'],
)
- I set header=None as the file contains no header row.
- Next, I set the names of the columns by passing names as the list of column names.
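To see what header=None and names change, here is a small sketch that reads a few rows laid out like iris.data from an in-memory string instead of the real file. The three sample rows and the variable names are chosen for illustration only.

```python
import io
import pandas as pd

# A few rows in the same comma-separated layout as iris.data (no header row)
text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
columns = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width", "Species"]

# header=None: the first line is data; names=...: supplies the column labels
sample = pd.read_csv(io.StringIO(text), header=None, names=columns)

# Without header=None, pandas uses the first data row as column labels
without_header = pd.read_csv(io.StringIO(text))

print(sample.shape)          # (3, 5)
print(without_header.shape)  # (2, 5): one observation was lost to the header
```

The second read shows the failure mode that header=None avoids: one observation silently becomes the column labels.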
The data will be loaded as follows:
Plotting the Pairplot
Add the following lines of code to the previous code.
...
sns.set_style("ticks")
sns.set_context("talk")
# pairplot creates its own figure, so a separate plt.figure() call
# here would only open an extra, empty window
p = sns.pairplot(data=data, hue="Species")
plt.show()
Seaborn will output a beautiful plot of the various features.
Plotting the Correlation matrix in Python
Next, we will draw a correlation matrix to identify the correlations between the various features of the dataset.
...
plt.figure()
sns.heatmap(data.iloc[:, :-1].corr())
plt.show()
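data.iloc[:, :-1] selects every column except the last one (Species), .corr() returns the correlation matrix and sns.heatmap() draws it as a heatmap.
Try it out for yourself.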
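As a rough illustration of what .corr() hands to sns.heatmap(), the sketch below builds a toy data frame with known relationships. The column names and values are invented, and annot, vmin, vmax and cmap are optional extras that the tutorial code does not use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data with known relationships
toy = pd.DataFrame({
    "length": [1.0, 2.0, 3.0, 4.0],
    "width": [2.0, 4.0, 6.0, 8.0],  # rises with length
    "depth": [4.0, 3.0, 2.0, 1.0],  # falls as length rises
    "label": ["a", "a", "b", "b"],  # non-numeric, like Species
})

# Drop the last (non-numeric) column, then compute pairwise Pearson correlations
corr = toy.iloc[:, :-1].corr()
print(corr.shape)  # (3, 3)

plt.figure()
# annot=True writes each coefficient into its cell; vmin/vmax pin the colour scale to [-1, 1]
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
```

Here the length/width cell comes out at 1 and the length/depth cell at -1, which makes it easy to check that the colours are read correctly.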
Visualizing data in R
We will be using ggplot2 together with the GGally package.
The following R code will load the data set and draw the pairplot for us.
Plotting Pairplot and Correlation Matrix
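library(ggplot2)  # Data visualization
library(GGally)   # Provides ggpairs()

# Load the dataset; the file has no header row, so supply the column names
iris <- read.csv('iris.data', header = FALSE, stringsAsFactors = TRUE,
                 col.names = c('Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species'))

# First let's get a random sample of the data
iris[sample(nrow(iris), 10), ]

# Plotting the pairplot
ggpairs(iris, aes(colour = Species))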
We got a highly detailed pairplot, and with a bare minimum of code.
Such is the beauty of R that we got the pair plots and the correlation matrix on the same plot.
One of the main differences, I believe, is that ggplot2 uses a layered approach in which the user can add aesthetics and formats in any order to create the figure (which I believe can be simpler despite the amount of code required). Most people do not notice this, and it may matter more to some than to others.
Recreating the same plot, albeit with minor differences, is very possible with ggplot2. While the tools are different, they can still be used to create the same object.