Data Visualization with Seaborn
What is seaborn?
Seaborn is a Python library for statistical data visualization. It helps you explore and understand your data. Seaborn was created by Michael Waskom; at the time of writing, the latest version is v0.11.2, released in August 2021.
Installation of seaborn
Let's start by installing the seaborn library with pip, Python's package installer, which is used to install and remove packages. (Virtual environments are created with a separate tool such as venv; pip then installs packages into them.)
Run the command below from the command prompt to install the seaborn package.
# Install seaborn Library
pip install seaborn
Note: On systems where pip refers to a Python 2 installation, use pip3 instead.
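To confirm that the installation worked, one option is to look up the installed version from Python. This is a minimal sketch using the standard library's importlib.metadata module (Python 3.8 or later); the helper name installed_version is hypothetical and chosen only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed seaborn version, or None if seaborn is not installed
print(installed_version("seaborn"))
```

If this prints None, the package was probably installed for a different Python interpreter than the one running the script.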
Import Seaborn library
Now let's use the seaborn library. Import it and, by convention, give it the shorter alias sns:
# Import seaborn
import seaborn as sns
Load the Dataset
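Seaborn provides a set of example datasets (18 at the time of writing), which are downloaded from its online data repository when requested. The following command lists their names.
# List the names of the available seaborn datasets
sns.get_dataset_names()
The first few plots in this tutorial use the Titanic dataset.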
# Load Dataset
df = sns.load_dataset('titanic')
df.head()
Output: [table: first five rows of the Titanic dataset]
Different types of graphs
Count plot
A count plot is useful when working with categorical data. It shows how many observations fall into each category. In the Titanic data, the sex column is categorical, with the values male and female.
sns.countplot(x='sex', data=df)
Output: [figure: count plot of the sex column]
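The numbers behind a count plot can also be checked directly. The sketch below counts categories with pandas on a small made-up column (the toy values are an assumption for illustration, not rows from the Titanic data); each count corresponds to the height of one bar.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic 'sex' column
toy = pd.DataFrame({"sex": ["male", "female", "male", "male", "female"]})

# One count per category: these are the bar heights a count plot would draw
counts = toy["sex"].value_counts()
print(counts)
```

Running the same value_counts() call on df['sex'] gives the counts for the real dataset.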
KDE plot
A Kernel Density Estimate (KDE) plot shows the distribution of continuous data as a smooth curve.
sns.kdeplot(x='age', data=df, color='black')
Output: [figure: KDE plot of the age column]
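To see what the smooth curve represents, here is a simplified sketch of a Gaussian kernel density estimate written with NumPy. It places a small bell curve on every data point and averages them. The function name gaussian_kde_sketch, the sample ages, and the fixed bandwidth of 5.0 are all assumptions for illustration; seaborn chooses its bandwidth automatically, so its curve will not match this one exactly.

```python
import numpy as np


def gaussian_kde_sketch(samples, grid, bandwidth):
    """Evaluate a fixed-bandwidth Gaussian KDE of `samples` at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance between every grid point and every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # Standard normal density evaluated at those distances
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale by the bandwidth so the curve integrates to 1
    return kernels.sum(axis=1) / (len(samples) * bandwidth)


ages = [22.0, 38.0, 26.0, 35.0, 54.0, 2.0, 27.0]  # made-up ages
grid = np.linspace(-40.0, 100.0, 1401)
density = gaussian_kde_sketch(ages, grid, bandwidth=5.0)

# The area under a density curve should be close to 1
area = float(density.sum() * (grid[1] - grid[0]))
print(round(area, 3))
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows the individual data points more closely.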
Distribution plot
A distribution plot is closely related to a KDE plot. It also visualizes the distribution of continuous data, but by default it draws a histogram; passing kde=True overlays a KDE curve on top of the bars.
sns.displot(x='age', kde=True, bins=5, data=df)
Output: [figure: histogram of the age column with a KDE curve]
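The histogram part of such a plot comes down to sorting values into bins and counting them. As a rough sketch, NumPy's histogram function can show the counts for five equal-width bins; the ages below are made up for illustration, and seaborn's own binning may differ in details.

```python
import numpy as np

ages = [22.0, 38.0, 26.0, 35.0, 54.0, 2.0, 27.0]  # made-up ages

# Split the range from the smallest to the largest age into 5 equal-width bins
counts, edges = np.histogram(ages, bins=5)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {count}")
```

With fewer bins the histogram looks coarser, and with more bins it shows more detail but also more noise.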
Scatter plot
We'll use the iris dataset for this and the following plots. The iris dataset contains measurements of each flower's petals (petal length and petal width) and sepals (sepal length and sepal width).
These features are used to classify the iris species (setosa, versicolor, and virginica). We'll explore how the features are related in the sections below.
We'll start by loading the iris dataset.
#Load Iris Dataset
df = sns.load_dataset('iris')
df.head()
Scatter plots help in understanding the relationship between two variables. Here the hue parameter colors each point by its species.
sns.scatterplot(x='sepal_length', y='petal_length',
                data=df, hue='species')
Output: [figure: scatter plot of sepal_length against petal_length, colored by species]
Heatmaps
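Confusion matrices and correlation matrices can both be visualised with a heat map. Here we plot the correlations between the numeric columns of the iris data.
# Keep only the numeric columns; the text column 'species' cannot be correlated
corr = df.select_dtypes('number').corr()
sns.heatmap(corr)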
Output: [figure: heat map of the correlation matrix]
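It can help to look at a correlation matrix as plain numbers before plotting it. The sketch below builds a tiny made-up table (the column names x, y, z and label are hypothetical) where the relationships are obvious, so the values in the matrix are easy to interpret.

```python
import pandas as pd

# Hypothetical toy data with two exact linear relationships
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],      # rises exactly with x
    "z": [4.0, 3.0, 2.0, 1.0],      # falls exactly as x rises
    "label": ["a", "b", "a", "b"],  # text column, excluded from the correlation
})

# Pearson correlation between every pair of numeric columns
corr = toy.select_dtypes("number").corr()
print(corr.round(2))
```

Values near 1 mean two columns rise together, values near -1 mean one falls as the other rises, and the diagonal is always 1 because each column is perfectly correlated with itself. The heat map encodes these numbers as colors.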
Conclusion
Data visualization is a useful skill to have in your toolkit, and Seaborn is one of the libraries that supports it. Because Seaborn is built on matplotlib, you can customize its plots in the same way that you customize matplotlib plots.
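As a small illustration of that last point, the sketch below draws a scatter plot and then adjusts it with ordinary matplotlib methods. The toy rows, title, and axis labels are assumptions for illustration; the column names mirror the iris dataset.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display

import pandas as pd
import seaborn as sns

# Hypothetical toy data; the column names mirror the iris dataset
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 6.0, 5.1],
    "species": ["setosa", "setosa", "virginica", "virginica"],
})

# scatterplot returns the matplotlib Axes object it drew on
ax = sns.scatterplot(x="sepal_length", y="petal_length", hue="species", data=toy)

# From here on, ordinary matplotlib methods apply
ax.set_title("Sepal length vs. petal length")
ax.set_xlabel("Sepal length (cm)")
ax.set_ylabel("Petal length (cm)")
```

The same pattern works for the count plot, KDE plot, and heat map shown earlier, since those functions also return a matplotlib Axes object.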
Happy Learning.
- Avani Popat
- Mar, 10 2022