Data Visualization with Seaborn

Seaborn

What is seaborn?

Seaborn is a Python library, built on top of matplotlib, for creating statistical data visualizations. It helps you explore and interpret your data. Seaborn was created by Michael Waskom; at the time of writing, the latest version is v0.11.2, released in August 2021.

Installation of seaborn

First, let's see how to install the seaborn library with pip. pip is Python's package installer: it installs and removes packages, and it works the same way inside a virtual environment.

Run the command below from the command prompt to install the seaborn package.

# Install seaborn Library

pip install seaborn

Note: If the pip command on your system points to Python 2, use pip3 instead so that seaborn is installed for Python 3.
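To confirm that the installation worked, one option is to ask Python which version of the package it can see. The sketch below uses only the standard library (Python 3.8 or later is assumed); installed_version is a hypothetical helper name introduced for this illustration, not part of seaborn or pip.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing.

    Hypothetical helper for this tutorial; it only wraps importlib.metadata.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("seaborn")
if version is None:
    print("seaborn is not installed; run: pip install seaborn")
else:
    print(f"seaborn {version} is installed")
```

If the script reports that seaborn is missing even though pip succeeded, pip and python probably belong to different Python installations.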

Import Seaborn library

Now let's use the seaborn library. Import it and, by convention, alias it to the shorter name sns:

# Import seaborn

import seaborn as sns

Load the Dataset

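Seaborn provides a number of example datasets (18 at the time of writing). They are downloaded from seaborn's online data repository when you load them, so an internet connection is required. The following command lists their names.

# List the names of all seaborn example datasets

sns.get_dataset_names()

This tutorial uses the Titanic dataset.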

# Load Dataset

df = sns.load_dataset('titanic')

df.head()

Output:

Different types of graphs

Count plot

A count plot is useful when working with categorical data: it shows how often each category occurs. In the Titanic dataset, the sex column is categorical, with the values male and female.

sns.countplot(x='sex',data=df)

Output:
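To make clear what the bars represent, here is a minimal sketch of the counting step behind a count plot, written with the standard library only. The sex list is a small hypothetical sample, not the real Titanic column, and category_counts is a helper name made up for this illustration.

```python
from collections import Counter

# Hypothetical sample standing in for the 'sex' column of the Titanic data.
sex = ["male", "female", "female", "male", "male", "female", "male"]


def category_counts(values):
    """Return {category: frequency}, i.e. the bar heights of a count plot."""
    return dict(Counter(values))


counts = category_counts(sex)
for category, n in counts.items():
    # Draw one text "bar" per category, n characters long.
    print(f"{category:<8}{'#' * n} ({n})")
```

Each bar in the seaborn plot corresponds to one entry of such a frequency table.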

KDE plot

A Kernel Density Estimate (KDE) plot shows the distribution of continuous data as a smooth curve.

sns.kdeplot(x = 'age' , data = df , color = 'black')

Output:
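The idea behind a KDE can be sketched in a few lines: place a small Gaussian bump on every observation and average the bumps. The example below is an illustration under stated assumptions, not seaborn's implementation: the ages list is hypothetical, gaussian_kde is a name chosen here, and the bandwidth is fixed by hand, whereas seaborn picks one automatically.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x from the samples."""
    n = len(samples)
    # Normalising constant so that the estimated density integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps, one centred on each sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical ages; the bandwidth of 8 years is an arbitrary choice.
ages = [22.0, 38.0, 26.0, 35.0, 35.0, 54.0, 2.0, 27.0]
density = gaussian_kde(ages, bandwidth=8.0)

for x in (0, 20, 40, 60, 80):
    print(f"estimated density at age {x}: {density(x):.4f}")
```

A smaller bandwidth gives a bumpier curve that follows individual points; a larger one gives a smoother curve that may hide detail.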

Distribution plot

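A distribution plot is similar to a KDE plot: it also visualizes the distribution of continuous data, but as a histogram. Passing kde=True overlays the KDE curve on the histogram, and bins sets the number of bars.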

sns.displot(x = 'age',kde=True,bins = 5 , data =df)

Output:
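The effect of the bins argument is easier to see when the binning is written out. The sketch below assumes equal-width bins that span the range of the data; the ages list is hypothetical and histogram is a helper name introduced here for illustration.

```python
def histogram(values, bins):
    """Split the data range into equal-width bins and count values per bin.

    Assumes the values are not all identical, so the bin width is non-zero.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin, not to a new one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Hypothetical ages, binned into 5 bars as in the displot call above.
ages = [2.0, 22.0, 26.0, 27.0, 35.0, 35.0, 38.0, 54.0]
edges, counts = histogram(ages, bins=5)

for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {'#' * n}")
```

With few bins the histogram is coarse; increasing bins reveals more detail but makes each bar noisier.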

Scatter plot

We'll be using the iris dataset for this and the following plots. The iris dataset includes information on flower petal size (length and width) as well as sepal size (sepal length and sepal width).

These characteristics are used to classify iris types (Setosa, Versicolour, and Virginica). We'll try to figure out how the features are related in the sections below.

We'll start by loading the iris dataset.

#Load Iris Dataset

df = sns.load_dataset('iris')

df.head()

Scatter plots help us understand the relationship between two variables. Here, hue colors each point by its species.

sns.scatterplot(x='sepal_length', y='petal_length',
                data=df, hue='species')

Output:

Heatmaps

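Heatmaps can visualise matrices such as confusion matrices and correlation matrices. Here we plot the correlations between the numeric columns of the iris dataset.

# Keep only the numeric columns; the species column holds text and cannot be correlated
corr = df.select_dtypes(include='number').corr()

sns.heatmap(corr)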

Output:
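Each cell of that heatmap is a Pearson correlation coefficient, which is the default method of corr() in pandas. As a rough sketch of what is being computed, the example below builds the same kind of matrix by hand; the measurements are a handful of illustrative values rather than the full iris data, and pearson and correlation_matrix are names made up for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Return a nested dict with the correlation of every pair of columns."""
    names = list(columns)
    return {
        a: {b: pearson(columns[a], columns[b]) for b in names} for a in names
    }


# Illustrative measurements in centimetres (not the full dataset).
measurements = {
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
}

matrix = correlation_matrix(measurements)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Values close to 1 or -1 indicate a strong linear relationship; values near 0 indicate little or none. The diagonal is always 1, because every column is perfectly correlated with itself.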

Conclusion

Data visualization is a useful skill to have in your toolkit, and seaborn is one of the libraries that makes it easier. Because seaborn is built on matplotlib, you can customize its plots in the same way you customize matplotlib plots.
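As a small illustration of that last point, the sketch below draws a count plot and then adjusts it with ordinary matplotlib Axes methods. It assumes seaborn, matplotlib and pandas are installed; the tiny DataFrame and the label texts are hypothetical, and the Agg backend is selected only so that the script also runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show()

import pandas as pd
import seaborn as sns

# Hypothetical miniature dataset, so no download is needed.
small_df = pd.DataFrame({"sex": ["male", "female", "female", "male", "male"]})

# Axes-level seaborn functions return a matplotlib Axes object.
ax = sns.countplot(x="sex", data=small_df)

# From here on, these are plain matplotlib calls.
ax.set_title("Passengers by sex")
ax.set_xlabel("Sex")
ax.set_ylabel("Number of passengers")
ax.figure.tight_layout()
```

Anything else that works on a matplotlib Axes, such as changing tick labels or saving the figure with ax.figure.savefig, can be applied in the same way.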

Happy Learning.

Avani Popat
Mar 10, 2022