5 Number Summary in Statistics

Introduction

The main objective of descriptive statistics is to understand the nature of a dataset. The five number summary is a part of descriptive statistics and consists of five values that together help us describe the data.

The five number summary statistics are:

  1. The minimum value (the lowest value)
  2. 25th percentile or Q1
  3. 50th percentile or Q2 or median
  4. 75th percentile or Q3
  5. Maximum value (the highest value)

Understanding the concept

Let us understand the five number summary using the example below.

Suppose we have a distribution A with the following data points:

A = {11, 23, 32, 26, 16, 19, 30, 14, 16, 10}

First we will arrange data points in ascending order and then calculate the summary.

A = {10, 11, 14, 16, 16, 19, 23, 26, 30, 32}

Minimum Value

Here we have to find the minimum value in the data set. The data point with the lowest value is considered the minimum value.

For distribution A above, the minimum value is 10.

25th percentile (Q1)

The 25th percentile is also known as the first or lower quartile. It is the value below which 25% of the data lies. For distribution A, Q1 is 14, the median of the lower half {10, 11, 14, 16, 16}.

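50th percentile (Median, Q2)

The 50th percentile is also known as the median and is denoted by Q2. The median cuts the data set exactly into two halves: 50% of the data lies above the median and 50% lies below it. For distribution A, which has ten data points, the median is the average of the two middle values 16 and 19, that is 17.5.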
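
As a quick check, the median of distribution A can be computed with Python's standard library. This is only a minimal sketch; the variable names are chosen here for illustration.

```python
import statistics

# Distribution A from the example above (order does not matter here)
A = [11, 23, 32, 26, 16, 19, 30, 14, 16, 10]

# statistics.median sorts the data internally and, for an even number
# of points, averages the two middle values (16 and 19 in this case)
q2 = statistics.median(A)
print(q2)  # 17.5
```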


75th percentile (Q3)

The 75th percentile is also called the third or upper quartile. It is the value above which 25% of the data lies. For distribution A, Q3 is 26, the median of the upper half {19, 23, 26, 30, 32}.

Maximum Value

Here we have to find the maximum value in the data set. The data point with the highest value is considered the maximum value.

For distribution A above, the maximum value is 32.
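
The five values can be collected in one small function. The sketch below is not the only way to do it: `five_number_summary` is a hypothetical helper written for this example, and it assumes that Q1 and Q3 are the medians of the lower and upper halves of the sorted data. That convention is chosen because it gives Q1 = 14 and Q3 = 26, the values used in the IQR calculation in this article. Libraries that interpolate between data points may return slightly different quartiles.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data. For an odd count, the middle value is left out
    of both halves. Needs at least two data points.
    """
    values = sorted(data)          # arrange in ascending order first
    n = len(values)
    half = n // 2
    lower = values[:half]          # lower half of the sorted data
    upper = values[n - half:]      # upper half of the sorted data
    return (
        values[0],
        statistics.median(lower),
        statistics.median(values),
        statistics.median(upper),
        values[-1],
    )

A = [11, 23, 32, 26, 16, 19, 30, 14, 16, 10]
summary = five_number_summary(A)
print(summary)  # (10, 14, 17.5, 26, 32)
```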

IQR

IQR stands for interquartile range. It is the range covered by the middle 50% of the data, between Q1 and Q3, and it is one of the methods used to find outliers in the data.

IQR = (Q3 – Q1)

= 26 – 14

= 12

To find outliers we also need the lower bracket and the higher bracket, which mark the two ends of the range in which data points are expected to fall.

Lower Bracket = Q1 – 1.5(IQR)

= 14 – 1.5(12)

= 14 – 18

= -4

Higher Bracket = Q3 + 1.5(IQR)

= 26 + 1.5(12)

= 26 + 18

= 44

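Any data point below the lower bracket (-4) or above the higher bracket (44) can be treated as an outlier. In distribution A every value lies between 10 and 32, so this rule flags no outliers.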
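
The bracket calculation above can be sketched in Python as follows. `iqr_bounds` is a hypothetical helper, Q1 = 14 and Q3 = 26 are taken from the calculation above, and the extra value 50 is made up purely to show what a flagged point looks like.

```python
def iqr_bounds(q1, q3, k=1.5):
    """Return (lower_bracket, higher_bracket) for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

A = [11, 23, 32, 26, 16, 19, 30, 14, 16, 10]

# Q1 and Q3 as used in the article's IQR calculation
lower, higher = iqr_bounds(14, 26)
print(lower, higher)  # -4.0 44.0

# No value of A falls outside the brackets
outliers = [x for x in A if x < lower or x > higher]
print(outliers)  # []

# Hypothetical extra data point (not part of A) that exceeds 44
flagged = [x for x in A + [50] if x < lower or x > higher]
print(flagged)  # [50]
```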

Visualization

A box plot is one of the most important visualizations in statistics. It is a standardized way of representing a distribution on the basis of the five number summary. A box plot is also known as a box-and-whisker plot. It is one of the most efficient ways to detect outliers in a dataset.


In a box plot, the minimum and maximum sit at the two ends, while the box marks Q1, Q2 (the median) and Q3.

Outliers

In statistics, an outlier is a data point that differs significantly from other observations. Outliers can occur, for example, due to experimental errors, and they can be a serious issue in a data set. In a box plot, data points that fall beyond the lower or higher bracket can be considered outliers.

We will discuss outliers and the techniques to handle them in detail in our next blog. Thank you!

  • Jay Charole
  • Mar, 11 2022
