5 Number Summary in Statistics
Introduction
The main objective of descriptive statistics is to understand the nature of a dataset. The five-number summary is a part of descriptive statistics: it consists of five values that together describe the spread and center of the data.
The five-number summary statistics are:
- The minimum value (the lowest value)
- 25th percentile or Q1
- 50th percentile or Q2 or median
- 75th percentile or Q3
- Maximum value (the highest value)
Understanding the concept
Let us understand the five-number summary using the example below.
Suppose we have a distribution A with the following data points:
A = {11, 23, 32, 26, 16, 19, 30, 14, 16, 10}
First we arrange the data points in ascending order, and then we calculate the summary.
A = {10, 11, 14, 16, 16, 19, 23, 26, 30, 32}
Minimum Value
Here we find the minimum value in the data set. The data point with the lowest value is the minimum value.
For distribution A above, the minimum value is 10.
25th percentile (Q1)
The 25th percentile is also known as the first or lower quartile. It is the value below which 25% of the data lies. For distribution A, taking Q1 as the median of the lower half {10, 11, 14, 16, 16} gives Q1 = 14.
50th Percentile (Median Q2)
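The 50th percentile is also known as the median and is denoted by Q2. The median cuts the data set exactly into two halves: 50% of the data lies above the median and 50% lies below it. Distribution A has an even number of points, so the median is the average of the two middle values: (16 + 19) / 2 = 17.5.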
75th percentile (Q3)
The 75th percentile is also called the third or upper quartile. It is the value above which 25% of the data lies. For distribution A, taking Q3 as the median of the upper half {19, 23, 26, 30, 32} gives Q3 = 26.
Maximum Value
Here we find the maximum value in the data set. The data point with the highest value is the maximum value.
For distribution A above, the maximum value is 32.
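The five values for distribution A can be reproduced with a short Python sketch. It assumes the median-of-halves convention for quartiles (Q1 and Q3 are the medians of the lower and upper halves of the sorted data), which matches the Q1 = 14 and Q3 = 26 used in this article. Other conventions, such as linear interpolation, can give slightly different quartiles. The function name five_number_summary is only illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumes at least two values and the median-of-halves convention:
    Q1 is the median of the lower half and Q3 the median of the upper half.
    For an odd number of points the middle value is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # points below the median position
    upper = data[half + n % 2:]    # points above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])


A = [11, 23, 32, 26, 16, 19, 30, 14, 16, 10]
summary = five_number_summary(A)
print(summary)  # (10, 14, 17.5, 26, 32)
```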
IQR
IQR stands for Inter Quartile Range. It is the range covered by the middle 50% of the data, between Q1 and Q3, and it is one of the methods used to find outliers in the data.
IQR = Q3 - Q1
= 26 - 14
= 12
We should also know the two limits that enclose the expected range of the data:
[ Lower Bracket . . . . . . . . . . Higher Bracket ]
Lower Bracket = Q1 - 1.5(IQR)
= 14 - 1.5(12)
= 14 - 18
= -4
Higher Bracket = Q3 + 1.5(IQR)
= 26 + 1.5(12)
= 26 + 18
= 44
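Data points below the lower bracket or above the higher bracket can be treated as outliers. In distribution A every value lies between -4 and 44, so there are no outliers.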
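The bracket calculation above can be sketched in Python as follows. The helper names iqr_fences and find_outliers are hypothetical, the quartiles Q1 = 14 and Q3 = 26 are taken from this article, and the extra value 60 is made up purely to show what a flagged outlier looks like.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower bracket, higher bracket) using the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall outside the two brackets."""
    lower, upper = iqr_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


A = [10, 11, 14, 16, 16, 19, 23, 26, 30, 32]

print(iqr_fences(14, 26))               # (-4.0, 44.0)
print(find_outliers(A, 14, 26))         # []
# 60 is an illustrative value that is not part of distribution A.
print(find_outliers(A + [60], 14, 26))  # [60]
```

Here the quartiles are kept fixed at the values from distribution A so that the effect of a single extreme point is easy to see. In practice the quartiles would be recomputed from the data being checked.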
Visualization
A box plot is one of the most important visualizations in statistics. It is a standardized way of representing a distribution on the basis of the five-number summary. A box plot is also known as a box-and-whisker plot. It is one of the most efficient ways to detect outliers in a dataset.
In a box plot, the two ends of the whiskers mark the minimum and the maximum. The box itself marks Q1, Q2 and Q3.
Outliers
In statistics, an outlier is a data point that differs significantly from other observations. Outliers can occur, for example, because of experimental errors, and they can be a serious issue in a data set. In a box plot, data points that fall below the lower bracket or above the higher bracket can be considered outliers.
We will discuss outliers and the techniques to handle them in detail in our next blog.
Thank you!
- Jay Charole
- Mar, 11 2022