Importance of Data Visualization
Before we learn about the various concepts involved in data visualization,
it is essential to appreciate why it is so important to ‘look’ at the data
through plots and graphs. To begin with, it is difficult for the human eye to
pick out patterns from raw numbers alone. Sometimes even the statistical
information summarized from the data can lead you to the wrong conclusions.
Therefore, you should visualize the data often to understand how different
features are behaving.
Let’s understand this using a beautiful example.
The example we are going to look at is a modified version of a popular dataset called “Anscombe’s Quartet”. As explained in the linked article (Anscombe’s Quartet), the statistician Francis Anscombe constructed this example to counter the notion that “numerical calculations are exact, but graphs are rough.”
We have sales data from a retail store’s branches in four different cities.
The data covers 11 months, from January to November. For each of the cities
(Mumbai, Bengaluru, Hyderabad and Kolkata), we have the discount rate offered
in each month and the corresponding sales.
We want to assess this dataset to understand the overall sales.
Can we judge the overall sales and performance just by looking at these
values? This is rather a difficult task because of the large amount of data.
So let’s use some basic summary statistics to get a rough idea of the dataset.
We can see the average and the standard deviation for each of the branches.
The standard deviation measures the spread of the data around the average.
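To make the summary-statistics step concrete, here is a minimal sketch in Python. It uses the original Anscombe’s Quartet values, not the modified sales figures of this article, and the assignment of each city to one of the four original datasets is an assumption inferred from the shapes described in the text. The helper name `summarize` is hypothetical.

```python
from statistics import mean, stdev

# Original Anscombe's Quartet values, NOT the article's modified sales data.
# ASSUMPTION: the city-to-dataset mapping is inferred from the described
# plot shapes; x stands in for discount rate (%), y for unit sales.
x_common = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
cities = {
    "Mumbai": (x_common,
               [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Bengaluru": (x_common,
                  [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Hyderabad": (x_common,
                  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Kolkata": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return mean and sample standard deviation of x and y, rounded to 2 places."""
    return {
        "mean_x": round(mean(x), 2),
        "std_x": round(stdev(x), 2),
        "mean_y": round(mean(y), 2),
        "std_y": round(stdev(y), 2),
    }


summary = {city: summarize(x, y) for city, (x, y) in cities.items()}
for city, stats in summary.items():
    print(city, stats)
```

Rounded to two decimal places, the four rows printed by this sketch are identical, which is exactly why the numbers alone cannot tell the branches apart.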
As you can see, the average discount rate and sales, and their corresponding
standard deviations, are the same across all the branches. Does this imply
that all the branches have the same performance? The answer is no!
This is where visualization comes into the picture. With the help of
visualization we can analyze the trends in the data.
Here is the visual showing discount rate against sales for the four cities. Each visual is a scatter plot. The X-axis shows the “Discount rate” and the Y-axis shows the “Unit of Sales”.
Looking at these four graphs, we can clearly say that the performances are not at all the same.
For Mumbai, unit sales generally increase with the discount rate, but not monotonically. There is some variation above and below the trend.
For Bengaluru, apart from one exception (an outlier), everything is going great and you can draw a straight line through the points.
For Hyderabad, we have a very interesting pattern. Up to a discount rate of 11% the sales increase, and after that they go down.
For Kolkata, the branch has hardly played with the discount at all. In most
months the discount rate was only 8%, and in only one month was it high.
There was almost no variation in the discount rate, yet the sales differ from
month to month.
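The four patterns described above can be reproduced with a rough, dependency-free sketch. In practice you would normally use a plotting library; the hypothetical helper `text_scatter` below only draws a character-grid scatter plot so that the shapes are visible in a terminal. As an assumption, it uses the original Anscombe’s Quartet values (not this article’s modified sales figures), with each city mapped to the original dataset whose shape matches the description.

```python
# Original Anscombe's Quartet values, NOT the article's modified sales data.
# ASSUMPTION: the city-to-dataset mapping is inferred from the described shapes.
x_common = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
cities = {
    "Mumbai": (x_common,
               [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Bengaluru": (x_common,
                  [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Hyderabad": (x_common,
                  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Kolkata": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def text_scatter(xs, ys, width=40, height=12):
    """Render the points as '*' on a character grid and return it as a string."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero for constant data
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)  # x-axis (discount rate)
    return "\n".join(lines)


plots = {city: text_scatter(x, y) for city, (x, y) in cities.items()}
for city, plot in plots.items():
    print(city)
    print(plot)
    print()
```

Each plot is scaled to its own minimum and maximum, so the sketch is meant for comparing shapes rather than reading off exact values.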
So we observed that despite having the same summary statistics, the trends
for each city were totally different. This is the power of data visualization.
Each of the branches had actually
employed a different strategy to calculate its discount rate, and the sales
numbers were also quite different across all of them. It is difficult to draw
this type of insight and understand the difference between each of the branches
using raw numbers alone; therefore, you should utilize an appropriate
visualization technique to ‘look’ at the data.
- Jay Charole
- Mar, 11 2022