
# Importance of Data Visualization

Before learning about the various concepts involved in data visualization, it is essential to appreciate why it is so important to ‘look’ at the data through plots and graphs. To begin with, it is difficult for the human eye to pick out patterns from raw numbers alone. Sometimes even the statistics summarized from the data can lead you to the wrong conclusions. Therefore, you should visualize the data often to understand how different features behave.

Let’s understand this using a beautiful example.

The example we are going to look at is a modified version of a popular dataset called “Anscombe’s Quartet”. As explained in the linked article (Anscombe’s Quartet), the statistician Francis Anscombe constructed this example to counter the notion that “numerical calculations are exact, but graphs are rough.”

We have sales data for a retail store from four different cities. The data covers 11 months, from January to November. For each city (Mumbai, Bengaluru, Hyderabad, and Kolkata), we have the discount rate used in each month and the corresponding sales.

We want to assess this dataset to understand the overall sales.

Can we predict the overall sales and performance just by looking at these values? This is a rather difficult task because of the large amount of data.

So let’s use some basic summary statistics to get a rough idea of the dataset.

We can see the average and the standard deviation for each branch. The standard deviation measures the spread of the data.

We observe that the average and standard deviation are almost the same for all the branches. Judged by the summary statistics alone, the cities look identical.

As you can see, the average discount rate and sales, and their corresponding standard deviations, are practically identical across the branches. Does this imply that all the branches have the same performance? The answer is no!
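To make this concrete, here is a minimal sketch that computes the same summary statistics in Python. It uses the values of the original Anscombe’s Quartet, not the modified sales data discussed in this article. Treating x as the discount rate and y as unit sales, and pairing each of the four original datasets with a city, are assumptions made purely for illustration, as is the helper name `summarize`.

```python
from statistics import mean, stdev

# Values of the original Anscombe's Quartet. Reading x as "discount rate" and
# y as "unit sales", and the dataset-to-city pairing, are illustrative
# assumptions; the article's modified data is not reproduced here.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "Mumbai": (x_shared,
               [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Bengaluru": (x_shared,
                  [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Hyderabad": (x_shared,
                  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Kolkata": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(data):
    """Return the mean and sample standard deviation of x and y per city,
    rounded to two decimals."""
    summary = {}
    for city, (x, y) in data.items():
        summary[city] = {
            "mean_x": round(mean(x), 2),
            "std_x": round(stdev(x), 2),
            "mean_y": round(mean(y), 2),
            "std_y": round(stdev(y), 2),
        }
    return summary


for city, stats in summarize(quartet).items():
    print(city, stats)
```

Rounded to two decimals, all four rows come out the same: a mean of 9 and a standard deviation of 3.32 for x, and a mean of 7.5 and a standard deviation of 2.03 for y. The numbers alone give no hint that the four datasets differ.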

This is where visualization comes into the picture. With the help of visualization, we can analyze the trends in the data.

Here is the visual showing discount rate against sales for the four cities. Each city is shown as a scatter plot: the X-axis shows the “Discount rate” and the Y-axis shows the “Unit of Sales”.
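A scatter plot grid like this can be sketched with matplotlib. As before, the values are those of the original Anscombe’s Quartet rather than the article’s modified data, and the city pairing, the function name `plot_quartet`, and the output file name are assumptions for illustration only.

```python
import matplotlib.pyplot as plt

# Values of the original Anscombe's Quartet; the city labels and the reading
# of x/y as discount rate/unit sales are illustrative assumptions.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "Mumbai": (x_shared,
               [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Bengaluru": (x_shared,
                  [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Hyderabad": (x_shared,
                  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Kolkata": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def plot_quartet(data):
    """Draw one scatter plot per city on a 2x2 grid with shared axes."""
    fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
    for ax, (city, (x, y)) in zip(axes.flat, data.items()):
        ax.scatter(x, y)
        ax.set_title(city)
        ax.set_xlabel("Discount rate (%)")
        ax.set_ylabel("Unit sales")
    fig.tight_layout()
    return fig


fig = plot_quartet(quartet)
fig.savefig("quartet_scatter.png")  # hypothetical output file name
```

Sharing the axes across the four panels keeps the scales identical, so any difference in shape comes from the data and not from the plot settings.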

Looking at these four graphs, we can clearly say that the performances are not at all the same.

For Mumbai, unit sales tend to increase with the discount rate, but not monotonically: there is some variation above and below the trend.

For Bengaluru, apart from one outlier, everything is going great and you can draw a straight line through the points.

For Hyderabad, we have a very interesting pattern. Up to a discount rate of 11%, sales increase; beyond that, sales go down.

For Kolkata, the branch has hardly played with the discount at all. In most months the discount rate was only 8%, and in only one month was it high. There was almost no variation in the discount rate, yet the sales differ from month to month.
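Observations like the ones for Hyderabad and Kolkata can also be checked numerically once a plot has pointed them out. The sketch below again borrows values from the original Anscombe’s Quartet (datasets II and IV) as stand-ins for the two cities; the variable names and the city pairing are assumptions, not the article’s actual data.

```python
from collections import Counter

# Datasets II and IV of the original Anscombe's Quartet, used as stand-ins
# for Hyderabad and Kolkata (an illustrative assumption).
hyderabad_discount = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
hyderabad_sales = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
kolkata_discount = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

# Pair each sales figure with its discount rate and pick the highest sales.
peak_sales, peak_discount = max(zip(hyderabad_sales, hyderabad_discount))

# Count how many months each discount rate was used in Kolkata.
kolkata_counts = Counter(kolkata_discount)

print(f"Hyderabad sales peak at a {peak_discount}% discount ({peak_sales} units)")
print(f"Kolkata discount rates used: {dict(kolkata_counts)}")
```

With these values, the Hyderabad peak falls at a discount rate of 11, and Kolkata uses a rate of 8 in ten of the eleven months and 19 in the remaining one. Checks like these confirm a pattern, but it is the plot that tells you which pattern to look for.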

So we observed that, despite having the same summary statistics, the trends for each city were totally different. This is the power of data visualization.

Each of the branches had actually employed a different strategy to calculate its discount rate, and the sales numbers were also quite different across all of them. It is difficult to draw this type of insight and understand the difference between each of the branches using raw numbers alone; therefore, you should utilize an appropriate visualization technique to ‘look’ at the data.

• Jay Charole
• Mar 11, 2022