Data Analysis using R

With IPL 2022 fever taking over the country, it is a great time to analyze historical IPL match data and answer a few questions with it.

For this analysis we are using an IPL data set from Kaggle. The file contains historical records of IPL matches from 2008 to 2021.

Import Data

Let’s load the library needed for this job:

# tidyverse is a collection of packages essential for data analysis.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.8
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Importing the CSV file:

ipl_data <- read_csv("IPL_Matches_2008_2021.csv") # read_csv() is part of readr, which is included in tidyverse
## Rows: 876 Columns: 20
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr  (17): City, Season, MatchNumber, Team1, Team2, Venue, TossWinner, TossD...
## dbl   (2): ID, Margin
## date  (1): Date
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

Understanding Data

colnames(ipl_data) # column names
##  [1] "ID"              "City"            "Date"            "Season"         
##  [5] "MatchNumber"     "Team1"           "Team2"           "Venue"          
##  [9] "TossWinner"      "TossDecision"    "SuperOver"       "WinningTeam"    
## [13] "WonBy"           "Margin"          "method"          "Player_of_Match"
## [17] "Team1Players"    "Team2Players"    "Umpire1"         "Umpire2"
head(ipl_data) # view data
## # A tibble: 6 x 20
##        ID City      Date       Season MatchNumber Team1   Team2 Venue TossWinner
##     <dbl> <chr>     <date>     <chr>  <chr>       <chr>   <chr> <chr> <chr>     
## 1 1254117 Dubai     2021-10-15 2021   Final       Chenna~ Kolk~ Duba~ Kolkata K~
## 2 1254116 Sharjah   2021-10-13 2021   Qualifier 2 Delhi ~ Kolk~ Shar~ Kolkata K~
## 3 1254115 Sharjah   2021-10-11 2021   Eliminator  Royal ~ Kolk~ Shar~ Royal Cha~
## 4 1254114 Dubai     2021-10-10 2021   Qualifier 1 Delhi ~ Chen~ Duba~ Chennai S~
## 5 1254088 Abu Dhabi 2021-10-08 2021   55          Mumbai~ Sunr~ Zaye~ Mumbai In~
## 6 1254101 Dubai     2021-10-08 2021   56          Delhi ~ Roya~ Duba~ Royal Cha~
## # ... with 11 more variables: TossDecision <chr>, SuperOver <chr>,
## #   WinningTeam <chr>, WonBy <chr>, Margin <dbl>, method <chr>,
## #   Player_of_Match <chr>, Team1Players <chr>, Team2Players <chr>,
## #   Umpire1 <chr>, Umpire2 <chr>

Data Cleaning

Removing whitespace, if any:

# note: apply() returns a character matrix, so every column becomes character
ipl_data <- as.data.frame(apply(ipl_data, 2, trimws))
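
To keep ID, Margin and Date in their original types (the `apply()` call above turns every column into text), one option is to trim only the character columns. The sketch below uses base R on a small made-up data frame, not the Kaggle file; `trim_character_columns` is a hypothetical helper name.

```r
# Toy data standing in for ipl_data; the values are made up for illustration.
toy <- data.frame(
  ID = c(1, 2),
  City = c(" Dubai", "Sharjah "),
  Margin = c(27, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trim whitespace in character columns only,
# leaving numeric and date columns untouched.
trim_character_columns <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], trimws)
  df
}

toy_clean <- trim_character_columns(toy)
str(toy_clean)
```

With tidyverse loaded, `mutate(across(where(is.character), trimws))` should give the same result on `ipl_data`.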

If we look at the data, the Season column contains some entries such as 2020/21 for the 2020 season. To analyze the data we need to standardize these values to a single format, so we replace such entries with the appropriate year. The last line also corrects a misspelled team name in the WinningTeam column.

ipl_data$Season <- gsub("2020/21","2020",ipl_data$Season)
ipl_data$Season <- gsub("2007/08","2008",ipl_data$Season)
ipl_data$Season <- gsub("2009/10","2010",ipl_data$Season)
ipl_data$WinningTeam <- gsub("Rising Pune Supergiantss","Rising Pune Supergiants",ipl_data$WinningTeam)
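
The three season replacements above can also be written as one lookup table, which keeps the mapping in a single place if more labels turn up. This is a minimal sketch in base R; `season_map` and `normalize_season` are hypothetical names, and it replaces exact matches only.

```r
# Lookup table built from the three replacements used above.
season_map <- c("2007/08" = "2008", "2009/10" = "2010", "2020/21" = "2020")

# Hypothetical helper: replace exact matches and leave everything else
# (including NA) as it is.
normalize_season <- function(season) {
  hit <- season %in% names(season_map)
  season[hit] <- unname(season_map[season[hit]])
  season
}

normalize_season(c("2007/08", "2011", "2020/21", NA))
# → "2008" "2011" "2020" NA
```

On the real data this would be applied as `ipl_data$Season <- normalize_season(ipl_data$Season)`.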

Visualizing Data

Q1. Which team has won the most matches in IPL history?

ggplot(subset(ipl_data, !is.na(WinningTeam))) +
  geom_bar(mapping = aes(y = WinningTeam), fill = "lightblue") +
  labs(title = "Most Wins") +
  theme_classic()

Mumbai Indians have the most wins in IPL history, followed by Chennai Super Kings.
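
Because WinningTeam is a character column, the bars are drawn in alphabetical order, not by number of wins. To read the ranking directly, the wins can be tabulated and sorted. The sketch below uses base R on a made-up vector of winners (the team names are placeholders, not results from the data set).

```r
# Made-up winners for illustration; NA stands for a match with no result.
winners <- c("Team A", "Team B", "Team A", NA, "Team C", "Team A", "Team B")

# table() drops NA by default, so no-result matches are not counted.
win_counts <- sort(table(winners), decreasing = TRUE)
print(win_counts)

# Factor with levels in ascending order of wins, so the team with the most
# wins ends up at the top of a horizontal bar chart.
winners_ordered <- factor(winners, levels = rev(names(win_counts)))
levels(winners_ordered)
```

Using a factor built this way from `ipl_data$WinningTeam` as the `y` aesthetic should draw the bars in ranked order.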

Q2. Which team has won the most IPL trophies?

# keep only the final of each season
final <- filter(ipl_data, MatchNumber == "Final")
# visualize the number of finals won by each team
ggplot(final) +
  geom_bar(mapping = aes(y = WinningTeam), fill = "lightgreen") +
  labs(title = "Most Trophies By Team") +
  theme_classic()

The answer is Mumbai Indians.
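
Counting trophies this way assumes that every season has exactly one row labelled "Final". A quick check of that assumption could look like the sketch below, which uses base R on made-up rows; the column names follow `ipl_data`, but the teams and results are placeholders.

```r
# Made-up rows for illustration; column names follow ipl_data.
toy_matches <- data.frame(
  Season = c("2008", "2008", "2009", "2009", "2010"),
  MatchNumber = c("1", "Final", "Final", "Semi Final", "Final"),
  WinningTeam = c("Team A", "Team B", "Team C", "Team A", "Team B"),
  stringsAsFactors = FALSE
)

# The comparison is an exact match, so "Semi Final" is not selected.
toy_final <- toy_matches[toy_matches$MatchNumber == "Final", ]

# One final per season is expected; any other count points to a labelling problem.
finals_per_season <- table(toy_final$Season)
all(finals_per_season == 1)

# Titles per team, most successful first.
titles <- sort(table(toy_final$WinningTeam), decreasing = TRUE)
print(titles)
```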

Q3. Which city has been the favorite match venue?

# visualize the number of matches hosted by each city
ggplot(subset(ipl_data, !is.na(City))) +
  geom_bar(mapping = aes(y = City), fill = "red") +
  labs(title = "Favorite Venue in IPL") +
  theme_classic()

Mumbai is the financial capital of India as well as a great city, and guess what: it has also hosted the most IPL matches.

Q4. Which player has won the most Player of the Match awards?

# count awards per player and keep the 10 highest counts (ties are kept)
top <- ipl_data %>% count(Player_of_Match, sort = TRUE) %>% slice_max(n, n = 10)
# visualize
ggplot(top) +
  geom_col(mapping = aes(y = Player_of_Match, x = n), fill = "orange") +
  theme_classic()

It is my favorite, AB de Villiers. :))
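
A top-10 selection that keeps ties can return more than 10 players when several share the last place. The sketch below shows the same idea in base R on made-up award winners (placeholder names, not values from the data set), keeping the top 3 so the effect is easy to see.

```r
# Made-up award winners for illustration; NA stands for a match without an award.
awards <- c("Player A", "Player B", "Player A", "Player C",
            "Player B", "Player A", "Player D", NA)

# Count awards per player; table() ignores the NA.
award_counts <- sort(table(awards), decreasing = TRUE)

# Keep the top k counts plus anything tied with the k-th place.
k <- 3
cutoff <- award_counts[[min(k, length(award_counts))]]
top_players <- award_counts[award_counts >= cutoff]
print(top_players)
```

Here Player C and Player D are tied for third place, so four players are kept even though `k` is 3.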

  • sajal gupta
  • Apr, 01 2022