Data Analysis using R
With IPL 2022 fever taking over the country, it is a great time to analyze historical IPL match data and answer a few questions with it.
For this analysis we are using the IPL data set from Kaggle, which can be found here. The file contains historical records of IPL matches from 2008 to 2021.
Import Data
Let’s import the library needed for this job:
# tidyverse is a collection of packages essential for analysis.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.8
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Importing the CSV file
ipl_data <- read_csv("IPL_Matches_2008_2021.csv") # read_csv() is part of the readr package, which is included in tidyverse
## Rows: 876 Columns: 20
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (17): City, Season, MatchNumber, Team1, Team2, Venue, TossWinner, TossD...
## dbl (2): ID, Margin
## date (1): Date
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
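The message above suggests declaring the column types. Here is a minimal sketch of that, using a few made-up rows passed as literal text through I() (supported in readr 2.x) instead of the real file. The rows and the reduced set of columns are assumptions for illustration only; with the actual data set you would pass the file name and list the columns you care about.

```r
library(readr)

# Hypothetical stand-in for the real CSV: three made-up rows, four columns.
csv_text <- I("ID,Date,Season,Margin\n1,2021-10-15,2021,27\n2,2020-11-10,2020/21,5\n3,2008-04-18,2007/08,140\n")

# Declaring the types up front silences the column specification message
# and makes the import fail loudly if a column is not what we expect.
matches <- read_csv(
  csv_text,
  col_types = cols(
    ID     = col_double(),
    Date   = col_date(format = "%Y-%m-%d"),
    Season = col_character(),  # kept as text because of entries like "2020/21"
    Margin = col_double()
  )
)

str(matches)
```

Season is deliberately read as text here, since the split-year entries cannot be parsed as numbers.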
Understanding Data
colnames(ipl_data) # column names
## [1] "ID" "City" "Date" "Season"
## [5] "MatchNumber" "Team1" "Team2" "Venue"
## [9] "TossWinner" "TossDecision" "SuperOver" "WinningTeam"
## [13] "WonBy" "Margin" "method" "Player_of_Match"
## [17] "Team1Players" "Team2Players" "Umpire1" "Umpire2"
head(ipl_data) # view data
## # A tibble: 6 x 20
## ID City Date Season MatchNumber Team1 Team2 Venue TossWinner
## <dbl> <chr> <date> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1254117 Dubai 2021-10-15 2021 Final Chenna~ Kolk~ Duba~ Kolkata K~
## 2 1254116 Sharjah 2021-10-13 2021 Qualifier 2 Delhi ~ Kolk~ Shar~ Kolkata K~
## 3 1254115 Sharjah 2021-10-11 2021 Eliminator Royal ~ Kolk~ Shar~ Royal Cha~
## 4 1254114 Dubai 2021-10-10 2021 Qualifier 1 Delhi ~ Chen~ Duba~ Chennai S~
## 5 1254088 Abu Dhabi 2021-10-08 2021 55 Mumbai~ Sunr~ Zaye~ Mumbai In~
## 6 1254101 Dubai 2021-10-08 2021 56 Delhi ~ Roya~ Duba~ Royal Cha~
## # ... with 11 more variables: TossDecision <chr>, SuperOver <chr>,
## # WinningTeam <chr>, WonBy <chr>, Margin <dbl>, method <chr>,
## # Player_of_Match <chr>, Team1Players <chr>, Team2Players <chr>,
## # Umpire1 <chr>, Umpire2 <chr>
Data Cleaning
Removing whitespace, if any:
ipl_data <- ipl_data %>% mutate(across(where(is.character), trimws)) # trim only text columns so Date and Margin keep their types
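When trimming, it is worth checking what happens to the column types. The sketch below uses a hypothetical toy tibble (not the real data set) to compare two approaches: going through apply() produces a character matrix, so dates and numbers come back as text, while trimming only the character columns leaves the other types alone.

```r
library(dplyr)

# Toy data with one padded text column, one date column and one numeric column.
toy <- tibble(
  City   = c(" Dubai", "Sharjah "),
  Date   = as.Date(c("2021-10-15", "2021-10-13")),
  Margin = c(27, 3)
)

# apply() converts the data to a character matrix first,
# so every column ends up as character.
via_apply <- as.data.frame(apply(toy, 2, trimws))

# across(where(is.character), ...) touches only the text columns.
via_across <- toy %>% mutate(across(where(is.character), trimws))

sapply(via_apply, class)   # column classes after the apply() route
sapply(via_across, class)  # column classes after the across() route
```

None of the plots in this post depend on Date or Margin being typed, but keeping the types makes later date or margin calculations easier.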
Looking at the data, the Season column contains some entries in a split-year form, such as 2020/21 for the 2020 season. To analyze the data we need to standardize these values into a single format, so we replace such entries with the appropriate year.
ipl_data$Season <- gsub("2020/21","2020",ipl_data$Season)
ipl_data$Season <- gsub("2007/08","2008",ipl_data$Season)
ipl_data$Season <- gsub("2009/10","2010",ipl_data$Season)
ipl_data$WinningTeam <- gsub("Rising Pune Supergiantss","Rising Pune Supergiants",ipl_data$WinningTeam)
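After replacements like these, a quick check helps confirm that no split-year values are left. A minimal sketch on a made-up vector of season labels (the vector is an assumption; on the real data you would run the same checks on ipl_data$Season):

```r
# Hypothetical season labels mixing plain years and split-year forms.
season <- c("2007/08", "2009", "2009/10", "2020/21", "2021")

# fixed = TRUE treats the pattern as a literal string rather than a regex.
season <- gsub("2020/21", "2020", season, fixed = TRUE)
season <- gsub("2007/08", "2008", season, fixed = TRUE)
season <- gsub("2009/10", "2010", season, fixed = TRUE)

# Any value still containing a slash was missed by the replacements.
leftover <- season[grepl("/", season, fixed = TRUE)]

sort(unique(season))
length(leftover)
```

If leftover is not empty, another gsub() call is needed for the remaining form.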
Visualizing Data
Q1. Which team has won the most matches in IPL history?
ggplot(subset(ipl_data, !is.na(WinningTeam))) + geom_bar(mapping = aes(y=WinningTeam), fill = "LIGHTBLUE")+labs(title="Most Wins") + theme_classic()
Mumbai Indians have the most wins in IPL history, followed by Chennai Super Kings.
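A bar chart shows the ranking at a glance, but a small table gives the exact numbers behind it. Here is a sketch of how the wins could be tabulated with count(), using a made-up set of results (the rows are hypothetical; on the real data you would start from ipl_data):

```r
library(dplyr)

# Hypothetical match results; NA stands for a match without a winner.
toy_matches <- tibble(
  WinningTeam = c(
    "Mumbai Indians", "Chennai Super Kings", "Mumbai Indians", NA,
    "Kolkata Knight Riders", "Mumbai Indians", "Chennai Super Kings"
  )
)

# Drop matches with no winner, then count wins per team, highest first.
wins <- toy_matches %>%
  filter(!is.na(WinningTeam)) %>%
  count(WinningTeam, sort = TRUE, name = "wins")

wins
```

The same pattern works for the other questions in this post by swapping the column passed to count().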
Q2. Which team has won the most IPL trophies?
# filter data to keep only the final matches
final <- filter(ipl_data, MatchNumber == "Final")
#visualizing data
ggplot(final) + geom_bar(mapping = aes(y=WinningTeam), fill = "LIGHTGREEN")+labs(title="Most Trophies By Team") + theme_classic()
The answer is Mumbai Indians.
Q3. Which city has been the favorite match venue?
#visualize
ggplot(subset(ipl_data, !is.na(City))) +
geom_bar(mapping = aes(y=City), fill = "RED")+labs(title="Favorite Venue in IPL") + theme_classic()
Mumbai is the financial capital of India and a great city, and it also turns out to be the most frequent host of IPL matches.
Q4. Which player has won the most Player of the Match titles?
# keep the 10 players with the most awards
top <- ipl_data %>% count(Player_of_Match, sort = TRUE) %>% slice_max(n, n = 10)
#visualizing
ggplot(top) + geom_col(mapping = aes(y=Player_of_Match, x=n), fill="ORANGE") + theme_classic()
The answer is my favorite, AB de Villiers. :))
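Two details can make a top-N chart like this easier to read: deciding how ties are handled, and ordering the bars by value. The sketch below is one possible way to do both, on made-up player labels (A to D are placeholders, not real players). slice_max() is used with with_ties = FALSE so that exactly the requested number of rows is kept, and reorder() sorts the bars by count.

```r
library(dplyr)
library(ggplot2)

# Hypothetical awards; NA stands for a match without an award.
toy <- tibble(
  Player_of_Match = c("A", "A", "A", "B", "B", "C", "C", "D", NA)
)

top <- toy %>%
  filter(!is.na(Player_of_Match)) %>%
  count(Player_of_Match, sort = TRUE) %>%
  slice_max(n, n = 2, with_ties = FALSE)  # keep exactly 2 rows even when counts tie

# reorder() arranges the bars by count instead of alphabetically.
p <- ggplot(top) +
  geom_col(mapping = aes(x = n, y = reorder(Player_of_Match, n)), fill = "orange") +
  labs(title = "Most Player of the Match Awards", x = "Awards", y = NULL) +
  theme_classic()

p
```

With the default with_ties = TRUE, players tied at the cutoff are all kept, so the result can contain more rows than requested.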
- sajal gupta
- Apr, 01 2022