IPL Data Analysis (2008–2020) using Python

Data analysis of the Indian Premier League using the pandas, NumPy, Seaborn, and Matplotlib libraries.

Rohit Kumar Thakur
7 min read · Aug 8, 2021

Hello, Data Scientists!

We all know that we live in an information age, where data plays a key role. If you own the data, you own everything. But what happens after you get the data?

Well, it depends on what kind of data you have. Usually you have to analyse it before it yields anything valuable. If you work at Zomato, you analyse the data the company collects. If you work at an advertising company, you do the same there. By analysing the data, you can give the company useful information and strategy. Enough theory.

IPL Data Analysis

Most of us watch cricket, and we all know the Indian Premier League (IPL) is the biggest cricket league in the world. Let’s analyse IPL match data from 2008 to 2020. Grab a cup of coffee and let’s begin the hack.

Analysis of IPL Data

We will go through these main steps for this project:

  1. Import libraries
  2. Load the data
  3. Analyse the data

The dataset and the GitHub source code are linked at the end of this article.

Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Import these libraries to your Jupyter Notebook or Google Colab as we are going to use them in our code.

Load the data

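# Raw strings (r'...') stop Python from treating the backslashes in a Windows path as escape sequences
match = pd.read_csv(r'D:\Data Science\IPL\Dataset\matches.csv')
delivery = pd.read_csv(r'D:\Data Science\IPL\Dataset\deliveries.csv')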

Load the data using the above code in your Jupyter Notebook. For Google Colab you have to upload the dataset to your drive and then import it.
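If you do not want to hard-code a Windows path, here is a minimal sketch that builds the file paths with pathlib. The helper name load_ipl_data is hypothetical (it is not part of the original project), and the two tiny CSV files are made up so that the example runs on any machine.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_ipl_data(data_dir):
    """Read matches.csv and deliveries.csv from data_dir (hypothetical helper)."""
    data_dir = Path(data_dir)
    match = pd.read_csv(data_dir / "matches.csv")
    delivery = pd.read_csv(data_dir / "deliveries.csv")
    return match, delivery


# Demo with two tiny made-up files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "matches.csv").write_text("id,team1,team2\n1,KKR,RCB\n")
    Path(tmp, "deliveries.csv").write_text("id,batsman,batsman_runs\n1,BB McCullum,4\n")
    match_demo, delivery_demo = load_ipl_data(tmp)

print(match_demo.shape, delivery_demo.shape)
```

With the real dataset you would pass the folder that holds your two CSV files, for example the Dataset folder used above.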

Analyse the Data

Now take a look at the data we are working on:

match.head(5)
delivery.head(5)

When you run the cell, the match table shows that the first match, back in 2008, was played between KKR and RCB. KKR won the match at M Chinnaswamy Stadium, the Player of the Match was BB McCullum, and the result was decided by runs.

The ball-by-ball delivery table can be read in the same way. If you want to look at the last 5 rows of a table, run match.tail(5) in a Jupyter Notebook or Google Colab cell.

More Information about Matches and Ball Deliveries between 2008–2020

match.info()     # 816 entries
delivery.info()  # 193468 entries

List of the Participating Teams

all_teams = match['team1'].tolist() + match['team2'].tolist()
all_teams = list(set(all_teams))
all_teams

You will get the list of teams that played between 2008 and 2020. If you are a serious IPL fan, you will spot some old team names on the list. Those teams no longer play, but they are still part of IPL history.
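The same list can also be built with pandas alone, which makes it easy to drop missing values and sort the names. This is a small sketch on made-up rows; match_demo stands in for the real match table.

```python
import pandas as pd

# Made-up rows standing in for the real `match` table.
match_demo = pd.DataFrame({
    "team1": ["KKR", "CSK", "MI", "Deccan Chargers"],
    "team2": ["RCB", "MI", "KKR", "CSK"],
})

# Stack both columns, drop missing values, and keep each name once.
teams = pd.concat([match_demo["team1"], match_demo["team2"]]).dropna().unique()
all_teams_demo = sorted(teams)
print(all_teams_demo)
```

Sorting the result gives a stable order, whereas list(set(...)) can return the names in a different order on each run.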

Number of Matches per Venue

# Recent Seaborn versions need the column passed by keyword (x=...)
sns.countplot(x='venue', data=match)
plt.xticks(rotation='vertical')

Number of matches per venue in IPL (2008–2020)

As you can see, Eden Gardens is the busiest ground in the IPL: it has hosted nearly 80 matches.
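To read the exact numbers instead of estimating them from the bars, value_counts gives the same information as a table. The sketch below uses a handful of made-up rows, so the counts are illustrative and not the real IPL figures.

```python
import pandas as pd

# Made-up rows; the real `match` table has one row per match.
match_demo = pd.DataFrame({
    "venue": [
        "Eden Gardens", "Wankhede Stadium", "Eden Gardens",
        "M Chinnaswamy Stadium", "Eden Gardens", "Wankhede Stadium",
    ]
})

# value_counts sorts venues from most to least used.
venue_counts = match_demo["venue"].value_counts()
print(venue_counts.head(3))
```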

Matches Played by Each Team

x = match['team1'].value_counts()
y = match['team2'].value_counts()
# fill_value=0 keeps a team that appears in only one of the two columns
x.add(y, fill_value=0).plot(kind='barh')

Matches played by each team in IPL (2008–2020)

We count how often each team appears in the team1 column and add the count from the team2 column. For example, if CSK appeared 90 times as team1 and 85 times as team2, the graph shows a total of 175 matches. You can see that Mumbai Indians played the highest number of matches in the IPL.
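One detail is worth checking when two value_counts results are added: a team that appears in only one of the two columns becomes NaN with a plain +. A small sketch on made-up matches shows the difference. Here RCB appears only as team2.

```python
import pandas as pd

# Made-up matches; RCB never appears in the team1 column.
match_demo = pd.DataFrame({
    "team1": ["CSK", "CSK", "MI", "KKR"],
    "team2": ["MI", "KKR", "CSK", "RCB"],
})

as_team1 = match_demo["team1"].value_counts()
as_team2 = match_demo["team2"].value_counts()

plain_sum = as_team1 + as_team2  # RCB becomes NaN here
played = (
    as_team1.add(as_team2, fill_value=0)  # treat a missing count as 0
    .astype(int)
    .sort_values(ascending=False)
)
print(played)
```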

Matches Won by Each Team

x = match['winner'].value_counts()
print(x)

When you run this cell, you will see that Mumbai Indians have won the highest number of matches, followed by CSK and the other teams. If you want to plot this result as a graph, run the following in the next cell.

sns.countplot(x='winner', data=match)
plt.xticks(rotation='vertical')

Matches won by each team in IPL (2008–2020)

Top 5 Players with the Highest Number of Man of the Match Awards

If you are a team management official and these players go under the hammer, keep an eye on them: they have the highest number of Man of the Match awards.

Let’s check out how to find this:

temp_data = match['player_of_match'].value_counts().head()
print(temp_data)
# Players go on the x-axis and their award counts on the y-axis
sns.barplot(x=temp_data.index, y=temp_data.values)
plt.title("Top 5 MoM")
plt.xticks(rotation=90)
plt.xlabel("Player")
plt.ylabel("Match Count")
plt.show()

Players with the highest number of MoM in IPL (2008–2020)

Is your favourite player present in the above list?

The Top Batsman in the IPL

For this, we have to find the player with the highest number of runs. To do that, we group the delivery dataset by batsman and sum up the runs each batsman scored. It’s simple logic, right?

Demonstrated below:

top_batsman=delivery.groupby('batsman')['batsman_runs'].agg('sum').reset_index().sort_values('batsman_runs', ascending=False).head(10)
top_batsman.set_index('batsman', inplace=True)
top_batsman.plot(kind='bar')

We grouped the deliveries by batsman, summed up each batsman’s runs, sorted the totals in descending order, and kept the top 10. After this, we plot the result as a bar graph.
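The group, sum, sort, and head steps are easier to follow on a table small enough to check by hand. The rows below are made up and only mimic the batsman and batsman_runs columns of the delivery dataset.

```python
import pandas as pd

# Made-up deliveries: one row per ball faced.
delivery_demo = pd.DataFrame({
    "batsman": ["V Kohli", "SK Raina", "V Kohli", "DA Warner", "SK Raina", "V Kohli"],
    "batsman_runs": [4, 1, 6, 2, 4, 1],
})

top = (
    delivery_demo.groupby("batsman")["batsman_runs"]
    .sum()                         # total runs per batsman
    .sort_values(ascending=False)  # highest total first
    .head(2)                       # keep the top 2 of this tiny table
)
print(top)
```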

The top 10 batsmen in IPL (2008–2020)

King Kohli is at the top followed by Suresh Raina and other batsmen.

The Bowler Who Has Conceded the Highest Number of Runs

delivery.groupby('bowler')['total_runs'].agg('sum').reset_index().sort_values('total_runs', ascending=False).head(10)

For this, we grouped the deliveries by bowler, summed up the runs given on each bowler’s deliveries in IPL matches, and kept the top 10 for the final output.
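Note that total_runs also includes byes and leg byes, which are not normally charged to the bowler. If you want to leave them out, here is a possible sketch. It assumes the delivery table has an extras_type column with the values 'byes' and 'legbyes'. Check the column and its values in your copy of the dataset before relying on this. The rows and bowler names below are made up.

```python
import pandas as pd

# Made-up deliveries; `extras_type` and its values are assumptions about the dataset.
delivery_demo = pd.DataFrame({
    "bowler": ["Bowler A", "Bowler A", "Bowler A", "Bowler B", "Bowler B"],
    "total_runs": [4, 1, 2, 4, 1],
    "extras_type": [None, "legbyes", "wides", None, "byes"],
})

# Keep every ball except those whose runs came from byes or leg byes.
charged = ~delivery_demo["extras_type"].isin(["byes", "legbyes"])
conceded = (
    delivery_demo[charged]
    .groupby("bowler")["total_runs"]
    .sum()
    .sort_values(ascending=False)
)
print(conceded)
```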

The Bowler with Team-wise Performance

Let’s suppose you are playing against CSK and you have to find out which bowler’s performance was good in the previous years against this team. To find out the team-wise performance analysis of a bowler, you have to run the following program in your Notebook cell:

mask=delivery['bowler']=='PP Chawla'
delivery[mask].groupby('batting_team')['total_runs'].agg('sum').plot(kind='bar')

We are taking the example of PP Chawla, the bowler who has given the highest number of runs in IPL history up to 2020. We summed up the total runs given by PP Chawla against each opposing team.

PP Chawla's bowling performance against IPL teams.

Going by these totals, if you have PP Chawla in your team, you may want to think twice before playing him against MI, CSK, RCB, RR, and DC.
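Raw totals depend on how many balls a bowler has bowled at each team, so a rate can tell a different story. The sketch below uses made-up rows and computes runs per over against each batting team. As a simplification, it counts every row as one ball, including wides and no-balls.

```python
import pandas as pd

# Made-up deliveries: PP Chawla bowls 4 balls to MI and 2 balls to RR.
delivery_demo = pd.DataFrame({
    "bowler": ["PP Chawla"] * 6 + ["Other Bowler"] * 2,
    "batting_team": ["MI", "MI", "MI", "MI", "RR", "RR", "MI", "RR"],
    "total_runs": [4, 0, 1, 1, 4, 1, 6, 6],
})

chawla = delivery_demo[delivery_demo["bowler"] == "PP Chawla"]
per_team = chawla.groupby("batting_team")["total_runs"].agg(runs="sum", balls="count")

# Simplification: every row counts as one ball, so 6 rows make one over.
per_team["runs_per_over"] = per_team["runs"] / per_team["balls"] * 6
print(per_team)
```

In this toy table MI has the higher total but the lower rate, which is why it helps to look at both numbers.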

Over-wise Batting Performance of Each Team in the IPL (2008–2020)

# Use the full delivery table here, not the PP Chawla subset from the previous section
delivery6=delivery[['batting_team','over','batsman_runs']]
# aggfunc='sum' adds up the runs in each over; 'count' would only count the deliveries
x=delivery6.pivot_table(values='batsman_runs', index='batting_team', columns='over', aggfunc='sum')
sns.heatmap(x, cmap='summer')

For this, we use a pivot table to sum the batsmen’s runs in each over for every batting team. Then we convert the table into a heatmap, as given below:

Over-wise batting performance of each team in IPL (2008–2020)

As you can see, if you are playing against MI or CSK, you have to field your best bowling attack from the first over. MI’s batsmen are quiet in the second and third overs, and after that they go on a rampage against their opponents. The same goes for CSK and RCB. This data is helpful not only from the bowling team’s perspective but also from the batting team’s. If you are a team manager and this data shows that your team is not performing well in the death overs, you should probably focus on buying a good finisher in the next auction. As the heatmap above shows, most teams lag at the right-hand end of the map, except CSK and MI.
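What a pivot table holds depends on its aggfunc, so it helps to try it on a tiny made-up table first. The sketch below builds the same team-by-over layout twice: once summing the runs and once counting the deliveries.

```python
import pandas as pd

# Made-up deliveries: three balls each for two teams.
delivery_demo = pd.DataFrame({
    "batting_team": ["MI", "MI", "MI", "CSK", "CSK", "CSK"],
    "over": [1, 1, 2, 1, 2, 2],
    "batsman_runs": [4, 1, 6, 0, 2, 4],
})

# Runs scored per team per over.
runs_by_over = delivery_demo.pivot_table(
    values="batsman_runs", index="batting_team", columns="over", aggfunc="sum"
)
# Number of deliveries per team per over.
balls_by_over = delivery_demo.pivot_table(
    values="batsman_runs", index="batting_team", columns="over", aggfunc="count"
)
print(runs_by_over)
print(balls_by_over)
```

In this table MI scores 5 runs from 2 balls in over 1, so the two tables differ. A heatmap of the first shows scoring, while a heatmap of the second mostly shows how many balls each team has faced.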

I think that’s why MI and CSK are the two most successful franchises in the IPL.

Dismissal Kind

sns.countplot(x='dismissal_kind', data=delivery)
plt.xticks(rotation='vertical')

Dismissal kind in IPL (2008–2020)

Now, if you want to know how many runs Virat Kohli scored when he faced Jasprit Bumrah, use the following code:

mask=delivery['bowler']=='JJ Bumrah'
mask2=delivery['batsman']=='V Kohli'
# Combine both conditions with & and sum the runs (count would give the number of balls instead)
delivery[mask & mask2]['batsman_runs'].sum()

This sums the runs on every delivery where the bowler is Bumrah and the batsman is V Kohli, and the cell shows that total as its output.
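To see how two boolean masks combine, here is a small sketch on made-up deliveries. Only the rows where both conditions are true survive the filter, so the sum covers just the balls Bumrah bowled to Kohli.

```python
import pandas as pd

# Made-up deliveries.
delivery_demo = pd.DataFrame({
    "bowler": ["JJ Bumrah", "JJ Bumrah", "JJ Bumrah", "R Ashwin", "JJ Bumrah"],
    "batsman": ["V Kohli", "V Kohli", "AB de Villiers", "V Kohli", "V Kohli"],
    "batsman_runs": [4, 0, 6, 1, 2],
})

is_bumrah = delivery_demo["bowler"] == "JJ Bumrah"
is_kohli = delivery_demo["batsman"] == "V Kohli"

# & keeps a row only when both masks are True for it.
faced = delivery_demo[is_bumrah & is_kohli]
runs = faced["batsman_runs"].sum()
balls = len(faced)
print(runs, balls)
```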

Now, tell me in the comments: how would you find the top 10 players with the highest number of sixes in the IPL? Logic alone is also accepted. If you need help, here is the full GitHub code of this project. You can explore further.
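If you get stuck, one possible approach is to keep only the deliveries where the batsman scored 6 and then count the rows per batsman. This is a sketch on made-up rows and not necessarily the approach used in the project code.

```python
import pandas as pd

# Made-up deliveries.
delivery_demo = pd.DataFrame({
    "batsman": ["CH Gayle", "CH Gayle", "AB de Villiers", "CH Gayle", "MS Dhoni", "AB de Villiers"],
    "batsman_runs": [6, 6, 6, 4, 6, 1],
})

sixes = (
    delivery_demo[delivery_demo["batsman_runs"] == 6]  # keep only the sixes
    .groupby("batsman")
    .size()                                           # number of sixes per batsman
    .sort_values(ascending=False)
    .head(10)
)
print(sixes)
```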

The dataset used in this project is here.

Well, that’s it. Thank you for reading. If you found this article informative, make sure to clap and share it with your community.
