Data Analysis and Visualization of CO2 Emissions by Different Countries

Data Analysis and Visualization Project in Python

Rohit Kumar Thakur
Sep 29, 2021
Photo by Kouji Tsuru on Unsplash

The climate has been changing rapidly over the past few years. Everyone is talking about pollution and its effects on our climate, but most of us are not doing anything about it. If we don’t get serious now, it will be too late. Earth is our home, so let’s protect it.

In this article, we are going to analyze CO2 emissions by different countries across the world. I hope this analysis and visualization will help young entrepreneurs bring down global CO2 emissions. So, let’s start some data analysis with a cup of coffee.


Code With Analysis

You can perform the task in either Google Colab or Jupyter Notebook. The link to the dataset used in this project is given at the end of this article.

  • Import the following libraries
# for mathematical computation
import numpy as np
import pandas as pd
import scipy.stats as stats
# for data visualization
import seaborn as sns
import matplotlib.pyplot as plt
import plotly
import plotly.express as px
from matplotlib.pyplot import figure
%matplotlib inline
  • Let’s load the data and take a sneak peek at it. Download the dataset, add it to your path, and then display the first 5 rows.
df = pd.read_csv("/content/CO2Emission_LifeExp.csv",
encoding='latin-1')
df.head()

Run the cell and you will see something like this:

Co2 emission: Analysis and visualization
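If you want to try the steps in this article before downloading the file, you can use a tiny in-memory stand-in. This is a minimal sketch: the column names match the ones used in this article, but the country names and numbers are made up for illustration. pandas’ read_csv accepts any file-like object, so io.StringIO can take the place of the file path.

```python
import io

import pandas as pd

# Hypothetical stand-in for CO2Emission_LifeExp.csv: same column names as
# used in this article, but the countries and numbers are made up.
SAMPLE_CSV = """Country,CO2Emissions,YearlyChange,Percapita,Population,LifeExpectancy
Country A,1000,2.5,5.0,200,75.0
Country B,400,-1.0,8.0,50,80.0
Country C,90,4.0,0.9,100,65.0
"""

# read_csv accepts a file-like object, so StringIO stands in for the file path
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df.head())
```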
  • Get some more information about the data
#data info
df.info()
#Check missing values
df.isnull().sum()

df.info() lists every column along with its data type, and df.isnull().sum() counts the null values in each column. We got lucky: there are no null values in our dataset.
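If your copy of the data did contain gaps, you can inspect the output of isnull().sum() further to see which columns are affected. The sketch below runs on a toy frame with made-up values and one deliberately missing entry. Dropping incomplete rows with dropna() is only one option and not something the original analysis needed.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values and one deliberately missing entry
df = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "CO2Emissions": [1000.0, np.nan, 90.0],
    "LifeExpectancy": [75.0, 80.0, 65.0],
})

# Null count per column, as returned by df.isnull().sum()
null_counts = df.isnull().sum()
print(null_counts)

# Keep only the names of columns that actually contain nulls
cols_with_nulls = null_counts[null_counts > 0].index.tolist()
print(cols_with_nulls)  # ['CO2Emissions']

# One simple option: drop rows that have any missing value
df_clean = df.dropna()
print(len(df_clean))  # 2
```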

  • Check for duplicates and describe the data
df[df.duplicated()].count()
df.describe()
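Note that df[df.duplicated()].count() returns one count per column, not a single number. The sketch below uses made-up rows to compare it with df.duplicated().sum(), which gives a single total. It also shows drop_duplicates(), which you could use if any duplicates turned up.

```python
import pandas as pd

# Made-up rows; the last row is an exact copy of the first
df = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country A"],
    "CO2Emissions": [1000, 400, 1000],
})

# duplicated() flags every repeat of an earlier row (keep='first' by default)
per_column = df[df.duplicated()].count()   # one count per column
n_duplicates = int(df.duplicated().sum())  # a single number

print(per_column)
print(n_duplicates)  # 1

# drop_duplicates() removes the flagged repeats
df_unique = df.drop_duplicates()
```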
  • Countries with the highest CO2 emissions
df_emissions = df.sort_values(by='CO2Emissions', ascending=False)
df_emissions.head(15)

As expected, China tops the chart. The table lists the 15 countries with the highest CO2 emissions.
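Sorting and then taking head(15) works fine. pandas also provides DataFrame.nlargest, which returns the n rows with the largest values in a column in a single call. The sketch below uses made-up emissions values and checks that both approaches give the same rows.

```python
import pandas as pd

# Made-up emissions values for illustration only
df = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C", "Country D"],
    "CO2Emissions": [400, 1000, 90, 250],
})

# nlargest(n, column) returns the n rows with the largest values, sorted descending
top2 = df.nlargest(2, "CO2Emissions")

# Same result as the sort-then-head approach used in this article
top2_sorted = df.sort_values(by="CO2Emissions", ascending=False).head(2)

print(top2)
```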

Let’s also look at this as a bar chart.

px.bar(x='Country', y='CO2Emissions', data_frame=df_emissions.head(15), title="Top 15 Countries with highest Co2 Emissions")

The geographical distribution of this list is quite diverse, but as you can see, Asian countries produce the most CO2.

Most countries in the list are large economies.

Most of these countries also have some of the largest populations in the world.

Most of these countries have a life expectancy on par with the global average.

  • Countries with the highest CO2 emissions (per capita)
df_capita = df.sort_values(by='Percapita', ascending=False)
df_capita.head(15)

The table above lists the 15 countries with the highest CO2 emissions per capita. Qatar leads the table, followed by Montenegro and Kuwait. Let’s look at the graph:

px.bar(x='Country', y='Percapita', data_frame=df_capita.head(15), title="Top 15 Countries with highest Co2 Emissions (per Capita)")

The geographical distribution of this list is somewhat diverse, but as you can see, Middle Eastern countries produce the most CO2 per capita.

Most of these countries are wealthy and have small populations.

All of these countries have a life expectancy higher than the global average.
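Per-capita emissions are total emissions divided by population. This is why small, wealthy countries can top this list even though they are not among the largest total emitters. The sketch below recomputes a per-capita column from the totals. It assumes CO2Emissions and Population use consistent units (for example tonnes and people), which you should check against the dataset’s documentation. The numbers are made up.

```python
import pandas as pd

# Made-up totals; assumes CO2Emissions is in tonnes and Population in people
df = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "CO2Emissions": [1_000_000, 400_000, 90_000],
    "Population": [200_000, 50_000, 100_000],
})

# Per-capita emissions = total emissions / population
df["PercapitaCalc"] = df["CO2Emissions"] / df["Population"]

# Country B has the smallest population but the highest per-capita value
ranked = df.sort_values("PercapitaCalc", ascending=False)
print(ranked)
```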

  • Correlation

Let’s compute the correlation between the columns. First, convert the relevant columns to numeric types.

cols = ['CO2Emissions', 'YearlyChange', 'Percapita', 'Population', 'LifeExpectancy']
df[cols] = df[cols].apply(pd.to_numeric)
# numeric_only=True skips text columns such as Country (required in pandas 2.x)
f, ax = plt.subplots(figsize=(14, 10))
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".1f", ax=ax)
plt.show()

As you can see, life expectancy is negatively correlated with the yearly change in CO2 emissions, which seems plausible.
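The heatmap shows correlation coefficients, but it does not show whether a given coefficient could plausibly be noise. scipy.stats, already imported above as stats, provides pearsonr, which returns the coefficient together with a p-value. The sketch below runs on made-up values. With the real data you would pass the df columns instead.

```python
import pandas as pd
import scipy.stats as stats

# Made-up values for illustration; with the real data, use the df columns
df = pd.DataFrame({
    "YearlyChange": [4.0, 3.0, 2.5, 1.0, 0.5, -1.0],
    "LifeExpectancy": [62.0, 66.0, 70.0, 74.0, 78.0, 82.0],
})

# pearsonr returns the correlation coefficient r and a two-sided p-value
r, p_value = stats.pearsonr(df["YearlyChange"], df["LifeExpectancy"])
print(f"r = {r:.2f}, p = {p_value:.4f}")
```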

  • Relationship between CO2 emissions and life expectancy
# sort by life expectancy so the line is drawn from left to right
px.line(x='LifeExpectancy', y='CO2Emissions', data_frame=df.sort_values('LifeExpectancy'), title="Relation of Life Expectancy and Co2 emissions")

The two are not strongly related. Most life expectancy values lie between 70 and 80, and most populations are below 2B.
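Total emissions can be heavily skewed, with a few countries emitting far more than the rest. A rank-based measure such as Spearman’s correlation is therefore a useful complement to Pearson’s r when judging how dependent the two columns are. The sketch below uses made-up numbers with one extreme emitter and pandas’ Series.corr with method="spearman". In pandas, this method relies on SciPy being installed.

```python
import pandas as pd

# Made-up values; one extreme emitter dominates the scale
df = pd.DataFrame({
    "LifeExpectancy": [65.0, 70.0, 72.0, 75.0, 78.0],
    "CO2Emissions": [50, 80, 10_000, 60, 90],
})

# Pearson's r measures linear association and is dominated by the outlier
pearson = df["LifeExpectancy"].corr(df["CO2Emissions"], method="pearson")
# Spearman's rho correlates the ranks, so the outlier just counts as "largest"
spearman = df["LifeExpectancy"].corr(df["CO2Emissions"], method="spearman")

print(f"Pearson r = {pearson:.2f}, Spearman rho = {spearman:.2f}")
```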

  • Top countries with the highest life expectancy
# mean life expectancy per country, keeping the top 20
df_life = df.groupby('Country', as_index=False)['LifeExpectancy'].mean().nlargest(20, 'LifeExpectancy')
px.bar(x='Country', y="LifeExpectancy", data_frame=df_life)

People in Hong Kong live the longest in the world, followed by Japan and Macao. You can also break these results down by geographical region. The world’s average life expectancy is 72.69 years.
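An "average" life expectancy can mean two different things. A simple mean counts every country once, while a population-weighted mean counts every person once. The article does not say how 72.69 was computed. The sketch below therefore only shows, on made-up numbers, how the two can differ, using numpy.average with its weights argument.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only
df = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "LifeExpectancy": [80.0, 70.0, 60.0],
    "Population": [10, 30, 60],
})

# Unweighted: every country counts once
simple_mean = df["LifeExpectancy"].mean()
# Population-weighted: every person counts once
weighted_mean = np.average(df["LifeExpectancy"], weights=df["Population"])

print(simple_mean)    # 70.0
print(weighted_mean)  # 65.0
```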

  • Relation between Population and Pollution
# sort by emissions so the line is drawn from left to right
df_pop = df.sort_values('CO2Emissions')
px.line(x='CO2Emissions', y="Population", data_frame=df_pop)
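Population and total emissions both span several orders of magnitude, so their relationship is often easier to read on log scales. In the notebook you could pass log_x=True and log_y=True to px.scatter. The sketch below instead quantifies the relationship by correlating the log10-transformed columns. The numbers are made up.

```python
import numpy as np
import pandas as pd

# Made-up values spanning several orders of magnitude
df = pd.DataFrame({
    "Population": [1e5, 1e6, 1e7, 1e8, 1e9],
    "CO2Emissions": [3e5, 2e6, 4e7, 1e8, 2e9],
})

# log10 of both columns, so each order of magnitude is one unit apart
logs = np.log10(df[["Population", "CO2Emissions"]])

# Pearson correlation of the log-transformed columns
r_log = logs["Population"].corr(logs["CO2Emissions"])
print(f"correlation of log10 values: {r_log:.2f}")
```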

Well, that’s it.

Congrats, you have analyzed the CO2 emissions dataset. Feel free to dig deeper on your own: there is a lot more you can do with this data, and the insights you get are valuable.

The dataset and the full GitHub source code are available here.

Hello, my name is Rohit Kumar Thakur. I am open to freelancing. I build React Native projects and am currently working with Python Django. Feel free to contact me (freelance.rohit7@gmail.com).
