Indian Cars Data Analysis and Visualization

Data Science Project for Beginners

Rohit Kumar Thakur
Python in Plain English
6 min read · Oct 1, 2021


Everyone has a different perspective on cars. Some go for luxury. Some go for vintage cars. Some choose a car based on their budget. With all these factors in play, how many car models do the different companies offer to Indian buyers?

Let’s analyze that. In this article, I am going to do some data analysis of the car companies that sell cars in India. Let’s start with a cup of coffee.

Code With Analysis

You can perform the task in either Google Colab or Jupyter Notebook. The link to the dataset used in this project is given at the end of this article.

  • Import the following libraries
# for mathematical computation
import numpy as np
import pandas as pd
import scipy.stats as stats

# for data visualization
import seaborn as sns
import matplotlib.pyplot as plt
import plotly
import plotly.express as px
from matplotlib.pyplot import figure

%matplotlib inline
  • Let’s load the data and take a sneak peek at it. Download the dataset and adjust the path to wherever you saved it. After that, display the first 5 rows of the dataset.
df = pd.read_csv("/content/cars_ds_final.csv", encoding='latin-1')
df.head()

Run the cell and you will see the first five rows of the dataset on your work screen.

  • Get some more information about the data
df.describe()
df.info()
df.isnull().sum()
df.duplicated().sum()

Run each of the above lines in its own cell (a notebook cell displays only the value of its last expression) and you will get some information about the data.

df.describe() reports the count, mean, standard deviation, minimum, quartiles (the 50% row is the median), and maximum of each numeric column.

df.info() lists each column with its data type and its number of non-null values.

df.isnull().sum() counts the null values in each column.

df.duplicated().sum() counts the duplicate rows in the dataset.
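
To see what the null and duplicate counts look like on something small, here is a minimal sketch. The `toy` frame and its values are made up for illustration and are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not from the real dataset)
toy = pd.DataFrame({
    "Make": ["Tata", "Tata", "Hyundai", "Hyundai"],
    "Model": ["Nano", "Nano", "i20", "i20"],
    "Displacement": ["624 cc", "624 cc", np.nan, "1197 cc"],
})

# Missing values per column
null_counts = toy.isnull().sum()

# Rows that are exact copies of an earlier row
duplicate_rows = toy.duplicated().sum()

print(null_counts)
print("duplicate rows:", duplicate_rows)
```

Only the second row repeats an earlier row in full, so it is the only one counted as a duplicate. The two Hyundai rows differ in Displacement and are not duplicates.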

  • Cleaning of data
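df = df.fillna('')
df = df.replace(' ', '')
# Strip thousands separators, the 'Rs.' prefix, and the 'cc' suffix
df['Ex-Showroom_Price'] = df['Ex-Showroom_Price'].str.replace(',', '', regex=False)
df['Ex-Showroom_Price'] = df['Ex-Showroom_Price'].str.replace('Rs.', '', regex=False)
df['Displacement'] = df['Displacement'].str.replace('cc', '', regex=False)
# Convert the cleaned strings to numbers so df.describe() includes them
df['Ex-Showroom_Price'] = pd.to_numeric(df['Ex-Showroom_Price'])
df['Displacement'] = pd.to_numeric(df['Displacement'])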

We only want the numeric value of the Ex-Showroom Price and Displacement of each car. I also tried to clean the Power column of the dataset, but that is trickier because its values come in mixed formats. You will notice it when you go through the Power column.

Now the Displacement and Ex-Showroom Price columns contain only numeric values.
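
The cleaning step can be tried on a couple of sample strings first. This is a minimal sketch: the `raw` values below are hypothetical examples in the same style as the dataset's columns, not rows copied from it.

```python
import pandas as pd

# Hypothetical raw values in the style of the dataset's columns
raw = pd.DataFrame({
    "Ex-Showroom_Price": ["Rs. 2,92,667", "Rs. 12,50,000"],
    "Displacement": ["624 cc", "1497 cc"],
})

clean = pd.DataFrame({
    # Drop commas and the 'Rs.' prefix, trim spaces, then convert to numbers
    "Ex-Showroom_Price": pd.to_numeric(
        raw["Ex-Showroom_Price"]
        .str.replace(",", "", regex=False)
        .str.replace("Rs.", "", regex=False)
        .str.strip()
    ),
    # Drop the 'cc' suffix, trim spaces, then convert to numbers
    "Displacement": pd.to_numeric(
        raw["Displacement"].str.replace("cc", "", regex=False).str.strip()
    ),
})

print(clean.dtypes)
print(clean)
```

Passing `regex=False` makes the `.` in `'Rs.'` a literal dot rather than a regular-expression wildcard, so the result does not depend on the pandas version's default.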

Insert the following code in the next cell and run the cell.

df.describe()

Two separate columns for Ex-Showroom Price and Displacement will appear. Each column shows summary statistics such as the mean, standard deviation, and median (the 50% row) of the relevant data.

  • Correlation
# Convert the columns used in the heatmap to a numeric dtype
num_cols = ['Cylinders', 'Valves_Per_Cylinder', 'Doors', 'Seating_Capacity', 'Number_of_Airbags', 'Ex-Showroom_Price', 'Displacement']
df[num_cols] = df[num_cols].apply(pd.to_numeric)
f, ax = plt.subplots(figsize=(14, 10))
# numeric_only=True (pandas 1.5+) skips text columns such as Make
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f", ax=ax)
plt.show()

Ex-Showroom Price is positively correlated with Displacement.

Ex-Showroom Price is also positively correlated with the number of cylinders. In other words, cars with more cylinders tend to have a higher ex-showroom price.

The more cylinders a car has, the greater its displacement tends to be. Generally speaking, the higher an engine’s displacement, the more power it can produce.

The number of doors is strongly negatively correlated with Displacement. That makes sense, right? Cars with the largest engines are often two-door sports cars.

You can read more relationships like these off the heatmap.
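
If reading values off a heatmap gets tedious, one option is to pull out a single column of the correlation matrix and sort it. The sketch below uses a small made-up `toy` frame (hypothetical numbers, not the real dataset) to show the idea.

```python
import pandas as pd

# Toy numeric data (hypothetical values chosen for illustration)
toy = pd.DataFrame({
    "Cylinders": [3, 4, 4, 6, 8],
    "Displacement": [800, 1200, 1500, 3000, 5000],
    "Doors": [5, 5, 4, 4, 2],
    "Ex-Showroom_Price": [300000, 600000, 900000, 4000000, 20000000],
})

# Correlation of every other column with price, strongest positive first
price_corr = (
    toy.corr()["Ex-Showroom_Price"]
    .drop("Ex-Showroom_Price")
    .sort_values(ascending=False)
)

print(price_corr)
```

On the real data the same pattern would be `df.corr(numeric_only=True)['Ex-Showroom_Price']`, assuming the columns have already been converted to numbers.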

  • Brand-wise data

If you want to take a look at company-wise data, you can do it with just 1 line of code. Let’s say I want all the Tata cars in the Indian market.

df[df.Make =='Tata'].tail()

I used the tail() function to look at the last 5 rows.
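
The same filtering idea extends to several brands at once. Here is a minimal sketch using `isin` on a made-up `toy` frame (the rows are hypothetical and only stand in for the real data).

```python
import pandas as pd

# Toy data (hypothetical rows standing in for the real dataset)
toy = pd.DataFrame({
    "Make": ["Tata", "Hyundai", "Tata", "Toyota"],
    "Model": ["Nano", "i20", "Nexon", "Innova"],
})

# Keep rows whose Make is any of the listed brands
selected = toy[toy["Make"].isin(["Tata", "Toyota"])]

print(selected)
```

Bracket access such as `toy["Make"]` does the same job as `toy.Make`, and it also works for column names like `Ex-Showroom_Price` that contain a hyphen and so cannot be used with dot access.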

  • Brands with the most car variants in the Indian market
fig = plt.figure(figsize = (10,10))
ax = fig.subplots()
df.Make.value_counts().plot(ax=ax, kind='pie')
ax.set_ylabel("")
ax.set_title("Top Car Making Companies in India")
plt.show()

Maruti Suzuki has more car variants than any other company in India.

The top 5 companies by number of car variants in India are Maruti Suzuki, Hyundai, Mahindra, Tata, and Toyota.

There are very few sports car variants.
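
A pie chart makes the ranking hard to read exactly, so it can help to print the counts as well. The sketch below uses made-up counts in a `toy` frame (not the real figures) to show how `value_counts` gives the ranking directly.

```python
import pandas as pd

# Toy data with made-up variant counts per brand
toy = pd.DataFrame({
    "Make": ["Maruti Suzuki"] * 4 + ["Hyundai"] * 3 + ["Tata"] * 2 + ["Ferrari"]
})

# value_counts() sorts from most to fewest; head(3) keeps the top three
top3 = toy["Make"].value_counts().head(3)

print(top3)
```

On the real dataset, `df.Make.value_counts().head(5)` would list the five brands with the most variants.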

  • Cars by car body Type
plt.figure(figsize=(16,7))
sns.countplot(data=df, y='Body_Type',alpha=.6,color='blue')
plt.title('Cars by car body type',fontsize=20)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.xlabel('')
plt.ylabel('');

The Indian market favours SUVs, sedans, and hatchbacks.

  • Graph of Body Type and Ex-Showroom Price
# Sum only the price column so that text columns are not aggregated
PriceByType = df.groupby('Body_Type')['Ex-Showroom_Price'].sum().sort_values(ascending=False)
PriceByType = PriceByType.reset_index()
px.bar(x='Body_Type', y="Ex-Showroom_Price", data_frame=PriceByType)

If we sum up the Ex-Showroom Price of all the SUVs in the dataset, the total is nearly 2B INR.

Sports cars show only a minimal spike in the graph.
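
Keep in mind that a summed price depends on how many variants a body type has, not only on how expensive they are. The sketch below, on a made-up `toy` frame with hypothetical prices, puts the count, sum, and mean side by side to show the difference.

```python
import pandas as pd

# Toy data: many moderately priced SUVs, one expensive sports car (made-up prices)
toy = pd.DataFrame({
    "Body_Type": ["SUV", "SUV", "SUV", "SUV", "Sports"],
    "Ex-Showroom_Price": [1000000, 1000000, 1000000, 1000000, 3000000],
})

# Count, total, and average price per body type
summary = toy.groupby("Body_Type")["Ex-Showroom_Price"].agg(["count", "sum", "mean"])

print(summary)
```

Here the SUVs lead on the summed price because there are four of them, while the single sports car has the higher average price. Plotting the mean instead of the sum is one way to compare typical prices across body types.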

  • Cars count by Engine Fuel Type
fig = plt.figure(figsize = (10,10))
ax = fig.subplots()
df.Fuel_Type.value_counts().plot(ax=ax, kind='pie')
ax.set_ylabel("")
ax.set_title("Cars Count by Engine Fuel Type")
plt.show()

Almost 90% of Indian cars run on petrol or diesel. This is scary from an environmental point of view.

This data is going to change because electric vehicles have arrived in India.
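
To put a number on shares like this rather than estimating them from the pie chart, `value_counts(normalize=True)` returns fractions instead of counts. This is a minimal sketch on a made-up `toy` frame; the split below is hypothetical and is not the dataset's actual distribution.

```python
import pandas as pd

# Toy data with a made-up fuel split
toy = pd.DataFrame({
    "Fuel_Type": ["Petrol"] * 5 + ["Diesel"] * 4 + ["Electric"]
})

# Fraction of rows per fuel type (the fractions add up to 1)
shares = toy["Fuel_Type"].value_counts(normalize=True)

# Combined share of petrol and diesel
petrol_diesel_share = shares[["Petrol", "Diesel"]].sum()

print((shares * 100).round(1))
print("petrol + diesel share:", round(petrol_diesel_share * 100, 1), "%")
```

If you would rather show the percentages on the chart itself, passing `autopct='%1.1f%%'` to the pie plot call should label each slice.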

  • Relation Between Price and Displacement
plt.figure(figsize=(10,8))
sns.scatterplot(data=df, x='Displacement', y='Ex-Showroom_Price',hue='Body_Type',palette='viridis',alpha=.89, s=120 );
plt.xticks(fontsize=13);
plt.yticks(fontsize=13)
plt.xlabel('Displacement (cc)',fontsize=14)
plt.ylabel('Ex-Showroom Price',fontsize=14)
plt.title('Relation between Displacement and price',fontsize=20);

The plot is largely self-explanatory. Sports cars have the highest price and the highest displacement.

  • Plot pairwise relationships
sns.pairplot(df,vars=[ 'Displacement', 'Ex-Showroom_Price'], hue= 'Fuel_Type', palette=sns.color_palette('magma'),diag_kind='kde',height=2, aspect=1.8);

Pair plots come in handy for exploratory data analysis (EDA). A pair plot draws every pairwise relationship between the chosen variables in a grid, with each variable’s own distribution on the diagonal. Here the points are coloured by fuel type, so a categorical variable is shown alongside the continuous ones.

  • 3D graph of Displacement, Price, and Fuel Type
fig = px.scatter_3d(df, x='Displacement', z='Ex-Showroom_Price', y='Fuel_Type',color='Make',width=800,height=750)
fig.update_layout(showlegend=True)
fig.show();

This looks quite cool because the points line up by fuel type.

Well, that’s it.

Congrats, you analyzed the Indian cars dataset. You can dig deeper on your own, because there is a lot more you can do with this data, and the information you get out of it is valuable.

The dataset and the full GitHub source code are here.

Hello, my name is Rohit Kumar Thakur. I am open to freelancing. I build React Native projects and am currently working on Python Django. Feel free to contact me at freelance.rohit7@gmail.com.

More content at plainenglish.io
