Exploring the Impact of EDA on Machine Learning | Upalabdhi Mohapatra | Sep 2024

SeniorTechInfo

2. Data Visualization

Data visualization transforms complex data into intuitive, visual insights, making it easier to understand patterns, trends, and relationships.

Let’s discuss the use of different charts:

A: Bar Chart: Bar charts compare values across categories. Each bar represents a category, and the length of the bar corresponds to the value of that category.

Two categories of Gender
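As a minimal sketch of such a bar chart, the snippet below plots one bar per gender category with matplotlib. The counts are hypothetical values chosen only for illustration; they are not taken from the article's dataset.

```python
import matplotlib.pyplot as plt

# Hypothetical counts per category (illustrative values, not real data).
categories = ["Male", "Female"]
counts = [58, 42]

fig, ax = plt.subplots(figsize=(4, 3))
bars = ax.bar(categories, counts)  # one bar per category, height = count
ax.set_xlabel("Gender")
ax.set_ylabel("Count")
ax.set_title("Two categories of Gender")
# plt.show()  # uncomment to display the figure when running as a script
```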

B: Line Chart: Line charts are ideal for showing trends over time. Each point on the line represents a data value at a specific time.

Line chart for sales and month in the year 2016–17
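A line chart of this kind might be drawn as follows. The month labels and sales figures are made-up placeholders, assumed here only to show how points at successive times are joined into a trend line.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly sales (illustrative values, not the article's data).
months = ["Apr", "May", "Jun", "Jul", "Aug", "Sep"]
sales = [120, 135, 128, 150, 162, 158]

fig, ax = plt.subplots(figsize=(5, 3))
(line,) = ax.plot(months, sales, marker="o")  # each marker is one month's value
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Monthly sales")
# plt.show()  # uncomment to display the figure when running as a script
```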

C: Pie Chart: Pie charts show proportions of a whole. Each slice represents a category’s contribution to the total.
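The sketch below shows one way to draw a pie chart with matplotlib. The category names and shares are hypothetical; `autopct` prints each slice's percentage of the total.

```python
import matplotlib.pyplot as plt

# Hypothetical category totals (illustrative values only).
shares = {"Electronics": 45, "Clothing": 30, "Groceries": 25}

fig, ax = plt.subplots(figsize=(4, 4))
wedges, texts, autotexts = ax.pie(
    list(shares.values()),
    labels=list(shares.keys()),
    autopct="%1.0f%%",  # label each slice with its share of the whole
)
ax.set_title("Share of total by category")
# plt.show()  # uncomment to display the figure when running as a script
```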

D: Histogram: Histograms display the distribution of a dataset. They show the frequency of data points within certain ranges (bins).
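To illustrate, the following sketch bins a synthetic sample into ten ranges and plots the frequency of each. The data are randomly generated under an assumed normal distribution and stand in for a real numeric column.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic sample standing in for a numeric column (assumption for illustration).
rng = np.random.default_rng(0)
ages = rng.normal(loc=35, scale=10, size=500)

fig, ax = plt.subplots(figsize=(5, 3))
# counts[i] is the number of values falling between edges[i] and edges[i + 1]
counts, edges, _ = ax.hist(ages, bins=10, edgecolor="black")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
# plt.show()  # uncomment to display the figure when running as a script
```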

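E: Box Plots: Box plots show the spread of the data and help detect outliers. The box spans the first to the third quartile (the interquartile range, IQR), and points lying more than 1.5 × IQR beyond the box are commonly flagged as outliers.

Here we can see outliers being detected and then treated: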


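import matplotlib.pyplot as plt

# Assumes `df` is an existing pandas DataFrame and `col` names a numeric column.
def outlier_detection_treatment(col):
    # Interquartile range and the 1.5 * IQR fences
    q1 = df[col].quantile(0.25)
    q3 = df[col].quantile(0.75)
    iqr = q3 - q1
    lf = q1 - 1.5 * iqr  # lower fence
    uf = q3 + 1.5 * iqr  # upper fence
    # Rows whose value falls outside the fences
    outliers = df[(df[col] < lf) | (df[col] > uf)]
    print("Percentage of outliers in", col, outliers.shape[0] * 100 / df.shape[0])
    # Box plot (left) and histogram (right) of the column before treatment
    fig, ax = plt.subplots(1, 2, figsize=(6, 4))
    ax[0].boxplot(df[col])
    ax[1].hist(df[col])
    plt.suptitle(col)
    plt.show()
    # Treatment: replace the outliers with the column median
    df.loc[(df[col] < lf) | (df[col] > uf), col] = df[col].median()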
The left panel is the box plot and the right panel is the histogram.
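To make the fence arithmetic concrete, here is a small worked sketch on a hypothetical eight-value series (not the article's data). It uses the same quartile, `lf`, and `uf` definitions as the function above and replaces the single outlier with the median.

```python
import pandas as pd

# Hypothetical values; 100.0 is the obvious outlier.
values = pd.Series([10.0, 12.0, 13.0, 14.0, 15.0, 16.0, 18.0, 100.0])

q1 = values.quantile(0.25)   # 12.75
q3 = values.quantile(0.75)   # 16.5
iqr = q3 - q1                # 3.75
lf = q1 - 1.5 * iqr          # 7.125  (lower fence)
uf = q3 + 1.5 * iqr          # 22.125 (upper fence)

is_outlier = (values < lf) | (values > uf)
# Replace flagged values with the median of the original series (14.5).
treated = values.mask(is_outlier, values.median())
print(treated.tolist())
```

Note that the median is computed before treatment, so it still includes the outlier; because the median is robust, one extreme value barely shifts it.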

F: Scatter Plot: Scatter plots are used to examine the relationship between two variables, in particular whether two measures are correlated. Each point represents an observation.

Strength of correlation of the two measures
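One way to pair a scatter plot with a numeric measure of that strength is sketched below. The two variables are synthetic, generated under the assumption of a linear relationship plus noise, and the Pearson coefficient from `np.corrcoef` is shown in the title.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic, positively related measures (assumption for illustration).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + rng.normal(scale=2.0, size=100)

# Pearson correlation: close to +1 or -1 means a strong linear relationship.
r = np.corrcoef(x, y)[0, 1]

fig, ax = plt.subplots(figsize=(4, 3))
points = ax.scatter(x, y, s=12)  # one point per observation
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title(f"Pearson r = {r:.2f}")
# plt.show()  # uncomment to display the figure when running as a script
```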

G: Heat Map: A heat map summarizes a grid of values by encoding each value as a color. Differences in hue, shade, or intensity let users and data analysts see at a glance how the values vary across the two dimensions, for example across time or across pairs of variables.

import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
Seaborn Heatmap
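The original listing is cut off after the imports, so the following is only a sketch of how a seaborn heat map is commonly built, not the author's code. It assumes four synthetic features with hypothetical names and plots their correlation matrix.

```python
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt

# Synthetic data: 100 rows, 4 features (names are hypothetical).
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))
labels = ["f1", "f2", "f3", "f4"]

# Pairwise Pearson correlations between the columns.
corr = np.corrcoef(data, rowvar=False)

fig, ax = plt.subplots(figsize=(4, 3))
sb.heatmap(
    corr,
    annot=True, fmt=".2f",          # write each value inside its cell
    cmap="coolwarm", vmin=-1, vmax=1,  # fixed color scale for correlations
    xticklabels=labels, yticklabels=labels,
    ax=ax,
)
# plt.show()  # uncomment to display the figure when running as a script
```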