2. Data Visualization
Data visualization transforms complex data into intuitive, visual insights, making it easier to understand patterns, trends, and relationships.
Let’s discuss the use of different charts:
A: Bar Chart: Bar charts compare values across categories. Each bar represents a category, and the length of the bar corresponds to the value of that category.
B: Line Chart: Line charts are ideal for showing trends over time. Each point on the line represents a data value at a specific time.
C: Pie Chart: Pie charts show proportions of a whole. Each slice represents a category’s contribution to the total.
D: Histogram: Histograms display the distribution of a dataset. They show the frequency of data points within certain ranges (bins).
E: Box Plot: Box plots show the spread of the data (median, quartiles, and range) and help detect outliers.
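The first four chart types can be sketched with matplotlib as follows. This is only an illustrative sketch: the function name basic_charts, the category labels, and all of the numbers are made up for the example and are not taken from any real dataset.

```python
import numpy as np
import matplotlib.pyplot as plt


def basic_charts(seed=0):
    """Draw a bar, line, pie, and histogram chart from made-up data."""
    rng = np.random.default_rng(seed)
    categories = ["A", "B", "C", "D"]   # hypothetical category labels
    values = [23, 17, 35, 29]           # hypothetical value per category
    months = np.arange(1, 13)
    # Hypothetical monthly values: an upward trend plus random noise.
    sales = 100 + 5 * months + rng.normal(0, 8, size=12)
    # Hypothetical measurements for the histogram.
    samples = rng.normal(50, 10, size=500)

    fig, ax = plt.subplots(2, 2, figsize=(10, 8))
    ax[0, 0].bar(categories, values)                 # compare categories
    ax[0, 0].set_title("Bar chart")
    ax[0, 1].plot(months, sales, marker="o")         # trend over time
    ax[0, 1].set_title("Line chart")
    ax[1, 0].pie(values, labels=categories, autopct="%1.1f%%")  # share of a whole
    ax[1, 0].set_title("Pie chart")
    ax[1, 1].hist(samples, bins=20)                  # distribution in 20 ranges
    ax[1, 1].set_title("Histogram")
    fig.tight_layout()
    return fig


fig = basic_charts()
```

Calling plt.show() afterwards displays the figure in an interactive session.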
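The function below detects outliers with the interquartile range (IQR) rule, plots a box plot and a histogram of the column, and then treats the outliers by replacing them with the median: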
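import matplotlib.pyplot as plt

def outlier_detection_treatment(col):
    # Assumes a pandas DataFrame named df is already loaded.
    q1 = df[col].quantile(0.25)
    q3 = df[col].quantile(0.75)
    iqr = q3 - q1
    lf = q1 - 1.5 * iqr  # lower fence
    uf = q3 + 1.5 * iqr  # upper fence
    # Rows whose value falls outside the fences are outliers.
    outliers = df[(df[col] < lf) | (df[col] > uf)]
    print("Percentage of outliers in", col, outliers.shape[0] * 100 / df.shape[0])
    fig, ax = plt.subplots(1, 2, figsize=(6, 4))
    ax[0].boxplot(df[col])
    ax[1].hist(df[col])
    plt.suptitle(col)
    plt.show()
    # Treat the outliers by replacing them with the column median.
    df.loc[(df[col] < lf) | (df[col] > uf), col] = df[col].median()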
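To make the IQR rule concrete, here is a small worked sketch without any plotting. The helper names iqr_fences and replace_outliers_with_median and the ages values are hypothetical, chosen only to illustrate the arithmetic.

```python
import pandas as pd


def iqr_fences(series):
    """Return the lower and upper fences of the 1.5 * IQR rule."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def replace_outliers_with_median(series):
    """Return a copy in which values outside the fences become the median."""
    lf, uf = iqr_fences(series)
    is_outlier = (series < lf) | (series > uf)
    return series.mask(is_outlier, series.median())


# Hypothetical data: seven plausible ages and one obvious outlier.
ages = pd.Series([22.0, 25.0, 27.0, 29.0, 31.0, 33.0, 35.0, 120.0])
lf, uf = iqr_fences(ages)
cleaned = replace_outliers_with_median(ages)
print(lf, uf)            # fences: 16.0 and 44.0
print(cleaned.tolist())  # 120.0 is replaced by the median, 30.0
```

Here q1 is 26.5 and q3 is 33.5, so the IQR is 7 and the fences are 16 and 44. Only 120 falls outside them, so it is replaced by the median of the original values, 30.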
F: Scatter Plot: Scatter plots are used to examine the relationship between two variables, for example to determine whether two measures are correlated. Each point represents an observation.
G: Heat Map: A heat map provides a graphical summary of information by representing a set of values through variations in color. The cells may differ in hue, shade, or intensity so that users and data analysts can more easily read and understand how the values vary across the rows and columns of the data.
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
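As a minimal sketch of the last two chart types, the following draws a scatter plot and a correlation heat map side by side with the same three libraries. The function name scatter_and_heatmap, the variables x, y, and z, and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt


def scatter_and_heatmap(seed=0):
    """Plot y against x, and a heat map of the x, y, z correlations."""
    rng = np.random.default_rng(seed)
    # Synthetic data: y depends on x, z is independent of both.
    x = rng.normal(0, 1, size=200)
    y = 2 * x + rng.normal(0, 0.5, size=200)
    z = rng.normal(0, 1, size=200)
    corr = np.corrcoef([x, y, z])   # 3 x 3 correlation matrix
    labels = ["x", "y", "z"]

    fig, ax = plt.subplots(1, 2, figsize=(10, 4))
    ax[0].scatter(x, y, s=10)       # each point is one observation
    ax[0].set_xlabel("x")
    ax[0].set_ylabel("y")
    ax[0].set_title("Scatter plot")
    # Color encodes the correlation; annot=True prints the value in each cell.
    sb.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1,
               xticklabels=labels, yticklabels=labels, ax=ax[1])
    ax[1].set_title("Correlation heat map")
    fig.tight_layout()
    return fig, corr


fig, corr = scatter_and_heatmap()
```

In the scatter plot the points fall close to a line because y was built from x, and the heat map shows the same relationship as a strongly colored x/y cell, while the cells involving z stay near zero. Calling plt.show() displays the figure.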