16. Matplotlib For Data Visualization
Matplotlib is a library used for data visualization. We are going to cover the following visualizations in Matplotlib:
Scatter plot
Line chart (plot)
Bar chart
Pie chart
Histogram
Stackplot
Boxplot
Install Matplotlib
pip install matplotlib
Let's understand Matplotlib first, with an example:
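As a minimal sketch of such a first example, the snippet below plots four months of sales as a line. The `month` and `sale` lists are the ones given in the practice questions at the end of this chapter; the choice of a line plot, the axis labels, and the title are assumptions made for illustration.

```python
import matplotlib.pyplot as plt

# Monthly sales figures (the same lists appear in the practice questions)
month = ["jan", "feb", "mar", "apr"]
sale = [100, 200, 50, 400]

fig, ax = plt.subplots()
ax.plot(month, sale, marker="o")  # one line, with a dot on every month
ax.set_xlabel("Month")            # labels and title are illustrative
ax.set_ylabel("Sales")
ax.set_title("Monthly sales")
```

In a script, calling `plt.show()` afterwards opens the figure window; in a notebook the figure is usually rendered automatically.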

Note: From this graph we can see that sales were lowest in March and highest in April.
1. Scatter Plot
A scatter plot is a type of graph that shows the relationship between two numerical variables.

Multiple Values in Scatter Plot

A scatter plot is also used to find the relationship between two variables.

Note: Here we can see that as temperature increases, tea sales decrease. This means they have a negative relationship.

Note: Here we can see that as temperature increases, tea sales decrease, so they have a negative relationship. Ice cream sales, in contrast, rise with temperature, so ice cream and temperature have a positive relationship.
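The two relationships described above can be sketched on one scatter plot. The `temp` and `ice` lists come from the practice questions; the `tea` values are made up for this illustration so that they fall as temperature rises.

```python
import matplotlib.pyplot as plt

temp = [10, 20, 30, 40, 50, 60]      # temperature
ice = [20, 50, 80, 90, 100, 300]     # ice cream sales (from the practice questions)
tea = [300, 250, 200, 120, 80, 40]   # tea sales: made-up values, not from the source

fig, ax = plt.subplots()
ax.scatter(temp, tea, label="Tea")         # falls as temperature rises (negative relationship)
ax.scatter(temp, ice, label="Ice cream")   # rises with temperature (positive relationship)
ax.set_xlabel("Temperature")
ax.set_ylabel("Sales")
ax.legend()                                # shows which colour belongs to which series
```

Calling `ax.scatter` twice on the same axes is what puts multiple series in one plot; the `label` arguments only appear once `ax.legend()` is called.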
Let's see an example of a scatter plot on real data:
0 | Foot Locker | 2020-01-01 | Northeast | New York | New York | Men's Street Footwear | 50.0 | 1200 | In-store | 60000.0
1 | Foot Locker | 2020-01-02 | Northeast | New York | New York | Men's Athletic Footwear | 50.0 | 1000 | In-store | 50000.0
2 | Foot Locker | 2020-01-03 | Northeast | New York | New York | Women's Street Footwear | 40.0 | 1000 | In-store | 40000.0
3 | Foot Locker | 2020-01-04 | Northeast | New York | New York | Women's Athletic Footwear | 45.0 | 850 | In-store | 38250.0
4 | Foot Locker | 2020-01-05 | Northeast | New York | New York | Men's Apparel | 60.0 | 900 | In-store | 54000.0
...
9643 | Foot Locker | 2021-01-24 | Northeast | New Hampshire | Manchester | Men's Apparel | 50.0 | 64 | Outlet | 3200.0
9644 | Foot Locker | 2021-01-24 | Northeast | New Hampshire | Manchester | Women's Apparel | 41.0 | 105 | Outlet | 4305.0
9645 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Men's Street Footwear | 41.0 | 184 | Outlet | 7544.0
9646 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Men's Athletic Footwear | 42.0 | 70 | Outlet | 2940.0
9647 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Women's Street Footwear | 29.0 | 83 | Outlet | 2407.0
9648 rows × 10 columns
Finding the Relationship between Units Sold and total
Obviously, the more units you sell, the higher the total sales. So Units Sold and total should have a positive relationship. Let's check it with a scatter plot.
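A minimal sketch of that check is shown here. It uses only the ten rows printed above, copied into plain lists, so it illustrates the call rather than reproducing the full 9648-row plot. With the full DataFrame you would pass the two columns instead; the column names in the comment are assumptions.

```python
import matplotlib.pyplot as plt

# Units Sold and total, copied from the ten rows printed above
units_sold = [1200, 1000, 1000, 850, 900, 64, 105, 184, 70, 83]
total = [60000.0, 50000.0, 40000.0, 38250.0, 54000.0,
         3200.0, 4305.0, 7544.0, 2940.0, 2407.0]

fig, ax = plt.subplots()
ax.scatter(units_sold, total)
ax.set_xlabel("Units Sold")
ax.set_ylabel("total")
ax.set_title("Units Sold vs total")

# With the full dataset (assumed column names):
# ax.scatter(df["Units Sold"], df["total"])
```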

Conclusion
The scatter plot shows a strong positive relationship between Units Sold and Sales.
This means:
As units sold increase, sales also increase.
No major outliers are visible that would break the pattern.
In your scatter plot, you can see that many points are clustered (crowded) together, especially between:
Units Sold: 100 to 600
Sales: 5,000 to 40,000
What this crowd tells us:
Most of your data falls in this range. This means the majority of your products or days have sales in this normal band.
Another Example
0 | 1 | Male | 27 | Software Engineer | 6.1 | 6 | 42 | 6 | Overweight | 126/83 | 77 | 4200 | NaN
1 | 2 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | NaN
2 | 3 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | NaN
3 | 4 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea
4 | 5 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea
...
369 | 370 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea
370 | 371 | Female | 59 | Nurse | 8.0 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea
371 | 372 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea
372 | 373 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea
373 | 374 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea
374 rows × 13 columns

Conclusion
When sleep duration is low (around 6 hours), the sleep quality is also lower (around 4–6).
As sleep duration increases to 7–8 hours, the sleep quality becomes better (7–8).
People who sleep 8+ hours generally show the highest sleep quality (8–9).
👉 In simple words:
The more you sleep (up to about 8 hours), the better your sleep quality becomes. There is a clear positive relationship between sleep duration and sleep quality.

Conclusion
Sleep Duration vs Quality of Sleep (Blue Dots):
When people sleep more (7–8+ hours) → their sleep quality is higher (7–9).
When sleep duration is less (5.8–6.5 hours) → sleep quality is lower (4–6).
More sleep = better sleep quality.
Sleep Duration vs Stress Level (Orange Dots)
When people sleep less (around 6 hours) → their stress level is higher (7–8).
As sleep increases to 7–8 hours, stress level drops to 4–5.
With 8+ hours of sleep, stress becomes lowest (3–4).
More sleep = lower stress.
Overall Meaning
Sleep quality increases as sleep duration increases.
Stress decreases as sleep duration increases.
In simple words:
Sleeping more makes your sleep better and your stress lower.
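The comparison above can be sketched with two scatter series on one set of axes. The lists below are copied from the ten rows of the sleep table shown earlier; which column holds Sleep Duration, Quality of Sleep, and Stress Level is inferred from the value ranges quoted in the conclusions, so treat that mapping as an assumption.

```python
import matplotlib.pyplot as plt

# Values copied from the ten printed rows (column mapping is an assumption)
sleep_duration = [6.1, 6.2, 6.2, 5.9, 5.9, 8.1, 8.0, 8.1, 8.1, 8.1]
quality_of_sleep = [6, 6, 6, 4, 4, 9, 9, 9, 9, 9]
stress_level = [6, 8, 8, 8, 8, 3, 3, 3, 3, 3]

fig, ax = plt.subplots()
# With Matplotlib's default colour cycle the first series is blue, the second orange
ax.scatter(sleep_duration, quality_of_sleep, label="Quality of Sleep")
ax.scatter(sleep_duration, stress_level, label="Stress Level")
ax.set_xlabel("Sleep Duration (hours)")
ax.set_ylabel("Score")
ax.legend()
```

Even on these few rows the two patterns point in opposite directions: quality rises with duration while stress falls.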
2. Line chart
A line chart connects data points with straight lines to show how something changes over time, which makes patterns, trends, and comparisons easy to see.
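A small sketch of a two-series line chart follows. The `isale` list is taken from the practice questions and is assumed here to be the iPhone series; the Samsung values in `ssale` are made up for illustration, chosen only so that both lines share the same shape.

```python
import matplotlib.pyplot as plt

month = ["jan", "feb", "mar", "apr"]
isale = [100, 200, 50, 400]    # assumed to be iPhone sales (list from the practice questions)
ssale = [150, 260, 120, 480]   # Samsung sales: made-up values, not from the source

fig, ax = plt.subplots()
ax.plot(month, isale, marker="o", label="iPhone")
ax.plot(month, ssale, marker="o", label="Samsung")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.legend()
```

The `marker="o"` argument puts a dot on each data point, which makes the individual months easier to read off the line.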


Conclusion
Both brands rise in February, fall in March, and grow strongly in April.
Samsung consistently sells more than iPhone in all four months.
Let's Use a Line Chart on Data
Find Monthly Sales in the Adidas Data
Your manager has asked you to find the monthly sales trend in the Adidas sales data. Here is how you can do it.
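One way to get monthly totals, sketched under assumptions: the column names `date` and `total` are hypothetical, and the small DataFrame below holds only a handful of rows in the shape of the Adidas data. The idea is to extract the month number from the date, then group by it and sum.

```python
import pandas as pd

# A few rows in the shape of the Adidas data; column names are assumptions
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2021-01-24", "2021-02-22", "2021-02-22"],
    "total": [60000.0, 50000.0, 3200.0, 7544.0, 2940.0],
})

df["date"] = pd.to_datetime(df["date"])   # make sure the column is a real datetime
df["month"] = df["date"].dt.month         # 1 = January, ..., 12 = December

# Total sales per month number, as a two-column DataFrame
monthly = df.groupby("month")["total"].sum().reset_index()
print(monthly)
```

Note that grouping on the month number alone combines the same month from different years, so January 2020 and January 2021 land in the same row.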
0 | Foot Locker | 2020-01-01 | Northeast | New York | New York | Men's Street Footwear | 50.0 | 1200 | In-store | 60000.0 | 1
1 | Foot Locker | 2020-01-02 | Northeast | New York | New York | Men's Athletic Footwear | 50.0 | 1000 | In-store | 50000.0 | 1
2 | Foot Locker | 2020-01-03 | Northeast | New York | New York | Women's Street Footwear | 40.0 | 1000 | In-store | 40000.0 | 1
3 | Foot Locker | 2020-01-04 | Northeast | New York | New York | Women's Athletic Footwear | 45.0 | 850 | In-store | 38250.0 | 1
4 | Foot Locker | 2020-01-05 | Northeast | New York | New York | Men's Apparel | 60.0 | 900 | In-store | 54000.0 | 1
...
9643 | Foot Locker | 2021-01-24 | Northeast | New Hampshire | Manchester | Men's Apparel | 50.0 | 64 | Outlet | 3200.0 | 1
9644 | Foot Locker | 2021-01-24 | Northeast | New Hampshire | Manchester | Women's Apparel | 41.0 | 105 | Outlet | 4305.0 | 1
9645 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Men's Street Footwear | 41.0 | 184 | Outlet | 7544.0 | 2
9646 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Men's Athletic Footwear | 42.0 | 70 | Outlet | 2940.0 | 2
9647 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Women's Street Footwear | 29.0 | 83 | Outlet | 2407.0 | 2
9648 rows × 11 columns
0 | 1 | 9744767.0
1 | 2 | 8263853.0
2 | 3 | 7694984.0
3 | 4 | 9691420.0
4 | 5 | 10741720.0
5 | 6 | 9803147.0
6 | 7 | 12550419.0
7 | 8 | 12293226.0
8 | 9 | 10405584.0
9 | 10 | 8538758.0
10 | 11 | 9023440.0
11 | 12 | 11415332.0


Conclusion
The monthly revenue trend shows clear seasonal fluctuations throughout the year. Revenue declines during the first quarter, reaching its lowest point in March. It then climbs from April to July, apart from a dip in June, with July recording the highest revenue of the year. Although revenue remains high in August, a noticeable decline occurs from September to October. The final quarter ends with a recovery, with December closing the year on a strong note.
Overall, the data indicates two major growth periods, April to July and November to December, suggesting potential seasonal demand cycles or successful mid-year and year-end business strategies.
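The trend described above can be redrawn directly from the twelve monthly totals in the table. This is a sketch: the month abbreviations used as tick labels and the axis titles are additions for readability.

```python
import matplotlib.pyplot as plt

months = list(range(1, 13))
# Monthly revenue copied from the table above
revenue = [9744767.0, 8263853.0, 7694984.0, 9691420.0, 10741720.0, 9803147.0,
           12550419.0, 12293226.0, 10405584.0, 8538758.0, 9023440.0, 11415332.0]
labels = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

fig, ax = plt.subplots()
ax.plot(months, revenue, marker="o")
ax.set_xticks(months)          # one tick per month
ax.set_xticklabels(labels)     # show names instead of 1..12
ax.set_xlabel("Month")
ax.set_ylabel("Revenue")
ax.grid(True)

# The months with the lowest and highest revenue
lowest = months[revenue.index(min(revenue))]
highest = months[revenue.index(max(revenue))]
print("lowest month:", lowest, "highest month:", highest)
```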
3. Bar Chart
A bar chart is used when you want to compare categories.
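A minimal sketch of a bar chart, using the `city` and `isale` lists from the practice questions; the axis labels are illustrative.

```python
import matplotlib.pyplot as plt

city = ["delhi", "pune", "agra", "bangalore"]   # categories
isale = [100, 200, 50, 400]                     # one value per category

fig, ax = plt.subplots()
bars = ax.bar(city, isale)   # one vertical bar per city
ax.set_xlabel("City")
ax.set_ylabel("Sales")
```

Using `ax.barh(city, isale)` instead draws the same comparison with horizontal bars, which tends to read better when category names are long.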


Let's Use This on Data
Find state-wise sales in the Adidas data
0 | Foot Locker | 2020-01-01 | Northeast | New York | New York | Men's Street Footwear | 50.0 | 1200 | In-store | 60000.0
1 | Foot Locker | 2020-01-02 | Northeast | New York | New York | Men's Athletic Footwear | 50.0 | 1000 | In-store | 50000.0
2 | Foot Locker | 2020-01-03 | Northeast | New York | New York | Women's Street Footwear | 40.0 | 1000 | In-store | 40000.0
3 | Foot Locker | 2020-01-04 | Northeast | New York | New York | Women's Athletic Footwear | 45.0 | 850 | In-store | 38250.0
4 | Foot Locker | 2020-01-05 | Northeast | New York | New York | Men's Apparel | 60.0 | 900 | In-store | 54000.0
...
9643 | Foot Locker | 2021-01-24 | Northeast | New Hampshire | Manchester | Men's Apparel | 50.0 | 64 | Outlet | 3200.0
9644 | Foot Locker | 2021-01-24 | Northeast | New Hampshire | Manchester | Women's Apparel | 41.0 | 105 | Outlet | 4305.0
9645 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Men's Street Footwear | 41.0 | 184 | Outlet | 7544.0
9646 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Men's Athletic Footwear | 42.0 | 70 | Outlet | 2940.0
9647 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Women's Street Footwear | 29.0 | 83 | Outlet | 2407.0
9648 rows × 10 columns
31 | New York | 8670464.0
4 | California | 8580508.0
8 | Florida | 7820589.0
42 | Texas | 6612371.0
39 | South Carolina | 3593112.0
17 | Louisiana | 3377031.0
46 | Washington | 3222093.0
45 | Virginia | 3074415.0
36 | Oregon | 3047049.0
27 | Nevada | 2981134.0
32 | North Carolina | 2936581.0
30 | New Mexico | 2824641.0
11 | Idaho | 2742753.0
10 | Hawaii | 2734457.0
9 | Georgia | 2708591.0
5 | Colorado | 2569036.0
41 | Tennessee | 2567190.0
0 | Alabama | 2513424.0
28 | New Hampshire | 2339267.0
21 | Michigan | 2287283.0
49 | Wyoming | 2282342.0
34 | Ohio | 2269283.0
2 | Arizona | 2254096.0
23 | Mississippi | 2218609.0
44 | Vermont | 2041598.0
25 | Montana | 1930761.0
1 | Alaska | 1810428.0
3 | Arkansas | 1802672.0
6 | Connecticut | 1646448.0
20 | Massachusetts | 1578435.0
35 | Oklahoma | 1512059.0
7 | Delaware | 1508537.0
37 | Pennsylvania | 1478794.0
43 | Utah | 1387620.0
47 | West Virginia | 1311160.0
16 | Kentucky | 1241148.0
15 | Kansas | 1225314.0
29 | New Jersey | 1220446.0
12 | Illinois | 1204063.0
38 | Rhode Island | 1202256.0
24 | Missouri | 1189515.0
18 | Maine | 1129728.0
13 | Indiana | 1084723.0
40 | South Dakota | 1041101.0
19 | Maryland | 951134.0
33 | North Dakota | 950930.0
48 | Wisconsin | 948894.0
14 | Iowa | 909811.0
22 | Minnesota | 903918.0
26 | Nebraska | 728838.0




Conclusion
States like Nebraska and Minnesota show the lowest sales, while California, Florida, and New York generate the highest revenue. This clearly highlights which states are underperforming and which are the strongest markets.
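As a sketch of how the strongest markets can be charted, the snippet below draws a horizontal bar chart of the five states with the highest sales, with figures copied from the table above. The pandas line in the comment shows one plausible way such a sorted table could be produced; its column names are assumptions.

```python
import matplotlib.pyplot as plt

# Top five rows of the state-wise table above
states = ["New York", "California", "Florida", "Texas", "South Carolina"]
sales = [8670464.0, 8580508.0, 7820589.0, 6612371.0, 3593112.0]

fig, ax = plt.subplots()
ax.barh(states, sales)    # horizontal bars keep long state names readable
ax.invert_yaxis()         # put the largest value at the top
ax.set_xlabel("Sales")
ax.set_title("Top 5 states by sales")

# One possible way to build the sorted table (assumed column names):
# df.groupby("State")["total"].sum().reset_index().sort_values("total", ascending=False)
```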
4. Pie Chart
A pie chart shows each category's percentage or proportion of the whole. Pie charts are best when you want to display:
Market share
Budget distribution
Population split
Sales contribution of each product
It shows what portion each category contributes to the total.
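A minimal sketch of a pie chart, using the `quantity` list from the practice questions together with the `explode` and `shadow` options mentioned there. The product labels are hypothetical, since the source does not name the categories.

```python
import matplotlib.pyplot as plt

quantity = [13.3, 2.2, 8.7, 5.6]
labels = ["Product A", "Product B", "Product C", "Product D"]  # hypothetical names
explode = [0.1, 0, 0, 0]   # pull the first slice slightly out of the pie

fig, ax = plt.subplots()
ax.pie(quantity, labels=labels, explode=explode, shadow=True,
       autopct="%1.1f%%")  # print each slice's percentage on the chart

# The same percentages, computed by hand
total = sum(quantity)
shares = [100 * q / total for q in quantity]
print([round(s, 1) for s in shares])
```

Matplotlib converts the raw values to proportions itself, so the list does not need to add up to 100.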




Let's understand this on data
Percentage contribution of each Sales Method in the Adidas data
0 | Foot Locker | 2020-01-01 | Northeast | New York | New York | Men's Street Footwear | 50.0 | 1200 | In-store | 60000.0
1 | Foot Locker | 2020-01-02 | Northeast | New York | New York | Men's Athletic Footwear | 50.0 | 1000 | In-store | 50000.0
2 | Foot Locker | 2020-01-03 | Northeast | New York | New York | Women's Street Footwear | 40.0 | 1000 | In-store | 40000.0
3 | Foot Locker | 2020-01-04 | Northeast | New York | New York | Women's Athletic Footwear | 45.0 | 850 | In-store | 38250.0
4 | Foot Locker | 2020-01-05 | Northeast | New York | New York | Men's Apparel | 60.0 | 900 | In-store | 54000.0
0 | Online | 4889
1 | Outlet | 3019
2 | In-store | 1740
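These three counts can be turned into a pie chart as sketched below. The counts add up to 9648, the number of rows in the dataset, so they appear to be the number of records per sales method rather than revenue; that reading is an inference from the numbers.

```python
import matplotlib.pyplot as plt

methods = ["Online", "Outlet", "In-store"]
counts = [4889, 3019, 1740]   # copied from the table above

fig, ax = plt.subplots()
ax.pie(counts, labels=methods, autopct="%1.1f%%", startangle=90)
ax.set_title("Share of each Sales Method")

# Percentage share of each method, computed by hand
shares = [round(100 * c / sum(counts), 1) for c in counts]
print(dict(zip(methods, shares)))
```

If the counts come from a DataFrame, a call such as `df["Sales Method"].value_counts()` is one way to obtain them; the column name there is an assumption.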


5. Histogram
A histogram is a graph that shows the distribution of numerical data. It splits the data into ranges (called bins) and tells how many values fall into each bin.
Normal distribution: mean = median = mode
Right-skewed distribution: typically mean > median > mode
Left-skewed distribution: typically mean < median < mode
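The three shapes can be sketched with generated data. The samples below are synthetic and the parameters (centre 50, spread 10, offsets 20 and 80) are arbitrary choices for illustration; 1000 values are used so that the mean and median comparisons are stable.

```python
import random
import statistics
import matplotlib.pyplot as plt

random.seed(0)  # fixed seed so the figure is repeatable

# Synthetic samples: symmetric, long right tail, long left tail
normal = [random.gauss(50, 10) for _ in range(1000)]
right_skewed = [20 + random.expovariate(1 / 10) for _ in range(1000)]
left_skewed = [80 - random.expovariate(1 / 10) for _ in range(1000)]

samples = [("Normal", normal),
           ("Right-skewed", right_skewed),
           ("Left-skewed", left_skewed)]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, (name, data) in zip(axes, samples):
    ax.hist(data, bins=20)   # 20 equal-width bins
    ax.set_title(name)
    print(name,
          "mean:", round(statistics.mean(data), 1),
          "median:", round(statistics.median(data), 1))
```

In the printed summary the mean sits above the median for the right-skewed sample and below it for the left-skewed one, while the two are close for the symmetric sample.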


Conclusion
The histogram of office_employee_ages is symmetrical and bell-shaped, indicating a normal distribution.
Most employees are middle-aged (around 45–50 years).
The mean and median are very close, which confirms the symmetry of the data.
The few employees at the extremes (30–35 and 65–70) represent the tails of the distribution.


Conclusion
The age distribution of Instagram users is right-skewed (positively skewed).
Most users are young (30–35 years).
The long tail on the right shows a few older users.
The mean is higher than the median, which is typical for right-skewed data.
For skewed distributions like this, the median better represents the “typical” user age than the mean.


Conclusion
The age distribution of Facebook users is left-skewed (negatively skewed).
Most users are older (60–70 years).
The long tail on the left indicates a smaller number of younger users.
The mean is lower than the median, which is typical for left-skewed data.
For skewed distributions like this, the median better represents the “typical” user age than the mean.
Let's use this on data
0 | 1 | Male | 27 | Software Engineer | 6.1 | 6 | 42 | 6 | Overweight | 126/83 | 77 | 4200 | NaN
1 | 2 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | NaN
2 | 3 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | NaN
3 | 4 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea
4 | 5 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea
...
369 | 370 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea
370 | 371 | Female | 59 | Nurse | 8.0 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea
371 | 372 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea
372 | 373 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea
373 | 374 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea
374 rows × 13 columns


There is not much difference between the mean, median, and mode here, so the mean is a reasonable answer if you are asked for the average age.
6. Stackplot
A stack plot is a type of plot in Python (using Matplotlib) that shows how multiple datasets contribute to a total over a period or sequence.
Each dataset is stacked on top of the previous one.
Useful for showing cumulative contributions.
X-axis usually represents time or categories, Y-axis is the value.
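A minimal stackplot sketch follows. The `isale` list is taken from the practice questions and is assumed to be iPhone sales; the Samsung values are made up for illustration. Months are plotted as numbers 1 to 4 and relabelled afterwards.

```python
import matplotlib.pyplot as plt

month_numbers = [1, 2, 3, 4]
month_names = ["jan", "feb", "mar", "apr"]
isale = [100, 200, 50, 400]    # assumed to be iPhone sales (from the practice questions)
ssale = [150, 260, 120, 480]   # Samsung sales: made-up values, not from the source

fig, ax = plt.subplots()
# Each series is drawn on top of the previous one
ax.stackplot(month_numbers, isale, ssale, labels=["iPhone", "Samsung"])
ax.set_xticks(month_numbers)
ax.set_xticklabels(month_names)
ax.set_ylabel("Sales")
ax.legend(loc="upper left")

# The top edge of the stack is the combined total for each month
totals = [i + s for i, s in zip(isale, ssale)]
print(totals)
```

The height of each coloured band is one series' contribution, and the upper outline of the whole stack is the running total across series.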


0 | 1 | 195.753521 | 280.944828 | 365.337931 | 271.117241 | 196.929577 | 273.577465
1 | 2 | 193.833333 | 249.443609 | 372.523077 | 251.832000 | 184.348837 | 245.654135
2 | 3 | 157.179104 | 247.606061 | 366.469697 | 239.742424 | 179.598540 | 241.400000
3 | 4 | 184.957447 | 253.514286 | 379.085714 | 280.305556 | 194.521739 | 250.543478
4 | 5 | 204.264286 | 264.740741 | 370.797101 | 283.751825 | 203.459259 | 242.644444
5 | 6 | 173.214876 | 244.524590 | 375.826446 | 282.462810 | 188.770492 | 225.737705
6 | 7 | 222.629921 | 293.589147 | 409.037879 | 300.060606 | 213.527559 | 250.661417
7 | 8 | 216.288732 | 348.652778 | 419.619718 | 296.514085 | 238.707143 | 291.552448
8 | 9 | 210.920290 | 306.503704 | 374.059259 | 286.521429 | 216.057971 | 272.739130
9 | 10 | 172.544776 | 247.956204 | 314.925373 | 245.179104 | 170.101449 | 199.868613
10 | 11 | 167.053030 | 230.879699 | 308.458647 | 235.369231 | 177.772727 | 197.444444
11 | 12 | 189.922481 | 269.848000 | 363.476562 | 259.492063 | 205.093750 | 228.349593


7. Box Plot
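As a brief sketch of what a box plot shows: in Matplotlib's default settings the box spans the first to third quartile, the line inside it marks the median, the whiskers reach out to at most 1.5 times the interquartile range, and points beyond the whiskers are drawn individually as possible outliers. The example uses the Units Sold values from the ten Adidas rows printed earlier in this chapter, so it only illustrates the call.

```python
import statistics
import matplotlib.pyplot as plt

# Units Sold copied from the ten Adidas rows shown earlier
units_sold = [1200, 1000, 1000, 850, 900, 64, 105, 184, 70, 83]

fig, ax = plt.subplots()
result = ax.boxplot(units_sold)   # returns a dict of the drawn artists
ax.set_ylabel("Units Sold")
ax.set_title("Units Sold")

# The median line drawn in the box matches the median of the list
drawn_median = result["medians"][0].get_ydata()[0]
print("median:", statistics.median(units_sold), "drawn at:", drawn_median)
```

Passing a list of lists, for example `ax.boxplot([a, b, c])`, draws several boxes side by side for comparison.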


A. Scatter Plot – 10 Questions
Plot a scatter plot showing the relationship between Temperature vs Tea Sales.
Using the lists temp = [10,20,30,40,50,60] and ice = [20,50,80,90,100,300], plot both in the same scatter plot with a legend.
Create a scatter plot between Units Sold vs Total Sales from the Adidas dataset.
In the Adidas scatter plot, highlight the range where most points appear (use annotations).
From the Sleep dataset, plot Sleep Duration vs Quality of Sleep.
Plot Sleep Duration vs Stress Level using a scatter plot.
Plot month = ['jan','feb','mar','apr'] against sale = [100,200,50,400]. Add a small written conclusion.
Plot a scatter plot showing iPhone vs Samsung monthly sales (two series).
Add grid=True to a scatter plot.
Create a scatter plot using any two numerical columns from the Adidas dataset and write 1 line about the pattern.
B. Line Chart – 10 Questions
Plot a line chart for month = ['jan','feb','mar','apr'] and isale = [100,200,50,400].
Plot line charts for iPhone vs Samsung monthly sales with markers.
From Adidas data, create a line chart for Month vs Revenue.
Plot the Adidas Month 1 → Month 12 revenue trend using a line graph.
Add custom x-ticks (jan–dec) to your line chart.
Plot two lines on the same graph: Revenue 2020 vs Revenue 2021.
Use markers in a line chart and observe the improvement in readability.
Create a line chart with grid=False and compare visually to one with grid.
Compare revenue between two states across all 12 months.
Create a line chart showing the rolling average (moving trend) of Adidas revenue.
C. Bar Chart – 8 Questions
Create a bar chart of state-wise revenue in Adidas, sorted descending.
Make a horizontal bar chart for the same data.
Plot city = ['delhi','pune','agra','bangalore'] against isale = [100,200,50,400].
Create a bar chart showing Male vs Female count in Sleep dataset.
Create a bar chart of Product-wise total units sold in Adidas.
Create a bar chart of Occupation vs Average Sleep Duration.
Make a grouped bar chart comparing iPhone vs Samsung sales (4 months).
Use a bar chart to compare categories (any categorical variable from Adidas).
D. Pie Chart – 5 Questions
Pie chart of revenue contribution of top 5 states (Adidas).
Pie chart of gender distribution in Sleep dataset.
Stylish pie chart using quantity = [13.3, 2.2, 8.7, 5.6]; use explode + shadow.
Pie chart showing product category distribution.
Create any pie chart that represents parts of a whole using your own dataset values.
E. Histogram – 7 Questions
Histogram for Units Sold in Adidas.
Histogram for Sleep Duration.
Generate 100 random values (mean=50, std=10) → plot histogram.
Histogram for Daily Steps from Sleep dataset.
Create two histograms for same data using bins=5 and bins=20.
Create 3 lists that produce a normal distribution, a right skew, and a left skew, and plot all 3 histograms.
Plot a histogram using 100 random integers (1–500).
F. Boxplot – 5 Questions
Boxplot of Units Sold to detect outliers.
Plot three boxplots together: Units Sold, Price per Unit, Total Sales.
Boxplot for Sleep Duration vs Quality of Sleep.
Boxplot showing state-wise revenue spread.
Boxplot comparing two products from Adidas.
G. Stackplot – 3 Questions
Create a stackplot using iPhone and Samsung month-wise sales.
Add labels + custom colors to a stackplot.
Create a stackplot using any two numeric lists from your datasets.
H. Python Lists – 2 Questions
Generate a list of 100 random integers (1–500) and plot a histogram.
Using temp = [...], tea = [...], and ice = [...], plot three scatter plots in one figure (subplot method).