16. Matplotlib For Data Visualization

Matplotlib is a Python library for data visualization. We are going to cover the following visualizations in Matplotlib:

  1. Scatter Plot

  2. Line Chart (plot)

  3. Bar Chart

  4. Pie Chart

  5. Histogram

  6. Stackplot

  7. Box Plot

Install Matplotlib

pip install matplotlib

Let's understand Matplotlib first with a simple example:

Note: From this graph we can see that the lowest sales were in March and the highest were in April.
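The source does not show the code for this first example. Below is a minimal sketch that matches the note above. It assumes the month and sale lists used in the practice questions at the end of this section (month = ['jan','feb','mar','apr'], sale = [100,200,50,400]). The names fig, ax, lowest and highest are illustrative choices.

```python
import matplotlib.pyplot as plt

# Example data (same lists as the practice questions at the end of this section)
month = ["jan", "feb", "mar", "apr"]
sale = [100, 200, 50, 400]

fig, ax = plt.subplots()
ax.plot(month, sale, marker="o")  # one point per month, joined by straight lines
ax.set_title("Monthly Sales")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
plt.show()

# Confirm the lowest and highest months from the data itself
lowest = month[sale.index(min(sale))]
highest = month[sale.index(max(sale))]
print("Lowest:", lowest, "| Highest:", highest)  # Lowest: mar | Highest: apr
```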

1. Scatter Plot

A scatter plot is a type of graph that shows the relationship between two numerical variables.
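The source's own code cell for this plot is not shown. Here is a minimal scatter plot sketch using the temp and ice lists from the practice questions below. scatter draws one dot per (x, y) pair and does not connect the dots:

```python
import matplotlib.pyplot as plt

# Lists taken from the practice questions at the end of this section
temp = [10, 20, 30, 40, 50, 60]
ice = [20, 50, 80, 90, 100, 300]

fig, ax = plt.subplots()
ax.scatter(temp, ice)  # one dot per (temperature, sales) pair
ax.set_xlabel("Temperature")
ax.set_ylabel("Ice Cream Sales")
ax.set_title("Temperature vs Ice Cream Sales")
plt.show()
```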

Multiple Values in Scatter Plot

A scatter plot is also used to find the relationship between two variables, and you can draw more than one series on the same axes to compare them.

Note: As temperature increases, tea sales decrease, so temperature and tea sales have a negative relationship.

Note: Tea sales fall as temperature rises (a negative relationship). Ice cream sales, however, rise with temperature, so ice cream sales and temperature have a positive relationship.
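Here is a sketch that reproduces the two notes above. The temp and ice lists come from the practice questions. The tea values are hypothetical, chosen only to show a negative relationship. The np.corrcoef call puts a number on each relationship (Pearson correlation, from -1 to 1). It is an addition and does not appear in the source:

```python
import numpy as np
import matplotlib.pyplot as plt

temp = [10, 20, 30, 40, 50, 60]        # from the practice questions
ice = [20, 50, 80, 90, 100, 300]       # from the practice questions
tea = [300, 250, 200, 150, 100, 50]    # hypothetical values, for illustration only

fig, ax = plt.subplots()
ax.scatter(temp, tea, label="Tea Sales")
ax.scatter(temp, ice, label="Ice Cream Sales")
ax.set_xlabel("Temperature")
ax.set_ylabel("Sales")
ax.legend()  # tells the two series apart
plt.show()

# Pearson correlation: negative means one goes down as the other goes up
r_tea = np.corrcoef(temp, tea)[0, 1]
r_ice = np.corrcoef(temp, ice)[0, 1]
print(r_tea, r_ice)
```

A negative coefficient matches the falling tea sales, and a positive one matches the rising ice cream sales.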

Let's see an example of a scatter plot on real data (the Adidas sales dataset):

|  | Retailer | Invoice Date | Region | State | City | Product | Price per Unit | Units Sold | Sales Method | total |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Foot Locker | 2020-01-01 | Northeast | New York | New York | Men's Street Footwear | 50.0 | 1200 | In-store | 60000.0 |
| 1 | Foot Locker | 2020-01-02 | Northeast | New York | New York | Men's Athletic Footwear | 50.0 | 1000 | In-store | 50000.0 |
| 2 | Foot Locker | 2020-01-03 | Northeast | New York | New York | Women's Street Footwear | 40.0 | 1000 | In-store | 40000.0 |
| 3 | Foot Locker | 2020-01-04 | Northeast | New York | New York | Women's Athletic Footwear | 45.0 | 850 | In-store | 38250.0 |
| 4 | Foot Locker | 2020-01-05 | Northeast | New York | New York | Men's Apparel | 60.0 | 900 | In-store | 54000.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9643 | Foot Locker | 2021-01-24 | Northeast | New Hampshire | Manchester | Men's Apparel | 50.0 | 64 | Outlet | 3200.0 |
| 9644 | Foot Locker | 2021-01-24 | Northeast | New Hampshire | Manchester | Women's Apparel | 41.0 | 105 | Outlet | 4305.0 |
| 9645 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Men's Street Footwear | 41.0 | 184 | Outlet | 7544.0 |
| 9646 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Men's Athletic Footwear | 42.0 | 70 | Outlet | 2940.0 |
| 9647 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Women's Street Footwear | 29.0 | 83 | Outlet | 2407.0 |

9648 rows × 10 columns

Finding the Relationship between Units Sold and total

The more units you sell, the higher the total sales, so Units Sold and total should have a positive relationship. Let's check this with a scatter plot.
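The source does not show the code for this plot. Here is a minimal sketch that uses only the 10 preview rows shown above. In practice you would load the full 9,648-row file with pd.read_csv, but the source does not give the file path. In every row shown, total equals Price per Unit × Units Sold, so the sketch recomputes it that way. Treat that as an inference from the preview, not the author's confirmed code:

```python
import pandas as pd
import matplotlib.pyplot as plt

# The 10 preview rows from the table above (hypothetical stand-in for the full data)
df = pd.DataFrame({
    "Price per Unit": [50.0, 50.0, 40.0, 45.0, 60.0, 50.0, 41.0, 41.0, 42.0, 29.0],
    "Units Sold": [1200, 1000, 1000, 850, 900, 64, 105, 184, 70, 83],
})
# In the rows shown, total = Price per Unit * Units Sold
df["total"] = df["Price per Unit"] * df["Units Sold"]

fig, ax = plt.subplots()
ax.scatter(df["Units Sold"], df["total"])
ax.set_xlabel("Units Sold")
ax.set_ylabel("total")
ax.set_title("Units Sold vs total")
plt.show()

# Correlation close to 1 means a strong positive relationship
print(df["Units Sold"].corr(df["total"]))
```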

Conclusion

The scatter plot shows a strong positive relationship between Units Sold and Sales.

This means:

  1. As units sold increase, sales also increase.

  2. No major outliers are visible that would break the pattern.

In the scatter plot, many points are clustered (crowded) together, especially in these ranges:

  1. Units Sold: 100 to 600

  2. Sales: 5,000 to 40,000

What this cluster tells us:

Most of the data falls in these ranges, which means the majority of transactions (rows) have sales in this normal band.

Another Example

|  | Person ID | Gender | Age | Occupation | Sleep Duration | Quality of Sleep | Physical Activity Level | Stress Level | BMI Category | Blood Pressure | Heart Rate | Daily Steps | Sleep Disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 27 | Software Engineer | 6.1 | 6 | 42 | 6 | Overweight | 126/83 | 77 | 4200 | NaN |
| 1 | 2 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | NaN |
| 2 | 3 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | NaN |
| 3 | 4 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |
| 4 | 5 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 369 | 370 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 370 | 371 | Female | 59 | Nurse | 8.0 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 371 | 372 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 372 | 373 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 373 | 374 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |

374 rows × 13 columns
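Here is a sketch of the first sleep plot. It is built from the 10 preview rows above only. The full dataset has 374 rows, so the real plot has many more points. The DataFrame name sleep is an illustrative choice, since the source's code is not shown:

```python
import pandas as pd
import matplotlib.pyplot as plt

# The 10 preview rows from the table above (hypothetical stand-in for the full data)
sleep = pd.DataFrame({
    "Sleep Duration": [6.1, 6.2, 6.2, 5.9, 5.9, 8.1, 8.0, 8.1, 8.1, 8.1],
    "Quality of Sleep": [6, 6, 6, 4, 4, 9, 9, 9, 9, 9],
})

fig, ax = plt.subplots()
ax.scatter(sleep["Sleep Duration"], sleep["Quality of Sleep"])
ax.set_xlabel("Sleep Duration (hours)")
ax.set_ylabel("Quality of Sleep")
ax.set_title("Sleep Duration vs Quality of Sleep")
plt.show()
```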

Conclusion

When sleep duration is low (around 6 hours), the sleep quality is also lower (around 4–6).

As sleep duration increases to 7–8 hours, the sleep quality becomes better (7–8).

People who sleep 8+ hours generally show the highest sleep quality (8–9).

👉 In simple words:

People who sleep more (up to about 8 hours) tend to report better sleep quality. There is a clear positive relationship between sleep duration and sleep quality.
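The next conclusion describes two series on one plot: blue dots for Quality of Sleep and orange dots for Stress Level. Matplotlib's default color cycle makes the first series blue and the second orange, which is consistent with that description. Here is a sketch that again uses only the preview rows (the variable names are illustrative):

```python
import pandas as pd
import matplotlib.pyplot as plt

# The 10 preview rows from the sleep table (hypothetical stand-in for the full data)
sleep = pd.DataFrame({
    "Sleep Duration": [6.1, 6.2, 6.2, 5.9, 5.9, 8.1, 8.0, 8.1, 8.1, 8.1],
    "Quality of Sleep": [6, 6, 6, 4, 4, 9, 9, 9, 9, 9],
    "Stress Level": [6, 8, 8, 8, 8, 3, 3, 3, 3, 3],
})

fig, ax = plt.subplots()
# First series uses the default first color (blue), second uses orange
ax.scatter(sleep["Sleep Duration"], sleep["Quality of Sleep"], label="Quality of Sleep")
ax.scatter(sleep["Sleep Duration"], sleep["Stress Level"], label="Stress Level")
ax.set_xlabel("Sleep Duration (hours)")
ax.set_ylabel("Score")
ax.legend()
plt.show()

print(sleep["Sleep Duration"].corr(sleep["Stress Level"]))  # negative: more sleep, less stress
```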

Conclusion

Sleep Duration vs Quality of Sleep (Blue Dots):

  1. When people sleep more (7–8+ hours) → their sleep quality is higher (7–9).

  2. When sleep duration is less (5.8–6.5 hours) → sleep quality is lower (4–6).

  3. More sleep = better sleep quality.

Sleep Duration vs Stress Level (Orange Dots)

  1. When people sleep less (around 6 hours) → their stress level is higher (7–8).

  2. As sleep increases to 7–8 hours, stress level drops to 4–5.

  3. With 8+ hours of sleep, stress becomes lowest (3–4).

  4. More sleep = lower stress.

Overall Meaning

  1. Sleep quality increases as sleep duration increases.

  2. Stress decreases as sleep duration increases.

In simple words:

People who sleep more tend to have better sleep quality and lower stress. A scatter plot shows a relationship, not proof of cause and effect.

2. Line Chart

A line chart connects data points with straight lines to show how something changes over time. This makes patterns, trends, and comparisons easy to see.
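The source does not show the data behind the iPhone vs Samsung figure. The sketch below uses hypothetical monthly values, chosen only to match the conclusion that follows: both brands rise in February, fall in March, and jump in April, and Samsung stays ahead every month.

```python
import matplotlib.pyplot as plt

month = ["jan", "feb", "mar", "apr"]
iphone = [100, 150, 90, 250]    # hypothetical values
samsung = [150, 220, 130, 350]  # hypothetical values

fig, ax = plt.subplots()
ax.plot(month, iphone, marker="o", label="iPhone")
ax.plot(month, samsung, marker="o", label="Samsung")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("iPhone vs Samsung Monthly Sales")
ax.legend()
plt.show()
```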

Conclusion

  1. Both brands rise in February, fall in March, and grow strongly in April.

  2. Samsung consistently sells more than iPhone in all four months.

Let's Use a Line Chart on Data

Find Monthly Sales in the Adidas Dataset

Your manager has asked you to find the monthly sales trend in the Adidas sales data. Here is how you can do it.

|  | Retailer | Invoice Date | Region | State | City | Product | Price per Unit | Units Sold | Sales Method | total | Month |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Foot Locker | 2020-01-01 | Northeast | New York | New York | Men's Street Footwear | 50.0 | 1200 | In-store | 60000.0 | 1 |
| 1 | Foot Locker | 2020-01-02 | Northeast | New York | New York | Men's Athletic Footwear | 50.0 | 1000 | In-store | 50000.0 | 1 |
| 2 | Foot Locker | 2020-01-03 | Northeast | New York | New York | Women's Street Footwear | 40.0 | 1000 | In-store | 40000.0 | 1 |
| 3 | Foot Locker | 2020-01-04 | Northeast | New York | New York | Women's Athletic Footwear | 45.0 | 850 | In-store | 38250.0 | 1 |
| 4 | Foot Locker | 2020-01-05 | Northeast | New York | New York | Men's Apparel | 60.0 | 900 | In-store | 54000.0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9643 | Foot Locker | 2021-01-24 | Northeast | New Hampshire | Manchester | Men's Apparel | 50.0 | 64 | Outlet | 3200.0 | 1 |
| 9644 | Foot Locker | 2021-01-24 | Northeast | New Hampshire | Manchester | Women's Apparel | 41.0 | 105 | Outlet | 4305.0 | 1 |
| 9645 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Men's Street Footwear | 41.0 | 184 | Outlet | 7544.0 | 2 |
| 9646 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Men's Athletic Footwear | 42.0 | 70 | Outlet | 2940.0 | 2 |
| 9647 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Women's Street Footwear | 29.0 | 83 | Outlet | 2407.0 | 2 |

9648 rows × 11 columns
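The Month column in the table above can be derived from Invoice Date. Grouping by that column then produces a month/revenue table like the one that follows. The source's own code is not shown, so this is a sketch that matches the table layout and uses only the 10 preview rows. Note that dt.month keeps only the calendar month, so January 2020 and January 2021 are summed together:

```python
import pandas as pd

# Invoice Date and total from the 10 preview rows (hypothetical stand-in for the full data)
df = pd.DataFrame({
    "Invoice Date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05",
                     "2021-01-24", "2021-01-24", "2021-02-22", "2021-02-22", "2021-02-22"],
    "total": [60000.0, 50000.0, 40000.0, 38250.0, 54000.0,
              3200.0, 4305.0, 7544.0, 2940.0, 2407.0],
})

# Calendar month (1-12); the year is dropped
df["Month"] = pd.to_datetime(df["Invoice Date"]).dt.month

# Sum total per month and name the result "revenue"
monthly = df.groupby("Month")["total"].sum().reset_index(name="revenue")
print(monthly)
```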

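|  | Month | revenue |
|---|---|---|
| 0 | 1 | 9744767.0 |
| 1 | 2 | 8263853.0 |
| 2 | 3 | 7694984.0 |
| 3 | 4 | 9691420.0 |
| 4 | 5 | 10741720.0 |
| 5 | 6 | 9803147.0 |
| 6 | 7 | 12550419.0 |
| 7 | 8 | 12293226.0 |
| 8 | 9 | 10405584.0 |
| 9 | 10 | 8538758.0 |
| 10 | 11 | 9023440.0 |
| 11 | 12 | 11415332.0 |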
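Here is a line chart of the monthly revenue table above. The revenue values are copied from that table. The month labels and styling (markers, grid) are illustrative choices, not taken from the source:

```python
import matplotlib.pyplot as plt

# Monthly revenue copied from the table above (months 1-12)
months = list(range(1, 13))
revenue = [9744767.0, 8263853.0, 7694984.0, 9691420.0, 10741720.0, 9803147.0,
           12550419.0, 12293226.0, 10405584.0, 8538758.0, 9023440.0, 11415332.0]
labels = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(months, revenue, marker="o")
ax.set_xticks(months)
ax.set_xticklabels(labels)  # show month names instead of 1-12
ax.set_xlabel("Month")
ax.set_ylabel("Revenue")
ax.set_title("Adidas Monthly Revenue")
ax.grid(True)
plt.show()

best = labels[revenue.index(max(revenue))]   # highest month
worst = labels[revenue.index(min(revenue))]  # lowest month
print(best, worst)
```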


Conclusion

The monthly revenue trend shows clear seasonal fluctuations throughout the year. Each month's figure combines 2020 and 2021, because the Month column holds only the calendar month. Revenue declines during the first quarter and reaches its lowest point in March (about 7.7 million). It then climbs from April to July, with a small dip in June, and July records the highest revenue of the year (about 12.6 million). Revenue stays relatively high in August, then falls through September to October. It recovers in November, and December ends the year on a strong note.

Overall, the data shows two major growth periods, April to July and November to December. This suggests possible seasonal demand cycles or successful mid-year and year-end business strategies.

3. Bar Chart

A bar chart is used to compare values across categories.
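The source's example code is not shown. Here is a minimal bar chart sketch using the city and isale lists from the practice questions at the end of this section:

```python
import matplotlib.pyplot as plt

# Lists taken from the practice questions at the end of this section
city = ["delhi", "pune", "agra", "bangalore"]
isale = [100, 200, 50, 400]

fig, ax = plt.subplots()
bars = ax.bar(city, isale)  # one bar per category
ax.set_xlabel("City")
ax.set_ylabel("Sales")
ax.set_title("Sales by City")
plt.show()
```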


Let's Use this on Data

Find state-wise sales in the Adidas data

|  | Retailer | Invoice Date | Region | State | City | Product | Price per Unit | Units Sold | Sales Method | total |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Foot Locker | 2020-01-01 | Northeast | New York | New York | Men's Street Footwear | 50.0 | 1200 | In-store | 60000.0 |
| 1 | Foot Locker | 2020-01-02 | Northeast | New York | New York | Men's Athletic Footwear | 50.0 | 1000 | In-store | 50000.0 |
| 2 | Foot Locker | 2020-01-03 | Northeast | New York | New York | Women's Street Footwear | 40.0 | 1000 | In-store | 40000.0 |
| 3 | Foot Locker | 2020-01-04 | Northeast | New York | New York | Women's Athletic Footwear | 45.0 | 850 | In-store | 38250.0 |
| 4 | Foot Locker | 2020-01-05 | Northeast | New York | New York | Men's Apparel | 60.0 | 900 | In-store | 54000.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9643 | Foot Locker | 2021-01-24 | Northeast | New Hampshire | Manchester | Men's Apparel | 50.0 | 64 | Outlet | 3200.0 |
| 9644 | Foot Locker | 2021-01-24 | Northeast | New Hampshire | Manchester | Women's Apparel | 41.0 | 105 | Outlet | 4305.0 |
| 9645 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Men's Street Footwear | 41.0 | 184 | Outlet | 7544.0 |
| 9646 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Men's Athletic Footwear | 42.0 | 70 | Outlet | 2940.0 |
| 9647 | Foot Locker | 2021-02-22 | Northeast | New Hampshire | Manchester | Women's Street Footwear | 29.0 | 83 | Outlet | 2407.0 |

9648 rows × 10 columns
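State-wise revenue can be computed with a groupby and then sorted in descending order. Here is a sketch on the preview rows only, where just New York and New Hampshire appear. groupby sorts the states alphabetically before reset_index numbers them, and sort_values keeps those numbers. That would explain why the index in the state table that follows looks shuffled (Alabama is 0, Alaska is 1, and so on). The source's code is not shown, so treat this as an assumption:

```python
import pandas as pd

# State and total from the 10 preview rows (hypothetical stand-in for the full data)
df = pd.DataFrame({
    "State": ["New York"] * 5 + ["New Hampshire"] * 5,
    "total": [60000.0, 50000.0, 40000.0, 38250.0, 54000.0,
              3200.0, 4305.0, 7544.0, 2940.0, 2407.0],
})

# Sum revenue per state, then sort from highest to lowest
state_rev = (
    df.groupby("State")["total"].sum()
    .reset_index(name="revenue")
    .sort_values("revenue", ascending=False)
)
print(state_rev)
```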

|  | State | revenue |
|---|---|---|
| 31 | New York | 8670464.0 |
| 4 | California | 8580508.0 |
| 8 | Florida | 7820589.0 |
| 42 | Texas | 6612371.0 |
| 39 | South Carolina | 3593112.0 |
| 17 | Louisiana | 3377031.0 |
| 46 | Washington | 3222093.0 |
| 45 | Virginia | 3074415.0 |
| 36 | Oregon | 3047049.0 |
| 27 | Nevada | 2981134.0 |
| 32 | North Carolina | 2936581.0 |
| 30 | New Mexico | 2824641.0 |
| 11 | Idaho | 2742753.0 |
| 10 | Hawaii | 2734457.0 |
| 9 | Georgia | 2708591.0 |
| 5 | Colorado | 2569036.0 |
| 41 | Tennessee | 2567190.0 |
| 0 | Alabama | 2513424.0 |
| 28 | New Hampshire | 2339267.0 |
| 21 | Michigan | 2287283.0 |
| 49 | Wyoming | 2282342.0 |
| 34 | Ohio | 2269283.0 |
| 2 | Arizona | 2254096.0 |
| 23 | Mississippi | 2218609.0 |
| 44 | Vermont | 2041598.0 |
| 25 | Montana | 1930761.0 |
| 1 | Alaska | 1810428.0 |
| 3 | Arkansas | 1802672.0 |
| 6 | Connecticut | 1646448.0 |
| 20 | Massachusetts | 1578435.0 |
| 35 | Oklahoma | 1512059.0 |
| 7 | Delaware | 1508537.0 |
| 37 | Pennsylvania | 1478794.0 |
| 43 | Utah | 1387620.0 |
| 47 | West Virginia | 1311160.0 |
| 16 | Kentucky | 1241148.0 |
| 15 | Kansas | 1225314.0 |
| 29 | New Jersey | 1220446.0 |
| 12 | Illinois | 1204063.0 |
| 38 | Rhode Island | 1202256.0 |
| 24 | Missouri | 1189515.0 |
| 18 | Maine | 1129728.0 |
| 13 | Indiana | 1084723.0 |
| 40 | South Dakota | 1041101.0 |
| 19 | Maryland | 951134.0 |
| 33 | North Dakota | 950930.0 |
| 48 | Wisconsin | 948894.0 |
| 14 | Iowa | 909811.0 |
| 22 | Minnesota | 903918.0 |
| 26 | Nebraska | 728838.0 |
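Here is a horizontal bar chart of the five highest-revenue states, with values copied from the table above. Plotting all 50 states works the same way. invert_yaxis only puts the largest bar at the top:

```python
import matplotlib.pyplot as plt

# Top 5 states by revenue, copied from the table above
states = ["New York", "California", "Florida", "Texas", "South Carolina"]
revenue = [8670464.0, 8580508.0, 7820589.0, 6612371.0, 3593112.0]

fig, ax = plt.subplots()
bars = ax.barh(states, revenue)  # horizontal bars are easier to read for long names
ax.invert_yaxis()                # largest bar at the top
ax.set_xlabel("Revenue")
ax.set_title("Top 5 States by Revenue")
plt.show()
```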


Conclusion

New York, California, and Florida generate the highest revenue (about 7.8 to 8.7 million each), while Nebraska and Minnesota show the lowest. This highlights which states are underperforming and which are the strongest markets.

4. Pie Chart

A pie chart shows the percentage or proportion that each part contributes to the whole. Pie charts are best when you want to display:

  1. Market share

  2. Budget distribution

  3. Population split

  4. Sales contribution of each product

It shows what portion each category contributes to the total.
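Here is a sketch of a styled pie chart using the quantity list from the practice questions. The slice labels A to D are hypothetical, because the source does not name the categories. explode pulls a slice outward, shadow adds a drop shadow, and autopct prints each slice's percentage:

```python
import matplotlib.pyplot as plt

quantity = [13.3, 2.2, 8.7, 5.6]   # from the practice questions
labels = ["A", "B", "C", "D"]      # hypothetical category names
explode = [0.1, 0, 0, 0]           # pull the largest slice out slightly

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    quantity, labels=labels, explode=explode,
    autopct="%1.1f%%", shadow=True, startangle=90,
)
ax.set_title("Share of Total Quantity")
plt.show()
```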


Let's understand this on data

Percentage contribution of each Sales Method in the Adidas data (by number of transactions)

|  | Retailer | Invoice Date | Region | State | City | Product | Price per Unit | Units Sold | Sales Method | total |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Foot Locker | 2020-01-01 | Northeast | New York | New York | Men's Street Footwear | 50.0 | 1200 | In-store | 60000.0 |
| 1 | Foot Locker | 2020-01-02 | Northeast | New York | New York | Men's Athletic Footwear | 50.0 | 1000 | In-store | 50000.0 |
| 2 | Foot Locker | 2020-01-03 | Northeast | New York | New York | Women's Street Footwear | 40.0 | 1000 | In-store | 40000.0 |
| 3 | Foot Locker | 2020-01-04 | Northeast | New York | New York | Women's Athletic Footwear | 45.0 | 850 | In-store | 38250.0 |
| 4 | Foot Locker | 2020-01-05 | Northeast | New York | New York | Men's Apparel | 60.0 | 900 | In-store | 54000.0 |

|  | Sales Method | count |
|---|---|---|
| 0 | Online | 4889 |
| 1 | Outlet | 3019 |
| 2 | In-store | 1740 |
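The counts above look like the output of df['Sales Method'].value_counts(), which counts transactions (rows), not revenue. The source's code is not shown, so that is an assumption. Here is a pie chart of those counts:

```python
import matplotlib.pyplot as plt

# Transaction counts per Sales Method, copied from the table above
methods = ["Online", "Outlet", "In-store"]
counts = [4889, 3019, 1740]

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(counts, labels=methods,
                                  autopct="%1.1f%%", startangle=90)
ax.set_title("Share of Transactions by Sales Method")
plt.show()
```

Online accounts for about half of all transactions (4,889 of 9,648).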


5. Histogram

A histogram is a graph that shows the distribution of numerical data. It splits the data into ranges (called bins) and counts how many values fall into each bin.

  1. Normal Distribution: mean = median = mode

  2. Right-Skewed Distribution: typically mean > median > mode

  3. Left-Skewed Distribution: typically mean < median < mode
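The three shapes listed above can be illustrated with synthetic data, generated here rather than taken from the source. The sketch draws one normal sample and two skewed samples built from an exponential distribution. It then marks the mean and median on each histogram, so you can see the mean pulled toward the long tail:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)  # fixed seed so the result is reproducible

# Synthetic data (not from the source)
normal = rng.normal(loc=50, scale=10, size=1000)
right_skewed = 20 + rng.exponential(scale=10, size=1000)  # long tail to the right
left_skewed = 80 - rng.exponential(scale=10, size=1000)   # long tail to the left

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, data, title in zip(axes, [normal, right_skewed, left_skewed],
                           ["Normal", "Right-skewed", "Left-skewed"]):
    ax.hist(data, bins=20)
    ax.axvline(np.mean(data), color="red", label="mean")
    ax.axvline(np.median(data), color="black", linestyle="--", label="median")
    ax.set_title(title)
    ax.legend()
plt.tight_layout()
plt.show()
```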


Conclusion

  1. The histogram of office_employee_ages is symmetrical and bell-shaped, indicating a normal distribution.

  2. Most employees are middle-aged (around 45–50 years).

  3. The mean and median are very close, which is consistent with the symmetry of the data.

  4. The few employees at the extremes (30–35 and 65–70) represent the tails of the distribution.


Conclusion

  1. The age distribution of Instagram users is right-skewed (positively skewed).

  2. Most users are young (30–35 years).

  3. The long tail on the right shows a few older users.

  4. The mean is higher than the median, which is typical for right-skewed data.

  5. For skewed distributions like this, the median better represents the “typical” user age than the mean.


Conclusion

  1. The age distribution of Facebook users is left-skewed (negatively skewed).

  2. Most users are older (60–70 years).

  3. The long tail on the left indicates a smaller number of younger users.

  4. The mean is lower than the median, which is typical for left-skewed data.

  5. For skewed distributions like this, the median better represents the “typical” user age than the mean.

Let's use this on Data

|  | Person ID | Gender | Age | Occupation | Sleep Duration | Quality of Sleep | Physical Activity Level | Stress Level | BMI Category | Blood Pressure | Heart Rate | Daily Steps | Sleep Disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 27 | Software Engineer | 6.1 | 6 | 42 | 6 | Overweight | 126/83 | 77 | 4200 | NaN |
| 1 | 2 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | NaN |
| 2 | 3 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | NaN |
| 3 | 4 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |
| 4 | 5 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 369 | 370 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 370 | 371 | Female | 59 | Nurse | 8.0 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 371 | 372 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 372 | 373 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 373 | 374 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |

374 rows × 13 columns


The mean, median, and mode of Age are close to each other, so the mean is also a reasonable answer if you are asked for the average age.
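Here is a sketch of how to plot Age and check its mean, median, and mode with pandas. It uses only the Age values from the 10 preview rows, so its numbers differ from the full 374-row result described above. To reproduce that conclusion, run the same calls on the full Age column. The Series name age is an illustrative choice:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Age values from the 10 preview rows only (NOT the full dataset)
age = pd.Series([27, 28, 28, 28, 28, 59, 59, 59, 59, 59], name="Age")

print("mean:", age.mean())
print("median:", age.median())
print("mode:", age.mode().tolist())  # mode() can return more than one value

fig, ax = plt.subplots()
ax.hist(age, bins=10, edgecolor="black")
ax.set_xlabel("Age")
ax.set_ylabel("Number of people")
plt.show()
```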

6. Stackplot

A stack plot (Matplotlib's stackplot function) shows how multiple datasets contribute to a total over a period or sequence.

  1. Each dataset is stacked on top of the previous one.

  2. Useful for showing cumulative contributions.

  3. X-axis usually represents time or categories, Y-axis is the value.
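Here is a minimal stackplot sketch with hypothetical iPhone and Samsung monthly sales, since the source does not list the values. The x-axis uses numeric positions with the month names set as tick labels. The top edge of the stack is the combined total for each month:

```python
import matplotlib.pyplot as plt

month = ["jan", "feb", "mar", "apr"]
iphone = [100, 150, 90, 250]    # hypothetical values
samsung = [150, 220, 130, 350]  # hypothetical values

x = list(range(len(month)))  # numeric positions for each month
fig, ax = plt.subplots()
polys = ax.stackplot(x, iphone, samsung, labels=["iPhone", "Samsung"],
                     colors=["tab:blue", "tab:orange"])
ax.set_xticks(x)
ax.set_xticklabels(month)
ax.set_ylabel("Sales")
ax.legend(loc="upper left")
plt.show()

# Height of the top layer = combined sales per month
combined = [i + s for i, s in zip(iphone, samsung)]
print(combined)
```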

|  | Month | Men's Apparel | Men's Athletic Footwear | Men's Street Footwear | Women's Apparel | Women's Athletic Footwear | Women's Street Footwear |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 195.753521 | 280.944828 | 365.337931 | 271.117241 | 196.929577 | 273.577465 |
| 1 | 2 | 193.833333 | 249.443609 | 372.523077 | 251.832000 | 184.348837 | 245.654135 |
| 2 | 3 | 157.179104 | 247.606061 | 366.469697 | 239.742424 | 179.598540 | 241.400000 |
| 3 | 4 | 184.957447 | 253.514286 | 379.085714 | 280.305556 | 194.521739 | 250.543478 |
| 4 | 5 | 204.264286 | 264.740741 | 370.797101 | 283.751825 | 203.459259 | 242.644444 |
| 5 | 6 | 173.214876 | 244.524590 | 375.826446 | 282.462810 | 188.770492 | 225.737705 |
| 6 | 7 | 222.629921 | 293.589147 | 409.037879 | 300.060606 | 213.527559 | 250.661417 |
| 7 | 8 | 216.288732 | 348.652778 | 419.619718 | 296.514085 | 238.707143 | 291.552448 |
| 8 | 9 | 210.920290 | 306.503704 | 374.059259 | 286.521429 | 216.057971 | 272.739130 |
| 9 | 10 | 172.544776 | 247.956204 | 314.925373 | 245.179104 | 170.101449 | 199.868613 |
| 10 | 11 | 167.053030 | 230.879699 | 308.458647 | 235.369231 | 177.772727 | 197.444444 |
| 11 | 12 | 189.922481 | 269.848000 | 363.476562 | 259.492063 | 205.093750 | 228.349593 |
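The table above looks like a pivot table of average units sold per month and product. The source does not show the code, so both the aggregation (mean of Units Sold) and the variable names below are assumptions. Here is a sketch using the preview rows. Missing month/product combinations are filled with 0 so the stack has no gaps:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Month, Product and Units Sold from the 10 preview rows (hypothetical stand-in)
df = pd.DataFrame({
    "Month": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
    "Product": ["Men's Street Footwear", "Men's Athletic Footwear",
                "Women's Street Footwear", "Women's Athletic Footwear",
                "Men's Apparel", "Men's Apparel", "Women's Apparel",
                "Men's Street Footwear", "Men's Athletic Footwear",
                "Women's Street Footwear"],
    "Units Sold": [1200, 1000, 1000, 850, 900, 64, 105, 184, 70, 83],
})

# Assumed aggregation: mean Units Sold for each Month x Product
pivot = df.pivot_table(index="Month", columns="Product",
                       values="Units Sold", aggfunc="mean").fillna(0)

fig, ax = plt.subplots(figsize=(10, 5))
# One layer per product: pivot.T has one row per product, one column per month
ax.stackplot(pivot.index, pivot.T.values, labels=pivot.columns)
ax.set_xlabel("Month")
ax.set_ylabel("Average Units Sold")
ax.legend(loc="upper left", fontsize="small")
plt.show()
```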


7. Box Plot
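The source has no code or description for the box plot. A box plot summarizes a numeric column with its median, its quartiles (the box), and its whiskers. In Matplotlib, points beyond the whiskers (by default, more than 1.5 × IQR past the quartiles) are drawn as possible outliers. Here is a sketch using the Units Sold values from the Adidas preview rows:

```python
import numpy as np
import matplotlib.pyplot as plt

# Units Sold from the 10 Adidas preview rows (hypothetical stand-in for the full data)
units_sold = [1200, 1000, 1000, 850, 900, 64, 105, 184, 70, 83]

fig, ax = plt.subplots()
bp = ax.boxplot(units_sold)  # whiskers use 1.5 * IQR by default
ax.set_xticks([1])
ax.set_xticklabels(["Units Sold"])
ax.set_ylabel("Units")
plt.show()

# The same quartiles the box is drawn from
q1, median, q3 = np.percentile(units_sold, [25, 50, 75])
print(q1, median, q3)
```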


A. Scatter Plot – 10 Questions

  1. Plot a scatter plot showing the relationship between Temperature and Tea Sales.

  2. Using the lists temp = [10,20,30,40,50,60] and ice = [20,50,80,90,100,300], plot both in the same scatter plot with a legend.

  3. Create a scatter plot between Units Sold vs Total Sales from the Adidas dataset.

  4. In the Adidas scatter plot, highlight the range where most points appear (use annotations).

  5. From the Sleep dataset, plot Sleep Duration vs Quality of Sleep.

  6. Plot Sleep Duration vs Stress Level using a scatter plot.

  7. Plot month = ['jan','feb','mar','apr'] and sale = [100,200,50,400] as a scatter plot, then add a short written conclusion.

  8. Plot a scatter plot showing iPhone vs Samsung monthly sales (two series).

  9. Turn on grid lines in a scatter plot (for example with plt.grid(True)).

  10. Create a scatter plot using any two numerical columns from the Adidas dataset and write 1 line about the pattern.


B. Line Chart – 10 Questions

  1. Plot a line chart for month = ['jan','feb','mar','apr'] and isale = [100,200,50,400].

  2. Plot line charts for iPhone vs Samsung monthly sales with markers.

  3. From Adidas data, create a line chart for Month vs Revenue.

  4. Plot the Adidas Month 1 → Month 12 revenue trend using a line graph.

  5. Add custom x-ticks (jan–dec) to your line chart.

  6. Plot two lines on the same graph: Revenue 2020 vs Revenue 2021.

  7. Use markers in a line chart and observe the improvement in readability.

  8. Create a line chart with grid=False and compare visually to one with grid.

  9. Compare revenue between two states across all 12 months.

  10. Create a line chart showing the rolling average (moving trend) of Adidas revenue.


C. Bar Chart – 8 Questions

  1. Create a bar chart of state-wise revenue in Adidas, sorted descending.

  2. Make a horizontal bar chart for the same data.

  3. Plot city = ['delhi','pune','agra','bangalore'] and isale = [100,200,50,400] as a bar chart.

  4. Create a bar chart showing Male vs Female count in Sleep dataset.

  5. Create a bar chart of Product-wise total units sold in Adidas.

  6. Create a bar chart of Occupation vs Average Sleep Duration.

  7. Make a grouped bar chart comparing iPhone vs Samsung sales (4 months).

  8. Use a bar chart to compare categories (any categorical variable from Adidas).


D. Pie Chart – 5 Questions

  1. Pie chart of revenue contribution of top 5 states (Adidas).

  2. Pie chart of gender distribution in Sleep dataset.

  3. Create a stylish pie chart of quantity = [13.3, 2.2, 8.7, 5.6] using explode and shadow.

  4. Pie chart showing product category distribution.

  5. Create any pie chart that represents parts of a whole using your own dataset values.


E. Histogram – 7 Questions

  1. Histogram for Units Sold in Adidas.

  2. Histogram for Sleep Duration.

  3. Generate 100 random values (mean=50, std=10) → plot histogram.

  4. Histogram for Daily Steps from Sleep dataset.

  5. Create two histograms of the same data using bins=5 and bins=20.

  6. Create 3 lists that produce:

    • normal distribution

    • right skew

    • left skew

    Then plot all 3 histograms.

  7. Plot a histogram using 100 random integers (1–500).


F. Boxplot – 5 Questions

  1. Boxplot of Units Sold to detect outliers.

  2. Plot three boxplots together: Units Sold, Price per Unit, Total Sales.

  3. Boxplot for Sleep Duration vs Quality of Sleep.

  4. Boxplot showing state-wise revenue spread.

  5. Boxplot comparing two products from Adidas.


G. Stackplot – 3 Questions

  1. Create a stackplot using iPhone and Samsung month-wise sales.

  2. Add labels + custom colors to a stackplot.

  3. Create a stackplot using any two numeric lists from your datasets.


H. Python Lists – 2 Questions

  1. Generate a list of 100 random integers (1–500) and plot a histogram.

  2. Using:

    • temp = [...]

    • tea = [...]

    • ice = [...]

    Plot three scatter plots in one figure (subplot method).
