3. DataFrame and Series

Two DataTypes in Pandas - DataFrame and Series

Almost everything you do in pandas involves one of two data types:

DataFrame
Series

What is DataFrame

A DataFrame is like a table (think of an Excel sheet): it has rows and columns. You can also create a DataFrame yourself from a dictionary, where each key becomes a column name and each list becomes that column's values.

import pandas as pd

data = {
  'product': ['iphone', 'samsung', 'vivo', 'blackberry'],
  'price'  : [200, 300, 100, 50]
}
df = pd.DataFrame(data)
df

Out:

      product  price
0      iphone    200
1     samsung    300
2        vivo    100
3  blackberry     50
Whenever you load an Excel or CSV file with pandas, the result is a DataFrame. For example:

df = pd.read_csv('retail_sales_dataset.csv')
df

Out (example):

     Customer ID  Gender  Age  Product Category  Quantity  Price per Unit
0        CUST001    Male   34            Beauty         3              50
1        CUST002  Female   26          Clothing         2             500
2        CUST003    Male   50       Electronics         1              30
3        CUST004    Male               Clothing                       500
4        CUST005    Male                 Beauty
..           ...     ...  ...               ...       ...             ...
995      CUST996    Male               Clothing
996      CUST997    Male                 Beauty
997      CUST998  Female   23            Beauty         4              25
998      CUST999  Female   36       Electronics         3              50
999     CUST1000    Male   47       Electronics         4              30

1000 rows × 6 columns
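The loading step can be tried without the dataset file. The sketch below assumes only pandas and the standard library: it passes read_csv an in-memory text buffer (io.StringIO) in place of a file path, using three rows copied from the df.values output later in this section. The names csv_text and sample are illustrative.

```python
import io

import pandas as pd

# Three rows taken from the example output; the buffer stands in for a CSV file on disk
csv_text = """Customer ID,Gender,Age,Product Category,Quantity,Price per Unit
CUST001,Male,34,Beauty,3,50
CUST002,Female,26,Clothing,2,500
CUST003,Male,50,Electronics,1,30
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(isinstance(sample, pd.DataFrame))  # True
print(sample.shape)                      # (3, 6)
```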

4 Basic Properties of DataFrame

A DataFrame has many attributes; the four basic ones are shape, index, columns, and values. Each is shown below with an example.

shape

Gives the number of rows and columns as a tuple: (rows, columns).

df.shape

Out:

(1000, 6)
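Because shape is an ordinary tuple, the row count and column count can be read off separately. A small sketch on the four-row product table from earlier in this section (the variable name sample is illustrative):

```python
import pandas as pd

sample = pd.DataFrame({
    'product': ['iphone', 'samsung', 'vivo', 'blackberry'],
    'price': [200, 300, 100, 50],
})

# Unpack the tuple into two variables
n_rows, n_cols = sample.shape
print(n_rows, n_cols)   # 4 2

# Two equivalent ways to get only the number of rows
print(sample.shape[0])  # 4
print(len(sample))      # 4
```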

index

Gives the row labels (the index). A freshly loaded file gets a default RangeIndex that starts at 0.

df.index

Out:

RangeIndex(start=0, stop=1000, step=1)

columns

Gives all the column names.

df.columns

Out:

Index(['Customer ID', 'Gender', 'Age', 'Product Category', 'Quantity', 'Price per Unit'], dtype='object')
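Both index and columns behave like sequences of labels, so they can be converted to plain lists or checked for membership. A brief sketch, again on the small product table rather than the retail dataset (sample is an illustrative name):

```python
import pandas as pd

sample = pd.DataFrame({
    'product': ['iphone', 'samsung', 'vivo', 'blackberry'],
    'price': [200, 300, 100, 50],
})

# Convert the labels to ordinary Python lists
print(list(sample.index))    # [0, 1, 2, 3]
print(list(sample.columns))  # ['product', 'price']

# Check whether a column exists before using it
print('price' in sample.columns)     # True
print('discount' in sample.columns)  # False
```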

values

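Gives all the values as a NumPy array. The pandas documentation recommends df.to_numpy() for new code; it returns the same kind of array.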

df.values

Out (example):

array([['CUST001', 'Male', 34, 'Beauty', 3, 50],
       ['CUST002', 'Female', 26, 'Clothing', 2, 500],
       ['CUST003', 'Male', 50, 'Electronics', 1, 30],
       ...,
       ['CUST998', 'Female', 23, 'Beauty', 4, 25],
       ['CUST999', 'Female', 36, 'Electronics', 3, 50],
       ['CUST1000', 'Male', 47, 'Electronics', 4, 30]], shape=(1000, 6), dtype=object)
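The dtype=object at the end of the output appears because one array has to hold both text and numbers. The sketch below illustrates this on the small product table; selecting only the numeric column gives an integer array instead. The names mixed and numeric are illustrative.

```python
import pandas as pd

sample = pd.DataFrame({
    'product': ['iphone', 'samsung', 'vivo', 'blackberry'],
    'price': [200, 300, 100, 50],
})

# Text and integers together force the generic object dtype
mixed = sample.values
print(mixed.dtype)    # object
print(mixed.shape)    # (4, 2)

# A purely numeric selection keeps an integer dtype
numeric = sample[['price']].to_numpy()
print(numeric.dtype.kind)  # i  (i means signed integer)
print(numeric.shape)       # (4, 1)
```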

4 Basic Functions of DataFrame

The methods you will use most often with a DataFrame are shown below, each with an example:

head()

Returns the top n rows in the DataFrame. By default it returns 5 rows.

df.head()       # default 5 rows
df.head(7)      # first 7 rows

Out (example for df.head()):

   Customer ID  Gender  Age  Product Category  Quantity  Price per Unit
0      CUST001    Male   34            Beauty         3              50
1      CUST002  Female   26          Clothing         2             500
2      CUST003    Male   50       Electronics         1              30
3      CUST004    Male               Clothing                       500
4      CUST005    Male                 Beauty

tail()

Returns the bottom n rows in the DataFrame. By default it returns 5 rows.

df.tail()       # default 5 rows
df.tail(3)      # last 3 rows

Out (example for df.tail()):

     Customer ID  Gender  Age  Product Category  Quantity  Price per Unit
995      CUST996    Male               Clothing
996      CUST997    Male                 Beauty
997      CUST998  Female   23            Beauty         4              25
998      CUST999  Female   36       Electronics         3              50
999     CUST1000    Male   47       Electronics         4              30
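The behaviour of head() and tail() is easy to check on a frame whose contents are known in advance. The sketch below uses a hypothetical ten-row frame with a single column n holding 0 to 9; it is not part of the retail dataset.

```python
import pandas as pd

# Ten rows, values 0..9, so the selected rows are easy to verify
sample = pd.DataFrame({'n': range(10)})

print(len(sample.head()))            # 5
print(sample.head(3)['n'].tolist())  # [0, 1, 2]
print(sample.tail(2)['n'].tolist())  # [8, 9]

# Asking for more rows than exist simply returns the whole frame
print(len(sample.head(50)))          # 10
```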

info()

Gives an overview of the DataFrame (dtypes, non-null counts, memory usage).

df.info()

Out (example):

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Customer ID       1000 non-null   object
 1   Gender            1000 non-null   object
 2   Age               1000 non-null   int64
 3   Product Category  1000 non-null   object
 4   Quantity          1000 non-null   int64
 5   Price per Unit    1000 non-null   int64
dtypes: int64(3), object(3)
memory usage: 47.0+ KB
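Unlike head() or describe(), info() prints its report and returns None, so assigning its result to a variable does not capture the text. If the text is needed, the buf parameter accepts a writable buffer. A minimal sketch on the small product table (the names result, buffer, and report are illustrative):

```python
import io

import pandas as pd

sample = pd.DataFrame({
    'product': ['iphone', 'samsung', 'vivo', 'blackberry'],
    'price': [200, 300, 100, 50],
})

# info() writes to the screen and returns nothing
result = sample.info()
print(result)  # None

# Redirect the report into a text buffer to keep it as a string
buffer = io.StringIO()
sample.info(buf=buffer)
report = buffer.getvalue()
print('RangeIndex: 4 entries, 0 to 3' in report)  # True
```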

describe()

Gives descriptive statistics (including the 5-number summary) for the numerical columns.

df.describe()

Out (example):

              Age    Quantity  Price per Unit
count  1000.00000  1000.00000     1000.000000
mean     41.39200     2.51400      179.890000
std      13.68143     1.13273      189.681356
min      18.00000     1.00000       25.000000
25%      29.00000     1.00000       30.000000
50%      42.00000     3.00000       50.000000
75%      53.00000     4.00000      300.000000
max      64.00000     4.00000      500.000000

Explanation for the Age column:

count : number of non-null age values = 1000
mean : average age ≈ 41.39
min : youngest age = 18
max : oldest age = 64
25% / 50% / 75% : the 25th, 50th (median), and 75th percentiles; together with min and max they form the 5-number summary

Note: std is the standard deviation (covered later).
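By default describe() skips text columns. Passing include='all' adds them, reporting count, unique, top, and freq for text alongside the numeric statistics. The sketch below uses the Gender and Age values of the first three customers from the df.values output; the names summary and full are illustrative.

```python
import pandas as pd

sample = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male'],
    'Age': [34, 26, 50],
})

# Default: numeric columns only
summary = sample.describe()
print(list(summary.columns))                                # ['Age']
print(summary.loc['50%', 'Age'] == sample['Age'].median())  # True

# include='all' also summarises the text column
full = sample.describe(include='all')
print(list(full.columns))          # ['Gender', 'Age']
print(full.loc['top', 'Gender'])   # Male
print(full.loc['freq', 'Gender'])  # 2
```

Statistics that do not apply to a column (for example mean for Gender) show up as NaN in the combined table.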

to_excel() and to_csv()

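Save a DataFrame to an Excel or CSV file. Use index=False if you don't want the index written as an extra column in the output file. Writing .xlsx files requires an Excel engine such as openpyxl to be installed.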

df.to_excel('Adidas.xlsx', index=False)
df.to_csv('Adidas.csv', index=False)
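The effect of index=False is easiest to see by comparing the text that to_csv() produces. When no file path is given, to_csv() returns the CSV content as a string, which the sketch below uses to avoid writing files. The names with_index and without_index are illustrative.

```python
import pandas as pd

sample = pd.DataFrame({
    'product': ['iphone', 'samsung', 'vivo', 'blackberry'],
    'price': [200, 300, 100, 50],
})

# With no path, to_csv() returns the CSV text instead of writing a file
with_index = sample.to_csv()
without_index = sample.to_csv(index=False)

# Default: the index becomes an unnamed first column
print(with_index.splitlines()[0])     # ,product,price
print(with_index.splitlines()[1])     # 0,iphone,200

# index=False: only the real columns are written
print(without_index.splitlines()[0])  # product,price
print(without_index.splitlines()[1])  # iphone,200
```

The same index=False argument applies when a file name is passed, as in the two calls above.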

Assignments:

Load the Adidas dataset from the provided files.
Get the basic properties (shape, index, values, columns).
Get only the number of rows in the dataset.
Get the top 20 rows.
Get the last 5 rows.
Get the basic information of the DataFrame.
Find the basic statistical summary of the DataFrame.
Find the basic statistical summary of the DataFrame with all columns.
Save the statistical summary to an Excel file and a CSV file on your desktop.
