2.Loading Excel and CSV Data
Download this dataset :
First thing to understand before you start working in pandas is how to load your data. In this guide we show how to load Excel and CSV files into pandas.
Install dependencies
If you want to work with Excel files in pandas, you need an Excel reader. pandas commonly uses openpyxl for .xlsx files. Install pandas (and openpyxl if needed) from your terminal:
pip install pandas openpyxlLoad Excel file (using a full path)
Example — read an Excel file using an absolute path:
import pandas as pd
data = pd.read_excel(r"C:\Users\abhis\Desktop\Adidas US Sales Datasets.xlsx")
dataExample output (truncated):
0
Foot Locker
2020-01-01
Northeast
New York
New York
Men's Street Footwear
50.0
1200
In-store
1
Foot Locker
2020-01-02
Northeast
New York
New York
Men's Athletic Footwear
50.0
1000
In-store
...
...
...
...
...
...
...
...
...
...
9647
Foot Locker
2021-02-22
Northeast
New Hampshire
Manchester
Women's Street Footwear
29.0
83
Outlet
9648 rows × 9 columns
Load Excel file (without giving a path)
If you save the Excel file in your project directory (the current working directory), you can read it by filename only:
data = pd.read_excel('Adidas US Sales Datasets.xlsx')
data(Example output is the same as above — 9648 rows × 9 columns.)
skiprows
If the file has a few leading rows that are empty or contain metadata so pandas treats the wrong row as the header, use the skiprows parameter to skip n rows:
Example of problematic reading (first row treated as header):
data = pd.read_excel('Adidas US Sales Datasets.xlsx')
dataSolution — skip the first row so pandas reads the correct header:
data = pd.read_excel('Adidas US Sales Datasets.xlsx', skiprows=1)
dataIf there are n rows to skip, use skiprows=n.
No headers in Excel
If the Excel file has no header row, pandas will still take the first row as the header by default. Use header=None to tell pandas there are no headers and let it assign integer column names:
data = pd.read_excel('Adidas US Sales Datasets.xlsx', header=None)
dataThis produces numeric column names (0, 1, 2, ...). To provide meaningful column names, use the names parameter:
data = pd.read_excel(
'Adidas US Sales Datasets.xlsx',
header=None,
names=['Retailer','Date','Region','State','City','Product','Price','Units Sold','Sales Method']
)
dataMultiple sheets
To read a specific sheet use sheet_name:
retail_data = pd.read_excel('Adidas US Sales Datasets.xlsx', sheet_name='Retailer Details')
retail_dataExample output (truncated):
0
Foot Locker
1185732
1
Walmart
1185733
2
Sports Direct
1197831
3
West Gear
1197832
4
Kohl's
120771
How to read CSV files
What is a CSV file?
CSV stands for Comma-Separated Values.
In a CSV file, every field is separated by a delimiter (usually a comma).
CSV is a universal file format and can be opened in many tools.
Read a CSV with pandas:
data = pd.read_csv('retail_sales_dataset.csv')
dataExample output (truncated):
0
CUST001
Male
34
Beauty
3
50
1
CUST002
Female
26
Clothing
2
500
...
...
...
...
...
...
...
999
CUST1000
Male
47
Electronics
4
30
1000 rows × 6 columns
If your CSV uses a different delimiter (e.g., tab, semicolon), use the delimiter (or sep) parameter:
data = pd.read_csv('some_file.csv', delimiter=';') # or sep='\t' for tabs(End of guide.)
Assignments:
Install pandas
Load Adidas Dataset from desktop.
Load Adidas Dataset from your project.
Insert 2 empty rows in your excel file and load the data with real columns. (Use Skiprows)
Delete Header and load the data with headers.
Load Second Sheet from Adidas Dataset.
Load Retail Sales CSV Files
Last updated