12. String Function In Pandas
When you work with a dataset, you will often need to clean or manipulate string values.
To demonstrate, consider this sample dataframe:
Output (DataFrame):

|   |     | FirstName | LastName | Address | Feedback | PhoneNumber |
|---|-----|-----------|----------|---------|----------|-------------|
| 0 | 101 | RaHul | Sharma | 221B baker street, delhi | Product was good but DELIVERY was late | 98765432 |
| 1 | 102 | Priya | Singh | mg road, bengaluru | Packing was excellent!! | 9988776655 |
| 2 | 103 | Anil | Kapoor | sector 22, noida | Quality not same as shown | 123456789012 |
| 3 | 104 | sneha | Agarwal | park street, kolkata | Delay of 5 days. not satisfied | 90807060 |
| 4 | 105 | Rohan | Mehta | andheri east, mumbai | Product damaged. refund in 7 days? | 9090909090 |
| 5 | 106 | Meera | Verma | baner road, pune | Delivered early!! very happy | 8008008008 |
| 6 | 107 | amit | Kumar | gomti nagar, lucknow | Color was different from the website | 9876543210987 |
| 7 | 108 | SwaTi | Joshi | anna salai, chennai | Double payment deducted | 88112 |
| 8 | 109 | Ravi | Shukla | ring road, ahmedabad | Wrong item delivered | 9999999999 |
| 9 | 110 | Nidhi | Gupta | civil lines, jaipur | Refund amount incorrect by 150 rupees | 989898989 |
To use string functions on columns, use the str accessor. Just like dt brings datetime functions, str brings string functions.
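As a rough sketch of what the accessor looks like in practice, the snippet below builds a three-row slice of the sample data and calls one string method through str. The extra spaces around some values are an assumption, added only so that later cleaning steps have something to remove.

```python
import pandas as pd

# A small slice of the sample data; the padding spaces are assumed.
df = pd.DataFrame({
    "FirstName": [" RaHul ", "Priya", "sneha "],
    "LastName": ["Sharma", "Singh", "Agarwal"],
})

# .str exposes vectorized string methods on a Series,
# in the same way that .dt exposes datetime methods.
upper_names = df["FirstName"].str.upper()
print(upper_names.tolist())
```

Every method reached through str works element by element and returns a new Series, so the original column changes only when you assign the result back.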
strip function
strip removes whitespace from both the left and right ends of a string. Spaces inside the string are left untouched.
Example:
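A minimal sketch of the call, assuming the FirstName values carry stray spaces (the padding here is invented for illustration):

```python
import pandas as pd

# Padded values are assumed; the page does not show the raw whitespace.
df = pd.DataFrame({"FirstName": ["  RaHul ", "Priya  ", " Anil"]})

# str.strip() drops leading and trailing whitespace only.
df["FirstName"] = df["FirstName"].str.strip()
print(df["FirstName"].tolist())
```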
Output (DataFrame):

|   |     | FirstName | LastName | Address | Feedback | PhoneNumber |
|---|-----|-----------|----------|---------|----------|-------------|
| 0 | 101 | RaHul | Sharma | 221B baker street, delhi | Product was good but DELIVERY was late | 98765432 |
| 1 | 102 | Priya | Singh | mg road, bengaluru | Packing was excellent!! | 9988776655 |
| 2 | 103 | Anil | Kapoor | sector 22, noida | Quality not same as shown | 123456789012 |
| 3 | 104 | sneha | Agarwal | park street, kolkata | Delay of 5 days. not satisfied | 90807060 |
| 4 | 105 | Rohan | Mehta | andheri east, mumbai | Product damaged. refund in 7 days? | 9090909090 |
| 5 | 106 | Meera | Verma | baner road, pune | Delivered early!! very happy | 8008008008 |
| 6 | 107 | amit | Kumar | gomti nagar, lucknow | Color was different from the website | 9876543210987 |
| 7 | 108 | SwaTi | Joshi | anna salai, chennai | Double payment deducted | 88112 |
| 8 | 109 | Ravi | Shukla | ring road, ahmedabad | Wrong item delivered | 9999999999 |
| 9 | 110 | Nidhi | Gupta | civil lines, jaipur | Refund amount incorrect by 150 rupees | 989898989 |
You can strip multiple columns:
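One way to do this, sketched under the assumption that every listed column holds strings, is to apply strip column by column:

```python
import pandas as pd

# Padded values are assumed for illustration.
df = pd.DataFrame({
    "FirstName": [" RaHul ", "Priya "],
    "LastName": [" Sharma", "Singh  "],
    "Address": [" 221B baker street, delhi ", "mg road, bengaluru "],
})

cols = ["FirstName", "LastName", "Address"]

# DataFrame.apply passes each column to the lambda as a Series,
# so the .str accessor is available inside it.
df[cols] = df[cols].apply(lambda s: s.str.strip())
print(df)
```

A plain loop over cols that reassigns each column would work just as well; apply simply keeps it to one line.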
Output (DataFrame): same structure, with leading and trailing whitespace trimmed in the stripped columns.
lower / upper / swapcase / title / capitalize
These methods change case or capitalization.
Example sequence:
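The following sketch runs each case method on a few values taken from the sample data. It is only an illustration of the calls, not necessarily the exact sequence the output below was produced with.

```python
import pandas as pd

names = pd.Series(["RaHul", "sneha", "SwaTi"])
address = pd.Series(["221B baker street, delhi", "mg road, bengaluru"])

lower_names = names.str.lower()        # every letter lower case
upper_names = names.str.upper()        # every letter upper case
swapped = names.str.swapcase()         # flips the case of each letter
capitalized = names.str.capitalize()   # first letter upper, rest lower
titled = address.str.title()           # first letter of each word upper
lower_address = address.str.lower()

print(swapped.tolist())
print(titled.tolist())
```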
Output (DataFrame): addresses shown in lower case.
len function
Counts the number of characters in a value.
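A short sketch, assuming the result is stored in a column literally named count:

```python
import pandas as pd

df = pd.DataFrame({"FirstName": ["RaHul", "Priya", "Anil", "sneha"]})

# str.len() returns the number of characters in each value.
df["count"] = df["FirstName"].str.len()
print(df)
```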
Output includes new column count with character lengths of FirstName.
For phone numbers, first cast to string, then measure length:
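A possible version of that step, assuming PhoneNumber is stored as integers and the new column is named Phone Length:

```python
import pandas as pd

df = pd.DataFrame({"PhoneNumber": [98765432, 9988776655, 123456789012]})

# The .str accessor only works on strings, so cast the numbers first.
df["Phone Length"] = df["PhoneNumber"].astype(str).str.len()
print(df)
```

Calling .str directly on a numeric column raises an AttributeError, which is why the astype(str) cast comes first.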
Output includes Phone Length column.
replace
Used to replace substrings or characters.
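As a sketch of a literal replacement, the snippet below swaps street for st. in a few lower-cased addresses. Passing regex=False explicitly is a choice made here so the behavior does not depend on the pandas version's default.

```python
import pandas as pd

df = pd.DataFrame({
    "Address": [
        "221b baker street, delhi",
        "park street, kolkata",
        "mg road, bengaluru",
    ]
})

# regex=False treats the pattern as plain text, not a regular expression.
df["Address"] = df["Address"].str.replace("street", "st.", regex=False)
print(df["Address"].tolist())
```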
And removing punctuation from feedback:
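One way this might look, assuming the punctuation of interest is limited to !, ?, . and , (the character set is a guess based on the sample feedback):

```python
import pandas as pd

feedback = pd.Series([
    "Packing was excellent!!",
    "Product damaged. refund in 7 days?",
    "Delay of 5 days. not satisfied",
])

# A regex character class matches any one of the listed characters;
# inside [...] the dot is literal, so no escaping is needed.
cleaned = feedback.str.replace(r"[!?.,]", "", regex=True)
print(cleaned.tolist())
```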
Output: punctuation removed in Feedback, street replaced by st. in Address.
split(expand=True)
Split values by a delimiter.
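A minimal sketch, assuming the address parts are separated by a comma followed by a space:

```python
import pandas as pd

address = pd.Series(["221b baker st., delhi", "mg road, bengaluru"])

# Without expand, each element of the result holds the list of pieces.
parts = address.str.split(", ")
print(parts.iloc[0])
```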
This returns a Series of lists. To expand into separate columns:
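With expand=True the pieces come back as a DataFrame whose columns are numbered from 0. A sketch using the same assumed ", " delimiter:

```python
import pandas as pd

address = pd.Series(["221b baker st., delhi", "mg road, bengaluru"])

# expand=True spreads the pieces across columns 0, 1, ...
expanded = address.str.split(", ", expand=True)
print(expanded)
```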
Output (expanded):

|   | 0 | 1 |
|---|---|---|
| 0 | 221b baker st. | delhi |
| 1 | mg road | bengaluru |
| 2 | sector 22 | noida |
| 3 | park st. | kolkata |
| 4 | andheri east | mumbai |
| 5 | baner road | pune |
| 6 | gomti nagar | lucknow |
| 7 | anna salai | chennai |
| 8 | ring road | ahmedabad |
| 9 | civil lines | jaipur |
Assign to new columns:
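A sketch of the assignment, assuming every address splits into exactly two pieces so the result lines up with the two new column names:

```python
import pandas as pd

df = pd.DataFrame({
    "Address": ["221b baker st., delhi", "mg road, bengaluru", "sector 22, noida"]
})

# The two expanded columns are assigned to Area and City in order.
df[["Area", "City"]] = df["Address"].str.split(", ", expand=True)
print(df)
```

If some addresses contained more than one comma, the expanded result would have more columns than names and the assignment would fail, so passing n=1 to split is worth considering for messier data.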
Output includes Area and City columns.
contains()
Returns True for each value that contains the keyword (the pattern is treated as a regex by default). The resulting boolean Series is useful with loc to filter rows.
Examples:
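A minimal filtering sketch on three sample feedback values; case=False is what makes the match ignore capitalization:

```python
import pandas as pd

df = pd.DataFrame({
    "Feedback": [
        "Product damaged. refund in 7 days?",
        "Refund amount incorrect by 150 rupees",
        "Wrong item delivered",
    ]
})

# Boolean mask: True where the text contains "refund" in any case.
mask = df["Feedback"].str.contains("refund", case=False)
refunds = df.loc[mask]
print(refunds)
```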
Matches rows where Feedback contains "Refund" (case-insensitive).
Combine conditions:
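For instance, two masks can be joined with | to keep rows that match either keyword. The keywords below are picked only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Feedback": [
        "Product was good but DELIVERY was late",
        "Refund amount incorrect by 150 rupees",
        "Wrong item delivered",
    ]
})

fb = df["Feedback"]
# | is the element-wise OR between two boolean Series.
either = df.loc[
    fb.str.contains("refund", case=False) | fb.str.contains("late", case=False)
]
print(either)
```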
Or use a regex alternation:
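A sketch of the same idea in a single call, where | inside the pattern means "either side" (keywords again chosen for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Feedback": [
        "Product damaged. refund in 7 days?",
        "Double payment deducted",
        "Refund amount incorrect by 150 rupees",
    ]
})

# One regex pattern with alternation replaces two separate masks.
matches = df.loc[df["Feedback"].str.contains("damaged|deducted", case=False, regex=True)]
print(matches)
```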
To require multiple keywords (AND):
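A short sketch that keeps only rows containing both words, using & between two masks:

```python
import pandas as pd

df = pd.DataFrame({
    "Feedback": [
        "Delay of 5 days. not satisfied",
        "Quality not same as shown",
        "Delivered early!! very happy",
    ]
})

fb = df["Feedback"]
# & is the element-wise AND; a row must satisfy both masks.
both = df.loc[
    fb.str.contains("not", case=False) & fb.str.contains("satisfied", case=False)
]
print(both)
```

Note that contains matches substrings, so "not" would also match inside a longer word such as "nothing".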
Note: a single contains call has no shortcut operator for logical AND, so combine separate expressions with &.
startswith()
Checks if a string starts with the given keyword. There's no case parameter for startswith, so standardize case first if needed.
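As an illustration, the sketch below lower-cases the names first so that both sneha and SwaTi count as starting with s:

```python
import pandas as pd

names = pd.Series(["sneha", "SwaTi", "Ravi"])

# Standardize case, then test the prefix.
starts_with_s = names.str.lower().str.startswith("s")
print(starts_with_s.tolist())
```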
endswith()
Checks if a string ends with the given keyword. Also has no case parameter; standardize case first.
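A comparable sketch for suffixes, filtering a few sample addresses by city:

```python
import pandas as pd

df = pd.DataFrame({
    "Address": ["andheri east, mumbai", "baner road, pune", "ring road, ahmedabad"]
})

# Lower-case first so the check does not depend on capitalization.
in_mumbai = df.loc[df["Address"].str.lower().str.endswith("mumbai")]
print(in_mumbai)
```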
string concatenation
You can concatenate strings from columns:
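A minimal sketch, assuming the new column is named full name and the two parts are joined with a single space:

```python
import pandas as pd

df = pd.DataFrame({
    "FirstName": ["RaHul", "Priya"],
    "LastName": ["Sharma", "Singh"],
})

# + concatenates string columns element by element.
df["full name"] = df["FirstName"] + " " + df["LastName"]
print(df)
```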
Output includes full name column.
indexing
Access characters by position:
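For example, .str[0] picks the first character of each value. The column name area code is taken from the description below and assumed to be literal:

```python
import pandas as pd

df = pd.DataFrame({
    "Address": ["221B baker street, delhi", "mg road, bengaluru", "sector 22, noida"]
})

# Indexing through .str works like indexing a Python string.
df["area code"] = df["Address"].str[0]
print(df)
```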
Adds area code column with first character of Address.
slicing
Slice substrings using Python slice notation:
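A sketch that takes the first three characters of each name, assuming the target column is called short name:

```python
import pandas as pd

df = pd.DataFrame({"FirstName": ["RaHul", "Priya", "Anil"]})

# [:3] keeps characters at positions 0, 1 and 2.
df["short name"] = df["FirstName"].str[:3]
print(df)
```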
Adds short name column.
You can combine operations to create more complex codes:
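The sketch below is one combination that yields a code of the shape shown in the output: the first three letters of FirstName, the numeric identifier, and the last five characters of Address reversed. The column name ID is hypothetical, since the header of the identifier column is not shown, and the recipe itself is inferred from the sample code rather than confirmed.

```python
import pandas as pd

# "ID" is a hypothetical name for the identifier column.
df = pd.DataFrame({
    "ID": [101, 102],
    "FirstName": ["RaHul", "Priya"],
    "Address": ["221B baker street, delhi", "mg road, bengaluru"],
})

df["unique code"] = (
    df["FirstName"].str[:3]               # first three letters
    + "-"
    + df["ID"].astype(str)                # number cast to text
    + "-"
    + df["Address"].str[::-1].str[:5]     # reversed, then first five
)
print(df["unique code"].tolist())
```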
Output includes unique code column, e.g. RaH-101-ihled.
Assignments
STRING FUNCTION ASSIGNMENTS
⭐ Basic Cleaning
1. Whitespace Cleanup
Remove leading/trailing spaces from FirstName, LastName, Address, and Feedback.
2. Case Standardization
Convert FirstName to proper case (first letter capital).
Convert LastName to UPPERCASE.
Convert Address to lowercase.
Convert Feedback to swapcase().
⭐ Length & Validation Tasks
3. Character Count
Create a column NameLength = length of FirstName + LastName combined.
Create a column PhoneLength = number of digits in PhoneNumber.
4. Identify Invalid Phone Numbers
Return all rows where PhoneNumber length is not equal to 10.
Return all rows where PhoneNumber contains any non-numeric characters.
⭐ Replace Tasks
5. Address Cleaning
Replace “street” → “st.”
6. Remove Punctuation
Remove all !, ?, . and , characters from Feedback.
7. Mask Phone Number
Show only the last 4 digits and mask the rest with *. Hint: you can use a pandas function here.
⭐ Split & Extract Tasks
8. Address Splitting
Split Address into:
Area
City
Create two new columns.
9. Extract City Initial
Create a column CityCode = first 3 letters of the city.
⭐ contains(), startswith(), endswith()
10. Filter on Feedback
Return rows where Feedback:
contains “refund” OR “damaged”
contains BOTH words “not” and “satisfied”
contains the word “delivered” but not “late”
11. Filter on Address
Show addresses starting with ‘park’.
Show addresses ending with ‘mumbai’.
Show addresses that contain a number.
⭐ Concatenation Tasks
12. Full Name Creation
Create a column:
FullName = FirstName + " " + LastName
13. Create a Customer Code
Format:
⭐ Indexing & Slicing
14. Extract Codes
Create AreaCode = first character of Address.
Create ShortName = first 3 characters of FirstName.
15. Reverse Manipulations
Reverse LastName.
Reverse full Address string.