This note is updated frequently without notice!
In this note, I use df as a DataFrame and s as a Series.
Libraries
import pandas as pd # import pandas package
import numpy as np
Other tasks
Deal with columns
Remove or keep some
Remove columns,
df.drop('New', axis=1, inplace=True) # drop column 'New'
df.drop(['col1', 'col2'], axis=1, inplace=True)
Keep only some columns,
kept_cols = ['col1', 'col2', ...]
df = df[kept_cols]
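As a minimal sketch of the same two operations without `inplace=True`, the example below builds a small DataFrame whose column names (`col1`, `col2`, `New`) and values are made up for illustration. It assumes a reasonably recent pandas, where `drop` accepts `columns=` and `errors=`.

```python
import pandas as pd

# Illustrative data; the column names and values are assumptions.
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "New": [5, 6]})

# columns=[...] is equivalent to passing the labels with axis=1.
dropped = df.drop(columns=["New"])

# errors="ignore" skips labels that do not exist instead of raising KeyError.
dropped_safe = df.drop(columns=["New", "missing"], errors="ignore")

# Keep a subset; .copy() gives an independent frame that is safe to modify.
kept = df[["col1", "col2"]].copy()

print(dropped.columns.tolist())
print(kept.columns.tolist())
```

Returning a new DataFrame instead of mutating `df` keeps the original available, which tends to be easier to debug in a notebook.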
Rename columns
In this part, we use the DataFrame df below.
| | Name | Ages | Marks | Place |
|---|---|---|---|---|
| 0 | John | 10 | 8 | Ben Tre |
| 1 | Thi | 20 | 9 | Paris |
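# implicitly (by position; the list must name every column, in order)
df.columns = ['Surname', 'Years', 'Grade', 'Location']
# explicitly (by name; columns not listed keep their names)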
df.rename(columns={
'Name': 'Surname',
'Ages': 'Years',
...
}, inplace=True)
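The two renaming styles can be tried on the sample table with the sketch below. The variable names `renamed` and `positional` are mine, and only two of the four columns are renamed by name, to show that the others are left alone.

```python
import pandas as pd

# The sample table from this section.
df = pd.DataFrame({
    "Name": ["John", "Thi"],
    "Ages": [10, 20],
    "Marks": [8, 9],
    "Place": ["Ben Tre", "Paris"],
})

# By name: returns a new DataFrame; unlisted columns keep their names.
renamed = df.rename(columns={"Name": "Surname", "Ages": "Years"})

# By position: the list length must match the number of columns.
positional = df.copy()
positional.columns = ["Surname", "Years", "Grade", "Location"]

print(renamed.columns.tolist())
print(positional.columns.tolist())
```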
Make index
Check if a column has unique values (so that it can be an index)
df['col'].is_unique # True if yes
Turn the index back into a normal column,
df.reset_index(inplace=True)
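Make a column the index,[ref]
# set_index returns a new DataFrame; assign the result or pass inplace=True
df.set_index('column')
df.set_index(['col1', 'col2']) # several columns give a MultiIndex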
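A small round trip through these index helpers, sketched on the sample table from the renaming section. Using `Name` as the index is my choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Thi"],
    "Ages": [10, 20],
    "Marks": [8, 9],
    "Place": ["Ben Tre", "Paris"],
})

# Only promote a column to the index if its values are unique.
if df["Name"].is_unique:
    indexed = df.set_index("Name")  # new DataFrame; df is unchanged

# Label-based lookup now works on the names.
place = indexed.loc["Thi", "Place"]

# reset_index moves the index back into a regular first column.
restored = indexed.reset_index()

print(place)
print(restored.columns.tolist())
```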
Deal with NaN
Drop if NaN
# Drop any rows which have any nans
df.dropna()
# Drop columns that have any nans
df.dropna(axis=1)
# Keep only columns with at least 90% non-NaN values (drop the others)
df.dropna(thresh=int(df.shape[0] * .9), axis=1)
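The `thresh` argument is easy to read backwards: it is the minimum number of non-NaN values a row or column needs in order to be kept. The sketch below uses made-up columns (`full`, `sparse`, `one_gap`) and a 75% threshold instead of 90% so the effect shows on four rows.

```python
import numpy as np
import pandas as pd

# Illustrative data; column names and values are assumptions.
df = pd.DataFrame({
    "full": [1.0, 2.0, 3.0, 4.0],
    "sparse": [np.nan, np.nan, np.nan, 1.0],
    "one_gap": [1.0, np.nan, 3.0, 4.0],
})

rows_clean = df.dropna()        # keeps only rows with no NaN at all
cols_clean = df.dropna(axis=1)  # keeps only columns with no NaN at all

# Keep columns that have at least 75% non-NaN values (3 of 4 rows here).
min_non_nan = int(len(df) * 0.75)
mostly_full = df.dropna(axis=1, thresh=min_non_nan)

print(rows_clean.index.tolist())
print(cols_clean.columns.tolist())
print(mostly_full.columns.tolist())
```

None of these calls modify `df`; each returns a new DataFrame.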
Fill NaN with other values
Check other methods of fillna here.
# Fill NaN with ' '
df['col'] = df['col'].fillna(' ')
# Fill NaN with 99
df['col'] = df['col'].fillna(99)
# Fill NaN with the mean of the column
df['col'] = df['col'].fillna(df['col'].mean())
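When several columns need different fill values, `fillna` also accepts a dict that maps column names to values. A minimal sketch, with hypothetical columns `name` and `score`:

```python
import numpy as np
import pandas as pd

# Illustrative data; column names and values are assumptions.
df = pd.DataFrame({
    "name": ["a", None, "c"],
    "score": [1.0, np.nan, 3.0],
})

# One fill value per column: empty string for text, column mean for numbers.
filled = df.fillna({"name": "", "score": df["score"].mean()})

print(filled)
```

`mean()` skips NaN by default, so the fill value here is the average of the values that are present.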
Set values with conditions
np.where(if_this_condition_is_true, do_this, else_this)
df['new_column'] = np.where(df['col'] > 10, 'foo', 'bar') # example
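A runnable version of the `np.where` pattern, on a made-up single-column DataFrame. For more than two outcomes, `np.select` is one option; the labels `low`, `mid`, and `high` are only illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data; the column name and values are assumptions.
df = pd.DataFrame({"col": [5, 10, 15]})

# Two outcomes: 'foo' where the condition holds, 'bar' elsewhere.
df["new_column"] = np.where(df["col"] > 10, "foo", "bar")

# Several outcomes: the first matching condition wins, else the default.
df["level"] = np.select(
    [df["col"] < 10, df["col"] < 15],
    ["low", "mid"],
    default="high",
)

print(df)
```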