This note is updated frequently without notice!
In this note, I use df as a DataFrame and s as a Series.
Libraries
import pandas as pd # import pandas package
import numpy as np
Import and have a look
df = pd.read_csv('filename.csv')
df.head() # read first 5 rows
df.tail() # last 5 rows
df.head(10) # first 10 rows
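A small sketch of the loading step. It reads a tiny hypothetical CSV from memory (io.StringIO stands in for 'filename.csv', and the columns id, age, city are made up for illustration) so you can see what head(), tail() and head(n) return.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'filename.csv'
csv_text = """id,age,city
1,23,Paris
2,35,Berlin
3,41,Rome
4,29,Madrid
5,52,Lisbon
6,38,Vienna
"""

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

first5 = df.head()      # rows with id 1..5
last5 = df.tail()       # rows with id 2..6
first10 = df.head(10)   # only 6 rows exist, so all 6 are returned
print(first5)
```

Note that head(n) never fails when n is larger than the number of rows; it just returns the whole frame.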
Get general info
df.info() # column dtypes, non-null counts and memory usage
df.describe() # numerical features
df.describe(include=['O']) # categorical features
df.shape # dataframe's shape
df.dtypes # type of each column
df.dtypes.value_counts() # count columns per data type (get_dtype_counts() was removed in pandas 1.0)
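A minimal sketch of the shape and dtype checks on a made-up three-column frame (the column names age, height and city are hypothetical). It uses df.dtypes.value_counts() as the replacement for the removed get_dtype_counts().

```python
import pandas as pd

# Hypothetical data: one integer, one float and one text column
df = pd.DataFrame({
    "age": [23, 35, 41],
    "height": [1.70, 1.82, 1.65],
    "city": ["Paris", "Berlin", "Rome"],
})

df.info()                                 # prints dtypes, non-null counts, memory usage
n_rows, n_cols = df.shape                 # (3, 3)
dtype_counts = df.dtypes.value_counts()   # number of columns per dtype
print(df.dtypes)
print(dtype_counts)
```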
An example of using df.describe().
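A small sketch of what describe() returns for numerical versus categorical columns, on hypothetical data. The city column is created with dtype=object explicitly so that include=['O'] selects it regardless of which default string dtype your pandas version uses.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [20, 30, 40, 50],
    # explicit object dtype so include=['O'] picks this column up
    "city": pd.Series(["Paris", "Paris", "Rome", "Berlin"], dtype=object),
})

num_summary = df.describe()               # numeric columns only: count, mean, std, min, quartiles, max
cat_summary = df.describe(include=["O"])  # object columns: count, unique, top, freq
print(num_summary)
print(cat_summary)
```

For numerical columns you get count, mean, std, min, 25%, 50%, 75% and max; for object columns you get count, unique, top (most frequent value) and freq (its count).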
Get columns’ info
Get the list of columns,
df.columns
len(df.columns) # count the number of columns
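A quick sketch on a hypothetical frame: df.columns is a pandas Index, so tolist() turns it into a plain Python list, and len(df.columns) equals df.shape[1].

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})  # hypothetical columns

cols = df.columns.tolist()   # Index -> plain Python list
n_cols = len(df.columns)     # same value as df.shape[1]
print(cols, n_cols)
```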
Counting
Counting the number of elements of each class in df,
df.Classes.value_counts() # number of rows for each class (here 0 and 1) in column 'Classes'
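A minimal sketch assuming a binary label column named Classes, as in the line above (the values themselves are made up). value_counts() sorts by count, largest first, and normalize=True returns proportions instead of counts, which is handy for spotting class imbalance.

```python
import pandas as pd

# Hypothetical binary labels in a column named 'Classes'
df = pd.DataFrame({"Classes": [0, 1, 0, 0, 1, 0]})

counts = df.Classes.value_counts()                   # 0 -> 4, 1 -> 2
shares = df["Classes"].value_counts(normalize=True)  # proportions instead of counts
print(counts)
print(shares)
```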
The number of null values in df,
df.isnull().sum().sort_values(ascending=False) # null count per column, largest first
df.isnull().sum()/df.isnull().count()*100 # % of null values per column
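A small sketch of the null-value checks on a hypothetical frame with four rows. Since df.isnull().count() is just the number of rows for every column, the percentage can also be written as df.isnull().mean() * 100.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing values
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [1.0, 2.0, 3.0, 4.0],
    "c": [np.nan, np.nan, np.nan, 4.0],
})

null_counts = df.isnull().sum().sort_values(ascending=False)  # c: 3, a: 2, b: 0
null_pct = df.isnull().sum() / df.isnull().count() * 100      # % of nulls per column
null_pct_mean = df.isnull().mean() * 100                      # same result, shorter
print(null_counts)
print(null_pct)
```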