This note is updated frequently without notice!

In this note, df denotes a DataFrame and s denotes a Series.

Libraries

import pandas as pd # import pandas package
import numpy as np

Import and have a look

df = pd.read_csv('filename.csv')
df.head() # read first 5 rows
df.tail() # last 5 rows
df.head(10) # first 10 rows
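
To try these calls without a file on disk, here is a minimal sketch. The CSV text and the column names (id, Classes, score) are made up for illustration; pd.read_csv accepts a file-like object, so io.StringIO stands in for 'filename.csv'.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'filename.csv'
csv_text = """id,Classes,score
1,0,3.5
2,1,4.0
3,1,
4,0,2.5
5,1,5.0
6,0,1.5
"""

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head(3)  # first 3 rows
last_rows = df.tail(2)   # last 2 rows

print(first_rows)
print(last_rows)
```

The empty score in the third row is read as NaN, which is handy for testing the null-counting snippets later in this note.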

Get general info

df.info() # column dtypes, non-null counts and memory usage
df.describe() # summary statistics of numerical features
df.describe(include=['O']) # summary of categorical (object) features

df.shape # dataframe's shape
df.dtypes # type of each column

df.dtypes.value_counts() # count the columns of each dtype (get_dtype_counts() was removed in pandas 1.0)
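
A small sketch of what these calls return, on a made-up DataFrame (the column names age, fare and sex are only for illustration). I store the text column with object dtype explicitly so that include=['O'] picks it up regardless of the pandas version.

```python
import pandas as pd

# Hypothetical data: two numerical columns and one categorical column
df = pd.DataFrame({
    "age": [22, 35, 58],
    "fare": [7.25, 71.28, 8.05],
    "sex": pd.Series(["male", "female", "male"], dtype=object),
})

numeric_summary = df.describe()                   # count, mean, std, min, quartiles, max
categorical_summary = df.describe(include=["O"])  # count, unique, top, freq
dtype_counts = df.dtypes.value_counts()           # number of columns per dtype

print(numeric_summary)
print(categorical_summary)
print(dtype_counts)
```

Here df.describe() only covers age and fare, while the include=['O'] variant only covers sex.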

Figure: An example of using df.describe().

Get columns’ info

Get the list of columns,

df.columns
len(df.columns) # count the number of columns
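
For example, on a made-up DataFrame (the column names are hypothetical), df.columns is an Index object, so I usually convert it to a plain list when I need one:

```python
import pandas as pd

# Hypothetical data with three columns
df = pd.DataFrame({"id": [1, 2], "Classes": [0, 1], "score": [3.5, 4.0]})

column_names = df.columns.tolist()  # Index -> plain Python list
n_columns = len(df.columns)         # same value as df.shape[1]

print(column_names)
print(n_columns)
```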

Counting

Counting the number of elements of each class in df,

df.Classes.value_counts() # number of rows in each class (e.g. 0 and 1)
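
A quick sketch on made-up data, assuming a column named Classes with labels 0 and 1. The normalize=True option gives fractions instead of raw counts.

```python
import pandas as pd

# Hypothetical binary labels
df = pd.DataFrame({"Classes": [0, 1, 1, 0, 1]})

counts = df["Classes"].value_counts()                # rows per class
shares = df["Classes"].value_counts(normalize=True)  # fraction of rows per class

print(counts)
print(shares)
```

I use df["Classes"] rather than df.Classes here because the bracket form also works when a column name contains spaces or clashes with a DataFrame method.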

The number of null values in df,

df.isnull().sum().sort_values(ascending=False)
df.isnull().sum() / df.isnull().count() * 100 # % of null values in each column
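
A minimal sketch on a made-up DataFrame (columns a, b, c are hypothetical). Since isnull() returns booleans, taking the mean gives the fraction of nulls directly, which is a shorter way to get the same percentage.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing values
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", None, "z", "w"],
    "c": [1, 2, 3, 4],
})

# Null count per column, largest first
null_counts = df.isnull().sum().sort_values(ascending=False)

# Percentage of nulls per column (mean of a boolean mask is the fraction of True)
null_percent = df.isnull().mean() * 100

print(null_counts)
print(null_percent)
```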
