This note is updated frequently without notice!

Sometimes we want to combine multiple dataframes into one. In this note, I use df for a DataFrame and s for a Series.

Libraries

import pandas as pd
import numpy as np

Coupling dfs with merge()

There are 4 main types of merging (chosen with the how parameter), like the joins in SQL.

  • Inner: only includes rows whose key appears in both dataframes.
  • Outer: includes all rows from both dataframes; where a key has no match on the other side, the missing values are filled with NaN.
  • Left: includes all rows from the “left” dataframe, plus the matching rows from the “right” dataframe (NaN where there is no match); the result retains all columns from both of the original dataframes.
  • Right: includes all rows from the “right” dataframe, plus the matching rows from the “left” dataframe (NaN where there is no match); the result retains all columns from both of the original dataframes.
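To see the four join types side by side, here is a small sketch on two toy dataframes. The data (country codes, Pop, GDP) is made up purely for illustration; the only assumption is that both frames share a key column called Country.

```python
import pandas as pd

# Hypothetical toy data: "VN" only exists on the left, "JP" only on the right
df1 = pd.DataFrame({"Country": ["VN", "FR", "US"], "Pop": [98, 68, 333]})
df2 = pd.DataFrame({"Country": ["FR", "US", "JP"], "GDP": [2.8, 25.5, 4.2]})

results = {}
for how in ["inner", "outer", "left", "right"]:
    merged = pd.merge(df1, df2, how=how, on="Country")
    # Sort the keys so the comparison does not depend on row order
    results[how] = sorted(merged["Country"])
    print(how, results[how])

# inner ['FR', 'US']
# outer ['FR', 'JP', 'US', 'VN']
# left ['FR', 'US', 'VN']
# right ['FR', 'JP', 'US']
```

In the left join, the VN row has no partner in df2, so its GDP comes out as NaN; the right join does the same for JP's Pop.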
# on a key column with the same name in both dfs;
# suffixes are appended to overlapping non-key column names
pd.merge(left=df1, right=df2, how='left', on='Country', suffixes=('_df1', '_df2'))

# on key columns with different names (left key 'Country', right key 'Region')
pd.merge(left=df1, right=df2, how='left', left_on='Country', right_on='Region', suffixes=('_df1', '_df2'))
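A runnable sketch of the left_on / right_on case, assuming made-up frames where both sides also have a non-key column named Score (so the suffixes actually kick in). One thing worth noticing: with different key names, both key columns end up in the result.

```python
import pandas as pd

# Hypothetical data: key is "Country" on the left, "Region" on the right,
# and both frames have a non-key column called "Score"
df1 = pd.DataFrame({"Country": ["VN", "FR"], "Score": [1, 2]})
df2 = pd.DataFrame({"Region": ["FR", "DE"], "Score": [10, 20]})

out = pd.merge(left=df1, right=df2, how="left",
               left_on="Country", right_on="Region",
               suffixes=("_df1", "_df2"))
print(out)  # columns: Country, Score_df1, Region, Score_df2

# Both key columns are kept; drop the redundant one if you don't need it
out = out.drop(columns="Region")
print(list(out.columns))
```

VN has no match in df2, so its Score_df2 is NaN (which also turns that column into floats).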

Concatenate dfs with concat()

Stacking multiple dfs with the same columns on top of each other (axis=0) [ref]:

pd.concat([head_2015, head_2016], ignore_index=True) # default: axis=0
# ignore_index=True builds a fresh 0..n-1 index instead of keeping the original (possibly duplicated) indexes
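Why ignore_index matters: here is a small sketch with hypothetical stand-ins for head_2015 and head_2016 (toy values, same columns, both using the default index 0, 1).

```python
import pandas as pd

# Hypothetical stand-ins for head_2015 / head_2016 (toy values)
head_2015 = pd.DataFrame({"Country": ["A", "B"], "Score": [1.0, 2.0]})
head_2016 = pd.DataFrame({"Country": ["C", "D"], "Score": [3.0, 4.0]})

# Without ignore_index the original labels are kept, so they repeat
stacked = pd.concat([head_2015, head_2016])
print(stacked.index.tolist())      # [0, 1, 0, 1]

# With ignore_index=True the result is renumbered from 0
renumbered = pd.concat([head_2015, head_2016], ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```

With the duplicated index, stacked.loc[0] returns two rows (one from each frame), which is usually not what you want.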

Placing multiple dfs with the same indexes (rows) side by side (axis=1):

pd.concat([head_2015, head_2016], axis=1)
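With axis=1, concat lines the rows up by index label, not by position. A sketch with hypothetical frames whose indexes only partly overlap shows what happens when the indexes are not exactly the same:

```python
import pandas as pd

# Hypothetical frames indexed by country; "B" and "C" each appear in only one frame
left = pd.DataFrame({"Score_2015": [1.0, 2.0]}, index=["A", "B"])
right = pd.DataFrame({"Score_2016": [3.0, 4.0]}, index=["A", "C"])

# Default join="outer": keeps the union of index labels, NaN where a frame has no row
side_by_side = pd.concat([left, right], axis=1)
print(side_by_side)

# join="inner": keeps only the index labels present in every frame
only_shared = pd.concat([left, right], axis=1, join="inner")
print(only_shared)
```

So if the indexes don't match exactly, you get NaN holes (outer) or silently dropped rows (inner); checking the indexes first, or using merge() on a key column, avoids surprises.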

Notice an error?

Everything on this site is published on GitHub. Just submit a suggested change or email me directly (don't forget to include the URL of the page containing the bug), and I will fix it.