When you merge two or more DataFrames in pandas, and you want to have an indicator column to tell the information on the source of each row, then you can make use of the parameter indicator=True along with the pandas.merge() function.
Example: Pandas: Merge Indicator (Left, Right, and Both)
import pandas as pd
data_city_NYC = {
'City': ['NYC', 'NYC', 'NYC', 'NYC'],
'Date': ['2023-07-01', '2023-07-02', '2023-07-03', '2023-07-04'],
'Temp_NYC': [25, 30, 27, 0]
}
data_city_Chicago = {
'City': ['Chicago', 'Chicago', 'Chicago', 'Chicago'],
'Date': ['2023-07-01', '2023-07-02', '2023-07-03', '2023-07-06'],
'Temp_Chicago': [28, 32, 29, 0]
}
df_city_NYC = pd.DataFrame(data_city_NYC).set_index('Date')
df_city_Chicago = pd.DataFrame(data_city_Chicago).set_index('Date')
print("DataFrame: NYC City")
print(df_city_NYC)
print("\nDataFrame: Chicago City")
print(df_city_Chicago)
# Merge both DataFrames on the common column 'Date' and add an indicator
merged_cities_both_df = pd.merge(df_city_NYC, df_city_Chicago, on='Date', how='outer', suffixes=('_NYC', '_Chicago'), indicator=True)
# Merge left DataFrames on the common column 'Date' and add an indicator
merged_cities_left_df = pd.merge(df_city_NYC, df_city_Chicago, on='Date', how='left', suffixes=('_NYC', '_Chicago'), indicator=True)
# Merge right DataFrames on the common column 'Date' and add an indicator
merged_cities_right_df = pd.merge(df_city_NYC, df_city_Chicago, on='Date', how='right', suffixes=('_NYC', '_Chicago'), indicator=True)
print("\nMerged DataFrame Data (Merge Both):")
print(merged_cities_both_df)
print("\nMerged DataFrame Data (Merge Left):")
print(merged_cities_left_df)
print("\nMerged DataFrame Data (Merge Right):")
print(merged_cities_right_df)
Output:
DataFrame: NYC City
City Temp_NYC
Date
2023-07-01 NYC 25
2023-07-02 NYC 30
2023-07-03 NYC 27
2023-07-04 NYC 0
DataFrame: Chicago City
City Temp_Chicago
Date
2023-07-01 Chicago 28
2023-07-02 Chicago 32
2023-07-03 Chicago 29
2023-07-06 Chicago 0
Merged DataFrame Data (Merge Both):
City_NYC Temp_NYC City_Chicago Temp_Chicago _merge
Date
2023-07-01 NYC 25.0 Chicago 28.0 both
2023-07-02 NYC 30.0 Chicago 32.0 both
2023-07-03 NYC 27.0 Chicago 29.0 both
2023-07-04 NYC 0.0 NaN NaN left_only
2023-07-06 NaN NaN Chicago 0.0 right_only
Merged DataFrame Data (Merge Left):
City_NYC Temp_NYC City_Chicago Temp_Chicago _merge
Date
2023-07-01 NYC 25 Chicago 28.0 both
2023-07-02 NYC 30 Chicago 32.0 both
2023-07-03 NYC 27 Chicago 29.0 both
2023-07-04 NYC 0 NaN NaN left_only
Merged DataFrame Data (Merge Right):
City_NYC Temp_NYC City_Chicago Temp_Chicago _merge
Date
2023-07-01 NYC 25.0 Chicago 28 both
2023-07-02 NYC 30.0 Chicago 32 both
2023-07-03 NYC 27.0 Chicago 29 both
2023-07-06 NaN NaN Chicago 0 right_only
Merge Indicator Values
_merge | Merge Indicator Description |
|---|---|
both | The row is present in both left and right DataFrames after the merge. |
left_only | The row is present only in the left DataFrame after the merge. |
right_only | The row is present only in the right DataFrame after the merge. |
Provide Feedback For This Article
We take your feedback seriously and use it to improve our content. Thank you for helping us serve you better!
😊 Thanks for your time, your feedback has been registered!
Comments & Discussion
Facing issues? Have questions? Post them here! We're happy to help!