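Horizontal concatenation and vertical stacking, joining on a primary key, handling duplicates and missing values
Merging exercises using several datasets: retail_orders, user_logs, ab_test.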
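First, read the datasets to be merged.
Use concat with axis=0 to stack DataFrames that share the same structure.
Use concat with axis=1 to join DataFrames column by column.
Use merge to combine two tables on a common key.
The join types differ in which keys they keep: inner keeps only keys present in both tables, left keeps every key from the left table, right keeps every key from the right table, and outer keeps keys from either table. Unmatched cells are filled with NaN.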
# Load the datasets used in the examples
import pandas as pd

orders = pd.read_csv('retail_orders.csv')
users = pd.read_csv('user_logs.csv')
# Vertical stacking: first five and last five orders, with a fresh 0..9 index
df1 = orders.head(5)
df2 = orders.tail(5)
combined = pd.concat([df1, df2], axis=0, ignore_index=True)
print(combined)
# Horizontal concatenation: two column subsets side by side, aligned on the index
df_left = orders[['order_id', 'product']].head(5)
df_right = orders[['quantity', 'price']].head(5)
combined = pd.concat([df_left, df_right], axis=1)
print(combined)
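One thing to watch with axis=1: concat aligns rows by index label, not by position. The sketch below uses two hypothetical toy frames (not taken from the datasets above) to show the misalignment and one way to avoid it by resetting the index.

```python
import pandas as pd

# Hypothetical toy data; the index labels deliberately do not match.
left = pd.DataFrame({'order_id': [101, 102, 103]}, index=[0, 1, 2])
right = pd.DataFrame({'price': [9.5, 20.0, 3.25]}, index=[2, 3, 4])

# Aligned on index labels: every label from either frame is kept, gaps become NaN.
misaligned = pd.concat([left, right], axis=1)
print(misaligned)

# Resetting both indexes makes the rows line up by position instead.
aligned = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)],
    axis=1,
)
print(aligned)
```

Resetting the index is only appropriate when the rows of both frames are known to be in the same order.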
# Inner join: only ids present in both frames (2 and 3) are kept
df_a = pd.DataFrame({'id': [1, 2, 3], 'name': ['A', 'B', 'C']})
df_b = pd.DataFrame({'id': [2, 3, 4], 'score': [80, 90, 75]})
merged = pd.merge(df_a, df_b, on='id', how='inner')
print(merged)
# Left join: all ids from df_a are kept; id 1 has no match, so its score is NaN
df_a = pd.DataFrame({'id': [1, 2, 3], 'name': ['A', 'B', 'C']})
df_b = pd.DataFrame({'id': [2, 3, 4], 'score': [80, 90, 75]})
merged = pd.merge(df_a, df_b, on='id', how='left')
print(merged)
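how='right' mirrors the left join: every key from the right table is kept, and columns coming from the left table are NaN where there is no match. A short sketch using the same toy frames:

```python
import pandas as pd

df_a = pd.DataFrame({'id': [1, 2, 3], 'name': ['A', 'B', 'C']})
df_b = pd.DataFrame({'id': [2, 3, 4], 'score': [80, 90, 75]})

# Keep all ids from df_b (2, 3, 4); id 4 has no match in df_a, so its name is NaN.
merged_right = pd.merge(df_a, df_b, on='id', how='right')
print(merged_right)
```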
# Outer join: ids from either frame (1 to 4) are kept; unmatched cells are NaN
df_a = pd.DataFrame({'id': [1, 2, 3], 'name': ['A', 'B', 'C']})
df_b = pd.DataFrame({'id': [2, 3, 4], 'score': [80, 90, 75]})
merged = pd.merge(df_a, df_b, on='id', how='outer')
print(merged)
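An outer join leaves NaN wherever a key exists on only one side. The sketch below goes a step beyond the cell above: it uses merge's indicator=True option to label where each row came from, then shows two common ways to deal with the gaps. The fill values 'unknown' and 0 are illustrative assumptions, not a recommendation for real data.

```python
import pandas as pd

df_a = pd.DataFrame({'id': [1, 2, 3], 'name': ['A', 'B', 'C']})
df_b = pd.DataFrame({'id': [2, 3, 4], 'score': [80, 90, 75]})

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
merged = pd.merge(df_a, df_b, on='id', how='outer', indicator=True)
print(merged)

# Count missing values per column.
missing_counts = merged.isna().sum()
print(missing_counts)

# Option 1: keep only the rows that matched on both sides.
matched = merged[merged['_merge'] == 'both']
print(matched)

# Option 2: fill the gaps with placeholder values (illustrative choices).
filled = merged.fillna({'name': 'unknown', 'score': 0})
print(filled)
```

Whether to drop or fill depends on what a missing value means in the data; filling a score with 0 can distort averages, for example.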
# Key columns with different names: use left_on / right_on instead of on
df_a = pd.DataFrame({'user_id': [1, 2, 3], 'name': ['A', 'B', 'C']})
df_b = pd.DataFrame({'id': [2, 3, 4], 'score': [80, 90, 75]})
merged = pd.merge(df_a, df_b, left_on='user_id', right_on='id', how='inner')
print(merged)
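With left_on/right_on the result keeps both key columns (user_id and id), and after an inner join they hold the same values. A small sketch of dropping the redundant one:

```python
import pandas as pd

df_a = pd.DataFrame({'user_id': [1, 2, 3], 'name': ['A', 'B', 'C']})
df_b = pd.DataFrame({'id': [2, 3, 4], 'score': [80, 90, 75]})

merged = pd.merge(df_a, df_b, left_on='user_id', right_on='id', how='inner')

# Both key columns survive the merge; drop the right-hand one.
tidy = merged.drop(columns='id')
print(tidy)
```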
# Duplicates: stacking the same five rows twice, then removing the repeated rows
df1 = orders.head(5)
df2 = orders.head(5)
combined = pd.concat([df1, df2], axis=0)
cleaned = combined.drop_duplicates()
print(f"After concat: {len(combined)}, after dedup: {len(cleaned)}")
print(cleaned)
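drop_duplicates compares whole rows by default. When rows should count as duplicates based on a key alone, the subset and keep parameters help. A sketch on hypothetical toy data (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical orders: order_id 2 appears twice with different quantities.
toy = pd.DataFrame({
    'order_id': [1, 2, 2, 3],
    'quantity': [5, 1, 4, 2],
})

# Flag rows whose order_id has already appeared earlier in the frame.
dup_mask = toy.duplicated(subset='order_id')
print(dup_mask.sum())

# Keep the last occurrence of each order_id and renumber the index.
deduped = (
    toy.drop_duplicates(subset='order_id', keep='last')
       .reset_index(drop=True)
)
print(deduped)
```

Checking key columns for duplicates before a merge is also worthwhile, since repeated keys on both sides multiply the matching rows.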