縦の結合(行を増やす
df = pd.concat([df_1, df_2, df_3])
とすれば結合できます。ただしカラムが一致していないとエラーになります。
データフレームの数はいくつでも対応していますが、現実的には、For文などでループさせていくかなと思います。
df_1 = pd.DataFrame([["2020/05/01 12:34:56",1,2],["2020/05/01 16:34:56",3,4]], columns = ["Date Time", "aaa", "bbb"]) df_1.index = pd.DatetimeIndex(pd.to_datetime(df_1['Date Time']), name='Date Time') del df_1['Date Time'] df_2 = pd.DataFrame([["2020/05/01 12:34:56",10,20],["2020/05/01 16:34:56",30,40]], columns = ["Date Time", "aaa", "bbb"]) df_2.index = pd.DatetimeIndex(pd.to_datetime(df_2['Date Time']), name='Date Time') del df_2['Date Time'] df_3 = pd.DataFrame([["2020/05/01 12:34:56",100,200],["2020/05/01 16:34:56",300,400]], columns = ["Date Time", "aaa", "bbb"]) df_3.index = pd.DatetimeIndex(pd.to_datetime(df_3['Date Time']), name='Date Time') del df_3['Date Time'] print(df_1) print(df_2) print(df_3) # aaa bbb # Date Time # 2020-05-01 12:34:56 1 2 # 2020-05-01 16:34:56 3 4 # aaa bbb # Date Time # 2020-05-01 12:34:56 10 20 # 2020-05-01 16:34:56 30 40 # aaa bbb # Date Time # 2020-05-01 12:34:56 100 200 # 2020-05-01 16:34:56 300 400 df = pd.concat([df_1, df_2, df_3]) print(df) # aaa bbb # Date Time # 2020-05-01 12:34:56 1 2 # 2020-05-01 16:34:56 3 4 # 2020-05-01 12:34:56 10 20 # 2020-05-01 16:34:56 30 40 # 2020-05-01 12:34:56 100 200 # 2020-05-01 16:34:56 300 400
横の結合(列を増やす
df = pd.concat([df_1, df_2, df_3], axis=1)
とaxis=1
を指定すれば結合できます。
df_1 = pd.DataFrame([["2020/05/01 12:34:56",1,2],["2020/05/01 16:34:56",3,4]], columns = ["Date Time", "aaa", "bbb"]) df_1.index = pd.DatetimeIndex(pd.to_datetime(df_1['Date Time']), name='Date Time') del df_1['Date Time'] df_2 = pd.DataFrame([["2020/05/01 12:34:56",10,20],["2020/05/01 16:34:56",30,40]], columns = ["Date Time", "aaa", "bbb"]) df_2.index = pd.DatetimeIndex(pd.to_datetime(df_2['Date Time']), name='Date Time') del df_2['Date Time'] df_3 = pd.DataFrame([["2020/05/01 12:34:56",100,200],["2020/05/01 16:34:56",300,400]], columns = ["Date Time", "aaa", "bbb"]) df_3.index = pd.DatetimeIndex(pd.to_datetime(df_3['Date Time']), name='Date Time') del df_3['Date Time'] print(df_1) print(df_2) print(df_3) # aaa bbb # Date Time # 2020-05-01 12:34:56 1 2 # 2020-05-01 16:34:56 3 4 # aaa bbb # Date Time # 2020-05-01 12:34:56 10 20 # 2020-05-01 16:34:56 30 40 # aaa bbb # Date Time # 2020-05-01 12:34:56 100 200 # 2020-05-01 16:34:56 300 400 df = pd.concat([df_1, df_2, df_3], axis=1) print(df) # aaa bbb aaa bbb aaa bbb # Date Time # 2020-05-01 12:34:56 1 2 10 20 100 200 # 2020-05-01 16:34:56 3 4 30 40 300 400
全てのインデックスで結合
上記の日時型のインデックスが一致していない場合は、以下のようになります。
(df_3の一行目の日時を13時にしています)
df_1 = pd.DataFrame([["2020/05/01 12:34:56",1,2],["2020/05/01 16:34:56",3,4]], columns = ["Date Time", "aaa", "bbb"]) df_1.index = pd.DatetimeIndex(pd.to_datetime(df_1['Date Time']), name='Date Time') del df_1['Date Time'] df_2 = pd.DataFrame([["2020/05/01 12:34:56",10,20],["2020/05/01 16:34:56",30,40]], columns = ["Date Time", "aaa", "bbb"]) df_2.index = pd.DatetimeIndex(pd.to_datetime(df_2['Date Time']), name='Date Time') del df_2['Date Time'] df_3 = pd.DataFrame([["2020/05/01 13:34:56",100,200],["2020/05/01 16:34:56",300,400]], columns = ["Date Time", "aaa", "bbb"]) df_3.index = pd.DatetimeIndex(pd.to_datetime(df_3['Date Time']), name='Date Time') del df_3['Date Time'] print(df_1) print(df_2) print(df_3) # aaa bbb # Date Time # 2020-05-01 12:34:56 1 2 # 2020-05-01 16:34:56 3 4 # aaa bbb # Date Time # 2020-05-01 12:34:56 10 20 # 2020-05-01 16:34:56 30 40 # aaa bbb # Date Time # 2020-05-01 13:34:56 100 200 # 2020-05-01 16:34:56 300 400 df = pd.concat([df_1, df_2, df_3], axis=1) print(df) # aaa bbb aaa bbb aaa bbb # Date Time # 2020-05-01 12:34:56 1.0 2.0 10.0 20.0 NaN NaN # 2020-05-01 13:34:56 NaN NaN NaN NaN 100.0 200.0 # 2020-05-01 16:34:56 3.0 4.0 30.0 40.0 300.0 400.0
インデックスが一致のみ
上記のようにはせず、インデックスが一致しているものだけで結合する場合は、pd.concat([df_1, df_2, df_3], axis=1, join='inner')
とjoin='inner'
と指定します。
df_1 = pd.DataFrame([["2020/05/01 12:34:56",1,2],["2020/05/01 16:34:56",3,4]], columns = ["Date Time", "aaa", "bbb"]) df_1.index = pd.DatetimeIndex(pd.to_datetime(df_1['Date Time']), name='Date Time') del df_1['Date Time'] df_2 = pd.DataFrame([["2020/05/01 12:34:56",10,20],["2020/05/01 16:34:56",30,40]], columns = ["Date Time", "aaa", "bbb"]) df_2.index = pd.DatetimeIndex(pd.to_datetime(df_2['Date Time']), name='Date Time') del df_2['Date Time'] df_3 = pd.DataFrame([["2020/05/01 13:34:56",100,200],["2020/05/01 16:34:56",300,400]], columns = ["Date Time", "aaa", "bbb"]) df_3.index = pd.DatetimeIndex(pd.to_datetime(df_3['Date Time']), name='Date Time') del df_3['Date Time'] print(df_1) print(df_2) print(df_3) # aaa bbb # Date Time # 2020-05-01 12:34:56 1 2 # 2020-05-01 16:34:56 3 4 # aaa bbb # Date Time # 2020-05-01 12:34:56 10 20 # 2020-05-01 16:34:56 30 40 # aaa bbb # Date Time # 2020-05-01 13:34:56 100 200 # 2020-05-01 16:34:56 300 400 df = pd.concat([df_1, df_2, df_3], axis=1, join='inner') print(df) # aaa bbb aaa bbb aaa bbb # Date Time # 2020-05-01 16:34:56 3 4 30 40 300 400
インデックスがない場合(日時のカラムをキーとする
pd.merge(df_1, df_2, how='outer')
を使えば、以下のように自動的に日時型をキーとしてマージしてくれます。ただし、日時の順番がおかしくなるので、ソートをする必要があります。
全ての日時
df_1 = pd.DataFrame([["2020/05/01 12:34:56",1,2],["2020/05/01 16:34:56",3,4]], columns = ["Date Time", "aaa", "bbb"]) df_2 = pd.DataFrame([["2020/05/01 12:34:56",10,20],["2020/05/01 16:34:56",30,40]], columns = ["Date Time", "ccc", "ddd"]) df_3 = pd.DataFrame([["2020/05/01 13:34:56",100,200],["2020/05/01 16:34:56",300,400]], columns = ["Date Time", "eee", "fff"]) print(df_1) print(df_2) print(df_3) # Date Time aaa bbb # 0 2020/05/01 12:34:56 1 2 # 1 2020/05/01 16:34:56 3 4 # Date Time ccc ddd # 0 2020/05/01 12:34:56 10 20 # 1 2020/05/01 16:34:56 30 40 # Date Time eee fff # 0 2020/05/01 13:34:56 100 200 # 1 2020/05/01 16:34:56 300 400 df = pd.merge(df_1, df_2, how='outer') df = pd.merge(df, df_3, how='outer') print(df) # Date Time aaa bbb ccc ddd eee fff # 0 2020/05/01 12:34:56 1.0 2.0 10.0 20.0 NaN NaN # 1 2020/05/01 16:34:56 3.0 4.0 30.0 40.0 300.0 400.0 # 2 2020/05/01 13:34:56 NaN NaN NaN NaN 100.0 200.0
日時が一致のみ
df_1 = pd.DataFrame([["2020/05/01 12:34:56",1,2],["2020/05/01 16:34:56",3,4]], columns = ["Date Time", "aaa", "bbb"]) df_2 = pd.DataFrame([["2020/05/01 12:34:56",10,20],["2020/05/01 16:34:56",30,40]], columns = ["Date Time", "ccc", "ddd"]) df_3 = pd.DataFrame([["2020/05/01 13:34:56",100,200],["2020/05/01 16:34:56",300,400]], columns = ["Date Time", "eee", "fff"]) print(df_1) print(df_2) print(df_3) # Date Time aaa bbb # 0 2020/05/01 12:34:56 1 2 # 1 2020/05/01 16:34:56 3 4 # Date Time ccc ddd # 0 2020/05/01 12:34:56 10 20 # 1 2020/05/01 16:34:56 30 40 # Date Time eee fff # 0 2020/05/01 13:34:56 100 200 # 1 2020/05/01 16:34:56 300 400 df = pd.merge(df_1, df_2) df = pd.merge(df, df_3) print(df) # Date Time aaa bbb ccc ddd eee fff # 0 2020/05/01 16:34:56 3 4 30 40 300 400
最後に
結合方法が2つで仕様が異なるので注意しましょう。
pd.concat
:join=’inner’(デフォルトはouter)pd.merge
:how=’inner’(デフォルトはinner)