Tuesday, 4 January 2022

How to Merge DataFrames in Pandas

 Pandas provides three main techniques for combining data:

  1. merge() for combining data on common columns or indices
  2. join() for combining data on a key column or an index
  3. concat() for combining DataFrames across rows or columns
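
As a quick orientation, the three calls can be sketched side by side. The DataFrames left and right, and their columns, are made up for illustration and are not part of the examples later in this post.

```python
import pandas as pd

# Hypothetical example data, used only for this sketch
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['a', 'b', 'd'], 'y': [10, 20, 30]})

# merge(): database-style join on the shared 'key' column (inner by default)
merged = pd.merge(left, right, on='key')

# join(): combines on the index by default (as a left join),
# so move 'key' into the index of both DataFrames first
joined = left.set_index('key').join(right.set_index('key'))

# concat(): stacks DataFrames along rows (axis=0) or columns (axis=1)
stacked = pd.concat([left, right], axis=0, ignore_index=True)

print(merged)
print(joined)
print(stacked)
```

Note that merge() keeps only the keys found in both DataFrames here, join() keeps every key from the left DataFrame, and concat() does no key matching at all.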

The first technique you’ll learn is merge(). You can use merge() any time you want to do database-like join operations. It’s the most flexible of the three operations you’ll learn.

When you want to combine data objects based on one or more keys in a similar way to a relational database, merge() is the tool you need. More specifically, merge() is most useful when you want to combine rows that share data.

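You can achieve both many-to-one and many-to-many joins with merge(). In a many-to-one join, one of your datasets will have many rows in the merge column that repeat the same values (such as 1, 1, 3, 5, 5), while the merge column in the other dataset will not have repeated values (such as 1, 3, 5). In a many-to-many join, the merge columns of both datasets contain repeated values, and every matching row on the left is paired with every matching row on the right.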
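
The difference can be seen in a small sketch. The DataFrames below are hypothetical and reuse the key values mentioned above; the validate argument is an optional pandas check that raises an error if the keys are not unique where you expect them to be.

```python
import pandas as pd

# Hypothetical data: 'key' repeats in `many` but is unique in `one`
many = pd.DataFrame({'key': [1, 1, 3, 5, 5], 'left_val': list('abcde')})
one = pd.DataFrame({'key': [1, 3, 5], 'right_val': ['x', 'y', 'z']})

# Many-to-one: each row of `many` picks up the single matching row of `one`.
# validate='many_to_one' asks pandas to confirm the right keys are unique.
many_to_one = pd.merge(many, one, on='key', validate='many_to_one')
print(many_to_one)

# Many-to-many: 'key' repeats on both sides, so each left match is paired
# with each right match (a Cartesian product within every key)
many2 = pd.DataFrame({'key': [1, 1, 3], 'other_val': ['p', 'q', 'r']})
many_to_many = pd.merge(many, many2, on='key')
print(many_to_many)
```

In the many-to-many case, key 1 appears twice on each side and therefore produces 2 x 2 = 4 rows, which is why such merges can grow quickly on larger data.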

When you use merge(), you’ll provide two required arguments:

  1.  The left DataFrame
  2.  The right DataFrame

After that, you can provide a number of optional arguments to define how your datasets are merged:

  •  how: This defines what kind of merge to make. It defaults to 'inner', but other possible options include 'outer', 'left', and 'right'.
  •  on: Use this to tell merge() which columns or indices (also called key columns or key indices) you want to join on. This is optional. If it isn’t specified, and left_index and right_index (covered below) are False, then columns from the two DataFrames that share names will be used as join keys. If you use on, then the column or index you specify must be present in both objects.
  •  left_on and right_on: Use these to specify a column or index that serves as the join key in only the left or only the right object, which is useful when the key columns have different names. Both default to None.
  •  left_index and right_index: Set these to True to use the index of the left or right object as its join key. Both default to False.
  •  suffixes: This is a tuple of strings to append to identical column names that are not merge keys. It defaults to ('_x', '_y'). This allows you to keep track of the origins of columns with the same name.

These are some of the most important parameters to pass to merge(). For the full list, see the Pandas documentation.
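
A minimal sketch of how some of these parameters fit together is shown here. The employees and reviews DataFrames and their column names are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical data for illustration only
employees = pd.DataFrame({'emp_id': [1, 2, 3],
                          'name': ['Ann', 'Bob', 'Cy'],
                          'score': [7, 8, 9]})
reviews = pd.DataFrame({'id': [1, 2, 4], 'score': [70, 80, 90]})

# left_on / right_on: the key columns have different names in each DataFrame.
# suffixes: both DataFrames have a non-key 'score' column, so label each copy.
by_column = pd.merge(employees, reviews,
                     left_on='emp_id', right_on='id',
                     suffixes=('_emp', '_rev'))
print(by_column)

# left_on + right_index: match a column on the left against the index
# on the right; how='left' keeps every employee, matched or not
reviews_indexed = reviews.set_index('id')
by_index = pd.merge(employees, reviews_indexed,
                    left_on='emp_id', right_index=True, how='left')
print(by_index)
```

In the second merge no suffixes are given, so the two score columns come back with the default '_x' and '_y' endings.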

How to Use merge()

Before getting into the details of how to use merge(), you should first understand the various forms of joins:

  1. inner
  2. outer
  3. left
  4. right
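
To see how the four join types differ, the following sketch merges the same two small DataFrames with each value of how and records which keys survive. The DataFrames df_left and df_right are hypothetical examples.

```python
import pandas as pd

# Hypothetical data: keys 'a' and 'b' are shared, 'c' is only on the left,
# and 'd' is only on the right
df_left = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
df_right = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': [10, 20, 30]})

keys_by_how = {}
for how in ['inner', 'outer', 'left', 'right']:
    result = pd.merge(df_left, df_right, on='key', how=how)
    keys_by_how[how] = sorted(result['key'])
    print(f"how={how!r}: keys = {keys_by_how[how]}")
```

An inner join keeps only the shared keys, an outer join keeps the keys from both sides, and left and right joins keep every key from the left or right DataFrame respectively. Rows without a match on the other side are filled with NaN.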

 The following example shows how to merge two DataFrames on their shared key column using the default inner join:

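import pandas as pd

# Left DataFrame: the 'key' column contains repeated values
df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
print("....DataFrame1......")
print(df1)

# Right DataFrame: each value in 'key' appears only once
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})
print("....DataFrame......")
print(df2)

# No 'on' argument: merge() joins on the shared column name 'key' (inner join)
result = pd.merge(df1, df2)
print("....Merged DataFrame......")
print(result)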

The output of the above program is shown below (the row order of the merged result may differ depending on your pandas version):

....DataFrame1......
  key  data1
0   b      0
1   b      1
2   a      2
3   c      3
4   a      4
5   a      5
6   b      6
....DataFrame......
  key  data2
0   a      0
1   b      1
2   d      2
....Merged DataFrame......
  key  data1  data2
0   b      0      1
1   b      1      1
2   b      6      1
3   a      2      0
4   a      4      0
5   a      5      0

Also try the following two programs for a better understanding:

import pandas as pd
df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
print("....DataFrame1......")
print(df1)
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})
print("....DataFrame......")
print(df2)
result=pd.merge(df1, df2, on='key')
print("....Merged DataFrame......")
print(result)

...................................................................................................

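import pandas as pd

# The key columns have different names here: 'lkey' on the left, 'rkey' on the right
df3 = pd.DataFrame({'lkey': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df4 = pd.DataFrame({'rkey': ['a', 'b', 'd'], 'data2': range(3)})

# Name each key column explicitly; the outer join keeps keys from both DataFrames
result = pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='outer')
print(result)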
