A Data Frame is a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can hold a different value type (numeric, string, Boolean, etc.). A Data Frame has both a row index and a column index. The following operations can be performed on a Data Frame:
1) Slice Data Frame
2) Append a Column to Data Frame
3) Select a Column of a Data Frame
4) Subset a Data Frame
5) Finding unique elements
6) Sorting a Data Frame
7) Merge Data Frames in Python
The following programs show how to perform these tasks.
1) Slice Data Frame
Slice operations include selecting rows or columns based on labels or index numbers.
Pandas Data Frame syntax includes the “loc” and “iloc” indexers, e.g. data_frame.loc[ ] and data_frame.iloc[ ]. Both are used to access rows and/or columns: “loc” accesses by label and “iloc” accesses by position, i.e. by integer index.
The main distinction between the two methods is:
- loc gets rows (and/or columns) with particular labels.
- iloc gets rows (and/or columns) at integer locations.
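The following program shows how to slice rows and columns. Note that an iloc slice excludes its end position while a loc slice includes its end label, so df.iloc[0:4] returns four rows and df.loc[0:4] returns five.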
# importing pandas library
import pandas as pd

# Initializing the nested list with Data set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]

# creating a pandas dataframe
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'])
print("Data Frame before slicing")
print("----------------------------------------")
print(df)
print("----------------------------------------")
print()
print()
print("1.Slicing rows in data frame")
print("----------------------------------------")
df1 = df.iloc[0:4]
df11 = df.loc[0:4]
print("data frame after slicing")
print("----------------------------------------")
print(df1)
print("----------------------------------------")
print("slicing with loc")
print("----------------------------------------")
print(df11)
print("----------------------------------------")
print()
print()
print("2.Slicing columns in data frame")
df2 = df.iloc[:, 0:2]
print("----------------------------------------")
print("data frame after slicing")
print("----------------------------------------")
print(df2)
The following is the output
Data Frame before slicing
----------------------------------------
            Name  Age  Weight   Salary
0      M.S.Dhoni   36      75  5428000
1  A.B.D Villers   38      74  3428000
2        V.Kholi   31      70  8428000
3        S.Smith   34      80  4428000
4        C.Gayle   40     100  4528000
5         J.Root   33      72  7028000
6     K.Peterson   42      85  2528000
----------------------------------------


1.Slicing rows in data frame
----------------------------------------
data frame after slicing
----------------------------------------
            Name  Age  Weight   Salary
0      M.S.Dhoni   36      75  5428000
1  A.B.D Villers   38      74  3428000
2        V.Kholi   31      70  8428000
3        S.Smith   34      80  4428000
----------------------------------------
slicing with loc
----------------------------------------
            Name  Age  Weight   Salary
0      M.S.Dhoni   36      75  5428000
1  A.B.D Villers   38      74  3428000
2        V.Kholi   31      70  8428000
3        S.Smith   34      80  4428000
4        C.Gayle   40     100  4528000
----------------------------------------


2.Slicing columns in data frame
----------------------------------------
data frame after slicing
----------------------------------------
            Name  Age
0      M.S.Dhoni   36
1  A.B.D Villers   38
2        V.Kholi   31
3        S.Smith   34
4        C.Gayle   40
5         J.Root   33
6     K.Peterson   42
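With the default integer index used above, row labels and row positions coincide, which can hide the difference between the two indexers. The short sketch below makes the contrast visible. It assumes a small hypothetical Data Frame indexed by player name; that index is chosen only for illustration and is not part of the program above.

```python
import pandas as pd

# Hypothetical subset of the player data, indexed by name instead of 0..n-1
players = pd.DataFrame(
    {"Age": [36, 31, 34, 40], "Weight": [75, 70, 80, 100]},
    index=["M.S.Dhoni", "V.Kholi", "S.Smith", "C.Gayle"],
)

# loc slices by label and includes the end label
by_label = players.loc["V.Kholi":"S.Smith"]

# iloc slices by position and excludes the end position
by_position = players.iloc[1:2]

# Both indexers accept a row selector and a column selector together
age_by_label = players.loc["C.Gayle", "Age"]
age_by_position = players.iloc[3, 0]

print(by_label)
print(by_position)
print(age_by_label, age_by_position)
```

Here the loc slice returns two rows (V.Kholi and S.Smith), while the iloc slice returns only the row at position 1.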
2) Append a Column to Data Frame
A Data Frame is stored in the form of a table. A column can be added to a Data Frame by assigning a list to a new column name; the list must have one value per row:
data_frame['column_name'] = list_name
# Import pandas package
import pandas as pd

# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
print("----------------------------------------")
print("Data Frame before adding column")
print("----------------------------------------")
print(df)
print("----------------------------------------")
print()
print()

# Declare a list that is to be converted into a column
address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']

# Using 'Address' as the column name and equating it to the list
df['Address'] = address

# Observe the result
print("----------------------------------------")
print("Data Frame after adding column")
print("----------------------------------------")
print(df)
The following is the output
----------------------------------------
Data Frame before adding column
----------------------------------------
     Name  Height Qualification
0     Jai     5.1           Msc
1  Princi     6.2            MA
2  Gaurav     5.1           Msc
3    Anuj     5.2           Msc
----------------------------------------


----------------------------------------
Data Frame after adding column
----------------------------------------
     Name  Height Qualification    Address
0     Jai     5.1           Msc      Delhi
1  Princi     6.2            MA  Bangalore
2  Gaurav     5.1           Msc    Chennai
3    Anuj     5.2           Msc      Patna
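Column assignment has a few variations that are worth knowing. The sketch below is an illustration only: the 'Country' column and the deliberately short list are made-up values, not part of the program above. It shows a scalar being broadcast to every row, the assign() method returning a new Data Frame, and the error raised when the list length does not match the number of rows.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav", "Anuj"],
                   "Height": [5.1, 6.2, 5.1, 5.2]})

# A single value is repeated for every row (hypothetical 'Country' column)
df["Country"] = "India"

# assign() returns a new Data Frame and leaves df itself unchanged
df_with_address = df.assign(Address=["Delhi", "Bangalore", "Chennai", "Patna"])

# A list with the wrong number of values is rejected with a ValueError
try:
    df["Bad"] = ["only", "three", "values"]
    length_error = None
except ValueError as exc:
    length_error = exc

print(df)
print(df_with_address)
print("Error:", length_error)
```

Using assign() is convenient when the original Data Frame should be kept as it was.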
3) Select a Column of a Data Frame
Columns are selected by their labels (column names). Passing a list of column names returns a new Data Frame containing only those columns.
# Import pandas package
import pandas as pd

# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}

print("----------------------------------------")
print(" Original Data Frame")
print("----------------------------------------")
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
print(df)
print()
print()

# select two columns
print("----------------------------------------")
print("Selecting two columns from Data Frame")
print("----------------------------------------")
df2 = df[['Name', 'Qualification']]
print(df2)
print("----------------------------------------")
print("Selecting all rows and second to fourth column from Data Frame")
print("----------------------------------------")

# select all rows
# and second to fourth column
df3 = df[df.columns[1:4]]
print(df3)
The following is the output
----------------------------------------
 Original Data Frame
----------------------------------------
     Name  Age    Address Qualification
0     Jai   27      Delhi           Msc
1  Princi   24     Kanpur            MA
2  Gaurav   22  Allahabad           MCA
3    Anuj   32    Kannauj           Phd


----------------------------------------
Selecting two columns from Data Frame
----------------------------------------
     Name Qualification
0     Jai           Msc
1  Princi            MA
2  Gaurav           MCA
3    Anuj           Phd
----------------------------------------
Selecting all rows and second to fourth column from Data Frame
----------------------------------------
   Age    Address Qualification
0   27      Delhi           Msc
1   24     Kanpur            MA
2   22  Allahabad           MCA
3   32    Kannauj           Phd
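Two details of column selection are easy to miss: a single label returns a Series, whereas a list of labels returns a Data Frame, and a positional column range can also be written with iloc. The sketch below uses a shortened, two-row copy of the employee data, purely as an illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi"],
                   "Age": [27, 24],
                   "Address": ["Delhi", "Kanpur"],
                   "Qualification": ["Msc", "MA"]})

# A single label gives a Series
name_series = df["Name"]

# A list with one label gives a Data Frame with one column
name_frame = df[["Name"]]

# Second to fourth column, selected in two equivalent ways
by_columns = df[df.columns[1:4]]
by_iloc = df.iloc[:, 1:4]

print(type(name_series).__name__)
print(type(name_frame).__name__)
print(by_columns.equals(by_iloc))
```

Both range selections return the Age, Address and Qualification columns for all rows.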