In machine learning, we often encounter data with multiple labels in one or more columns. These labels can be text or numeric. Such data usually cannot be fed to a machine learning model in its raw form. To make the data understandable to the model, the labels are often encoded using label encoding, a technique that converts labels into numeric form so that a machine learning model can ingest them. It is an important data preprocessing step for supervised learning techniques. In this method, we generally replace each value in a categorical column with a number from 0 to N-1, where N is the number of distinct values. LabelEncoder is a utility class that helps normalize labels so that they contain only values between 0 and n_classes-1.
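To make the 0 to N-1 idea concrete, here is a minimal sketch in plain Python with no libraries. The colors list is made up for illustration, and sorting the distinct values is just one reasonable way to make the numbering repeatable.

```python
# Hypothetical sample column; the values are illustrative only.
colors = ["red", "green", "blue", "green", "red"]

# Sort the distinct values so the numbering is deterministic.
classes = sorted(set(colors))

# Assign each distinct value an integer from 0 to N-1.
mapping = {label: code for code, label in enumerate(classes)}

# Replace every value in the column with its integer code.
encoded = [mapping[label] for label in colors]

print(mapping)
print(encoded)
```

With three distinct values, the codes run from 0 to 2, and repeated values always receive the same code.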
The following example demonstrates how to encode labels. Here I am using the iris.csv file as an example. You can download this file here
sklearn.preprocessing.LabelEncoder is used for performing label encoding. The detailed description can be found on the official website (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html)
Step-1: First, we find the unique labels in the column variety as follows
import numpy as np
import pandas as pd

# Load the required data set
df = pd.read_csv('iris.csv')

# List the distinct labels in the 'variety' column
df['variety'].unique()
Output
array(['Setosa', 'Versicolor', 'Virginica'], dtype=object)
Step-2: Now, using preprocessing.LabelEncoder(), we encode the above labels as follows
# Import label encoder
from sklearn import preprocessing

# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()

# Encode labels in column 'variety'.
df['variety'] = label_encoder.fit_transform(df['variety'])

df['variety'].unique()
Output
array([0, 1, 2])
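As you can observe in the above output, Setosa is labeled as 0, Versicolor is labeled as 1, and Virginica is labeled as 2. LabelEncoder assigns the codes in sorted order of the labels, which is why they follow alphabetical order here.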
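If you want to check which number belongs to which label, or turn the numbers back into the original labels, the fitted encoder's classes_ attribute and inverse_transform method can help. The sketch below uses a small made-up list in place of the variety column so that it runs without iris.csv; the names varieties, code_to_label, and decoded are only for illustration.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative labels standing in for the 'variety' column of iris.csv.
varieties = ["Virginica", "Setosa", "Versicolor", "Setosa"]

label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(varieties)

# classes_ holds the sorted original labels; each label's index is its code.
code_to_label = {code: str(label) for code, label in enumerate(label_encoder.classes_)}
print(code_to_label)

# inverse_transform converts the integer codes back to the original labels.
decoded = [str(label) for label in label_encoder.inverse_transform(codes)]
print(decoded)
```

Keeping the fitted label_encoder object around is useful for this reason: the same object that produced the codes is the one that can reverse them later.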