Explanation of Implementation Steps
Dataset Loading:
- The Iris dataset is loaded, which contains 150 samples belonging to three different species of iris.
- Features (sepal length, sepal width, petal length, petal width) are stored in X, and labels (species) in y.
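To make the loading step concrete, here is a minimal sketch that inspects what load_iris returns. It only uses names that already appear in this post (iris, X, y); the printed checks are added for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the dataset and separate features from labels
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 samples, 4 features
print(iris.feature_names)  # names of the four measurements
print(iris.target_names)   # names of the three species
print(np.bincount(y))      # [50 50 50]: samples per class
```

The classes are perfectly balanced, which is one reason plain accuracy is a reasonable headline metric for this dataset.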
Data Splitting:
- The dataset is split into training (70%) and testing (30%) subsets to evaluate the model's performance on unseen data.
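As a quick check of the 70/30 split, the sketch below prints the resulting subset sizes. The stratify=y variant at the end is an assumption added here for comparison; the original code does not use it.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same settings as in this post: 30% of the samples are held out
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_test))  # 105 45

# Optional variation (not in the original): keep class proportions
# identical in both subsets by stratifying on the labels.
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
print(np.bincount(y_te_s))  # [15 15 15]
```

Without stratification the test set may contain slightly uneven class counts, which is worth keeping in mind when reading per-class metrics.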
Classifier Creation and Training:
- A Logistic Regression classifier is instantiated with multi_class='multinomial' to perform true multi-class classification (a single softmax model) rather than a One-vs-Rest scheme. On scikit-learn 1.5 and later this parameter is deprecated, since the multinomial loss is already the default for multi-class targets with this solver.
- The solver lbfgs is used, which supports the multinomial loss.
- The classifier is trained on the training data.
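One way to see what "multinomial" means in practice is to check that the predicted probabilities are a softmax over the per-class scores. The sketch below assumes a recent scikit-learn in which lbfgs uses the multinomial loss by default, so multi_class is left out; it fits on the full dataset purely for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Assumption: default settings give a multinomial (softmax) model for 3 classes
clf = LogisticRegression(solver="lbfgs", max_iter=200).fit(X, y)

# One weight vector per class, one weight per feature
print(clf.coef_.shape)  # (3, 4)

# Apply a numerically stable softmax to the raw class scores
scores = clf.decision_function(X[:5])
softmax = np.exp(scores - scores.max(axis=1, keepdims=True))
softmax /= softmax.sum(axis=1, keepdims=True)

# The result should match predict_proba, and each row sums to 1
proba = clf.predict_proba(X[:5])
print(np.allclose(softmax, proba))
print(proba.sum(axis=1))
```

In a One-vs-Rest scheme, by contrast, each class gets its own independent sigmoid score, and the scores are only normalized afterwards.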
Prediction:
- The trained classifier is used to predict the class labels for the test data.
Evaluation:
- The model's accuracy is computed along with a detailed classification report that includes precision, recall, and F1-score for each class.
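The metrics in the classification report can all be derived from the confusion matrix. The sketch below uses a small set of made-up labels (y_true and y_pred here are illustrative, not results from the Iris model) to show the arithmetic.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels, chosen only to make the arithmetic easy to follow
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
tp = np.diag(cm)  # correctly classified samples per class

precision = tp / cm.sum(axis=0)  # correct / everything predicted as the class
recall = tp / cm.sum(axis=1)     # correct / everything truly in the class
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()

print(cm)
print(precision)  # matches precision_score(..., average=None)
print(recall)     # matches recall_score(..., average=None)
print(f1)
print(accuracy)
```

This manual version divides by zero if a class is never predicted; classification_report handles that case and issues a warning instead.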
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
# 1. Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# 2. Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
# 3. Create and train the classifier (Logistic Regression)
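# 'multinomial' is specified for true multi-class classification,
# and 'lbfgs' is a solver that supports the multinomial loss.
# Note: multi_class is deprecated since scikit-learn 1.5, where the
# multinomial loss is already the default for multi-class targets;
# on newer versions the argument can be dropped.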
clf = LogisticRegression(max_iter=200, multi_class='multinomial', solver='lbfgs')
clf.fit(X_train, y_train)
# 4. Make predictions on the test set
y_pred = clf.predict(X_test)
# 5. Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
print("Classification Report:")
print(classification_report(y_test, y_pred))
Comparing the "One-vs-Rest" and "One-vs-One" approaches.
Implementation Steps Explained
Dataset Loading and Splitting:
- Dataset: We use the Iris dataset, which consists of 150 samples from three classes.
- Splitting: The data is divided into training and testing sets (80/20 split) to evaluate model performance.
Defining the Base Classifier:
- We choose Logistic Regression as our base classifier. Although logistic regression natively supports multi-class classification (using a multinomial approach), wrapping it with multi-class strategies like OvR or OvO lets us compare these approaches explicitly.
- The max_iter parameter is set to 200 to give the solver enough iterations to converge.
One-vs-Rest (OvR) Classification:
- Approach: In OvR, one classifier is trained per class, with that class as the positive label and all others as negative.
- Implementation: The OneVsRestClassifier wrapper is used to train the base classifier.
- Evaluation: Predictions on the test set are compared with true labels using accuracy and a detailed classification report.
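To show what the OvR wrapper does internally, here is a hand-rolled sketch: one binary Logistic Regression per class, with the final prediction taken from the most confident model. The names binary_models and manual_pred are hypothetical, and the models are fit on the full dataset only to keep the example short.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Train one binary model per class: that class is 1, all others are 0
binary_models = []
for c in classes:
    model = LogisticRegression(max_iter=200)
    model.fit(X, (y == c).astype(int))
    binary_models.append(model)

# Each column holds one model's confidence that a sample is "its" class
scores = np.column_stack([m.decision_function(X) for m in binary_models])
manual_pred = classes[scores.argmax(axis=1)]

# The wrapper trains the same number of binary models
ovr = OneVsRestClassifier(LogisticRegression(max_iter=200)).fit(X, y)
print(len(ovr.estimators_))  # 3

# Fraction of samples where the manual version agrees with the wrapper
print((manual_pred == ovr.predict(X)).mean())
```

The wrapper adds conveniences on top of this loop, such as label handling and optional parallel training, but the underlying idea is the same.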
One-vs-One (OvO) Classification:
- Approach: OvO trains a binary classifier for every pair of classes, which gives K(K-1)/2 classifiers for K classes. For three classes, this results in three classifiers.
- Implementation: The OneVsOneClassifier wrapper is used to build the models.
- Evaluation: The performance is again evaluated on the test set using accuracy and a classification report.
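The pairing behind OvO can be listed explicitly. The sketch below enumerates the class pairs for Iris and shows how many samples each pairwise model would see; pairs and mask are illustrative names, and the data is not split here, to keep the example short.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier

X, y = load_iris(return_X_y=True)

# Every unordered pair of class labels gets its own binary classifier
pairs = list(combinations(np.unique(y).tolist(), 2))
print(pairs)  # [(0, 1), (0, 2), (1, 2)]

# Each pairwise model only sees the samples of its two classes
for a, b in pairs:
    mask = (y == a) | (y == b)
    print((a, b), int(mask.sum()))  # 100 samples per pair on Iris

# The wrapper builds one estimator per pair
ovo = OneVsOneClassifier(LogisticRegression(max_iter=200)).fit(X, y)
print(len(ovo.estimators_))  # 3
```

At prediction time the pairwise models vote, and the class that collects the most votes is returned.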
Comparison:
- Both approaches are evaluated in terms of accuracy and detailed classification metrics (precision, recall, F1-score).
- OvR trains only K classifiers for K classes and is therefore generally more computationally efficient when dealing with many classes. OvO trains K(K-1)/2 classifiers, each on the samples of just two classes, and can sometimes offer better decision boundaries by focusing on pairwise distinctions.
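The difference in cost is easiest to see by counting models as the number of classes grows. The helper n_classifiers below is a hypothetical function written for this illustration only.

```python
def n_classifiers(k):
    """Number of binary models each strategy trains for k classes."""
    return {"ovr": k, "ovo": k * (k - 1) // 2}


for k in (3, 10, 100):
    print(k, n_classifiers(k))
# 3 {'ovr': 3, 'ovo': 3}
# 10 {'ovr': 10, 'ovo': 45}
# 100 {'ovr': 100, 'ovo': 4950}
```

For Iris the two strategies train the same number of models, so any difference in results comes from how the binary problems are posed rather than from how many there are.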
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
# 1. Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# 2. Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
# 3. Define the base classifier: Logistic Regression
# Increase max_iter if needed to ensure convergence.
base_clf = LogisticRegression(max_iter=200)
# --- One-vs-Rest Approach ---
ovr_clf = OneVsRestClassifier(base_clf)
ovr_clf.fit(X_train, y_train)
y_pred_ovr = ovr_clf.predict(X_test)
# Evaluate One-vs-Rest classifier performance
ovr_accuracy = accuracy_score(y_test, y_pred_ovr)
print("One-vs-Rest Classification Report:")
print(classification_report(y_test, y_pred_ovr))
print("One-vs-Rest Accuracy: {:.2f}%\n".format(ovr_accuracy * 100))
# --- One-vs-One Approach ---
ovo_clf = OneVsOneClassifier(base_clf)
ovo_clf.fit(X_train, y_train)
y_pred_ovo = ovo_clf.predict(X_test)
# Evaluate One-vs-One classifier performance
ovo_accuracy = accuracy_score(y_test, y_pred_ovo)
print("One-vs-One Classification Report:")
print(classification_report(y_test, y_pred_ovo))
print("One-vs-One Accuracy: {:.2f}%".format(ovo_accuracy * 100))