Wednesday, 12 February 2025

Comparing Clustering Algorithms: k-Means, DBSCAN, GMM, and Hierarchical Clustering

Let's implement multiple clustering algorithms on the Wholesale Customer dataset and evaluate them using Silhouette Score and Davies-Bouldin Index.

Clustering Methods to Implement:

  1. k-Means Clustering (partition-based)
  2. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) (density-based)
  3. GMM (Gaussian Mixture Model) (probabilistic)
  4. Hierarchical Clustering (connectivity-based)

Evaluation Metrics:

  • Silhouette Score: measures how similar each point is to its own cluster versus the nearest other cluster; ranges from -1 to 1, higher is better.
  • Davies-Bouldin Index: the average ratio of within-cluster scatter to between-cluster separation; lower is better.
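Both metrics are available in scikit-learn; a quick sketch on toy data shows how they behave when clusters are clearly separated:

```python
import numpy as np
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Two well-separated toy clusters
X_toy = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels_toy = np.array([0, 0, 0, 1, 1, 1])

sil = silhouette_score(X_toy, labels_toy)     # near 1 when clusters are well separated
db = davies_bouldin_score(X_toy, labels_toy)  # near 0 when clusters are compact and far apart
print(round(sil, 2), round(db, 2))
```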

Step-1: Mount the drive

from google.colab import drive
drive.mount('/content/drive')


Step-2: Read the Wholesale_customers_data.csv dataset
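Reading the CSV and picking out the feature matrix X (used by every later step) might look like the sketch below. The Drive path and the standard UCI Wholesale column names are assumptions; the inline rows just mimic the dataset's format for illustration:

```python
import io
import pandas as pd

# In Colab the mounted file would be read directly, e.g.
# df = pd.read_csv('/content/drive/MyDrive/Wholesale_customers_data.csv')  # path is an assumption
# The inline snippet below only mimics the dataset's schema (standard UCI column names).
csv = io.StringIO(
    "Channel,Region,Fresh,Milk,Grocery,Frozen,Detergents_Paper,Delicassen\n"
    "2,3,12669,9656,7561,214,2674,1338\n"
    "2,3,7057,9810,9568,1762,3293,1776\n"
)
df = pd.read_csv(csv)

# Channel and Region are categorical IDs, not spending amounts; keep the six spending columns
X = df.drop(columns=['Channel', 'Region'])
print(X.shape)
```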


Step-3: Preprocessing: select the relevant features and standardize them for better clustering performance

from sklearn.preprocessing import StandardScaler # Import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
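Standardization matters here because k-Means, DBSCAN, and Ward linkage are all distance-based, so a feature with a large raw range (e.g. annual Fresh spending vs. Delicassen) would dominate. A quick sketch of what StandardScaler does:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# The second column has a 100x larger scale and would dominate Euclidean distances
X_demo = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_std = StandardScaler().fit_transform(X_demo)

print(X_std.mean(axis=0))  # ~[0, 0]: each column now has zero mean
print(X_std.std(axis=0))   # [1, 1]: and unit variance
```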


Step-4: Store results in a dictionary for evaluation


clustering_results = {}

Step-5: Define a function to evaluate clustering results

from sklearn.metrics import silhouette_score, davies_bouldin_score

def evaluate_clustering(labels, X_scaled):
    # Both metrics are only defined for 2+ clusters; note that DBSCAN's
    # noise label (-1) is treated here as a cluster of its own.
    if len(set(labels)) > 1:
        silhouette = silhouette_score(X_scaled, labels)
        db_index = davies_bouldin_score(X_scaled, labels)
    else:
        silhouette = -1  # Undefined for a single cluster
        db_index = -1
    return silhouette, db_index

Step-6: k-Means Clustering


from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans_labels = kmeans.fit_predict(X_scaled)
clustering_results['k-Means'] = evaluate_clustering(kmeans_labels, X_scaled)
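The n_clusters=3 above is a hyperparameter, not something k-Means discovers on its own. A common sanity check is to sweep k and compare silhouette scores; a sketch on synthetic three-blob data (not the Wholesale dataset):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Three well-separated synthetic blobs
rng = np.random.default_rng(1)
X_blobs = np.vstack([rng.normal(c, 0.5, (40, 2)) for c in ((0, 0), (4, 0), (2, 4))])

scores = {}
for k in range(2, 6):
    labels_k = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X_blobs)
    scores[k] = silhouette_score(X_blobs, labels_k)
    print(k, round(scores[k], 3))
```

On this data the silhouette peaks at k=3, matching the true number of blobs; on real data the peak is the usual heuristic for choosing n_clusters.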

Step-7: DBSCAN Clustering


from sklearn.cluster import DBSCAN
dbscan = DBSCAN(eps=1.5, min_samples=5)
dbscan_labels = dbscan.fit_predict(X_scaled)
clustering_results['DBSCAN'] = evaluate_clustering(dbscan_labels, X_scaled)
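Unlike the other three methods, DBSCAN picks the number of clusters itself and labels outliers as -1, so both counts are worth inspecting; a sketch on synthetic two-blob data (eps and min_samples here are illustrative, not tuned for the Wholesale dataset):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense synthetic blobs
rng = np.random.default_rng(42)
X_two = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])

labels_db = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_two)
n_clusters = len(set(labels_db)) - (1 if -1 in labels_db else 0)  # -1 marks noise, not a cluster
n_noise = int((labels_db == -1).sum())
print(n_clusters, n_noise)
```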

Step-8: Gaussian Mixture Model (GMM)


from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=3, random_state=42)
gmm_labels = gmm.fit_predict(X_scaled)
clustering_results['GMM'] = evaluate_clustering(gmm_labels, X_scaled)
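What distinguishes GMM from k-Means is soft assignment: predict_proba returns a membership probability per component instead of a single hard label. A small sketch on synthetic data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two synthetic Gaussian blobs
rng = np.random.default_rng(0)
X_mix = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])

gmm_demo = GaussianMixture(n_components=2, random_state=0).fit(X_mix)
proba = gmm_demo.predict_proba(X_mix)  # one probability per point per component

print(proba.shape)                                 # (200, 2)
print(bool(np.allclose(proba.sum(axis=1), 1.0)))   # True: each row is a distribution
```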

Step-9: Hierarchical (Agglomerative) Clustering

from sklearn.cluster import AgglomerativeClustering

hierarchical = AgglomerativeClustering(n_clusters=3, linkage='ward')
hierarchical_labels = hierarchical.fit_predict(X_scaled)

# Evaluate Hierarchical Clustering
clustering_results['Hierarchical'] = evaluate_clustering(hierarchical_labels, X_scaled)
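Agglomerative clustering builds a merge tree, and the usual way to choose n_clusters is to inspect that tree as a dendrogram and cut it. A sketch using SciPy's equivalent Ward linkage (scipy.cluster.hierarchy.dendrogram(Z) would plot the tree):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two synthetic blobs
rng = np.random.default_rng(2)
X_hc = np.vstack([rng.normal(0, 0.4, (30, 2)), rng.normal(5, 0.4, (30, 2))])

Z = linkage(X_hc, method='ward')                    # merge tree (same criterion as linkage='ward')
labels_hc = fcluster(Z, t=2, criterion='maxclust')  # cut the tree into 2 flat clusters
print(len(set(labels_hc)))
```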

Step-10: Convert results to a DataFrame for easy comparison


import pandas as pd  # pandas must be imported before building the DataFrame

clustering_eval_df = pd.DataFrame.from_dict(
    clustering_results, orient='index', columns=['Silhouette Score', 'Davies-Bouldin Index']
)

Step-11: Compare the clustering algorithms

import pandas as pd # If pandas is not already imported

print("Clustering Evaluation Metrics (Including Hierarchical):")
display(clustering_eval_df)
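When reading the comparison table, the best method has the highest Silhouette Score and the lowest Davies-Bouldin Index. A sketch with made-up illustrative numbers (not actual results for this dataset):

```python
import pandas as pd

# Hypothetical scores, just to show how to read the table
df_example = pd.DataFrame(
    {'Silhouette Score': [0.45, 0.30, 0.40, 0.43],
     'Davies-Bouldin Index': [0.90, 1.40, 1.10, 0.95]},
    index=['k-Means', 'DBSCAN', 'GMM', 'Hierarchical'])

best_sil = df_example['Silhouette Score'].idxmax()     # higher is better
best_db = df_example['Davies-Bouldin Index'].idxmin()  # lower is better
print(best_sil, best_db)
```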

Final output: a table of Silhouette Score and Davies-Bouldin Index for each of the four algorithms; the method with the highest silhouette and lowest Davies-Bouldin fits this data best.

