Example-1 (Comparison of three different classifiers)
A comparison of three classifiers in scikit-learn on the iris dataset.
The iris dataset is a classic and very easy multi-class classification dataset.
Environment check
Check whether the notebook is running on Google Colab and, if so, install pycm.
import sys
try:
    import google.colab  # only importable when running on Google Colab
    !{sys.executable} -m pip -q -q install pycm
except ImportError:
    pass
Install scikit-learn
import os
!{sys.executable} -m pip -q -q install scikit-learn
if "Example1_files" not in os.listdir():
    os.mkdir("Example1_files")
Load dataset
from sklearn import datasets
from sklearn.model_selection import train_test_split
from pycm import ConfusionMatrix
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
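A quick sanity check of the split can help before training. The sketch below repeats the loading step and prints the array shapes. It assumes scikit-learn's default test fraction of 0.25, since the call above does not set test_size.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same call as above: no test_size given, so the library default applies.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

print("samples, features:", X.shape)
print("train:", X_train.shape, "test:", X_test.shape)
print("classes:", list(iris.target_names))
```

If the default fraction differs in your scikit-learn version, the train and test sizes will differ accordingly.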
Classifier 1 (C-Support vector)
from sklearn import svm
classifier_1 = svm.SVC(kernel='linear', C=0.01)
y_pred_1 = classifier_1.fit(X_train, y_train).predict(X_test)
cm1 = ConfusionMatrix(y_test, y_pred_1)
cm1.print_matrix()
cm1.print_normalized_matrix()
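To make the normalized matrix easier to read, here is a minimal pure-Python sketch of row normalization, where each row (actual class) is divided by its row total. This is an illustration of the idea and not pycm's own code. The function name normalize_rows, the rounding to 5 decimals, and the toy matrix are assumptions made for this example.

```python
def normalize_rows(matrix):
    """Divide every row by its total so each row sums to 1 (rows = actual classes)."""
    normalized = []
    for row in matrix:
        row_total = sum(row)
        if row_total == 0:
            # No samples of this actual class: keep a row of zeros.
            normalized.append([0.0] * len(row))
        else:
            normalized.append([round(value / row_total, 5) for value in row])
    return normalized


# Toy 2-class matrix, not an output of the classifiers in this notebook.
toy_matrix = [[8, 2],
              [1, 9]]
toy_normalized = normalize_rows(toy_matrix)
for row in toy_normalized:
    print(row)
```

Each diagonal entry of the result is the fraction of that class predicted correctly.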
cm1.Kappa
cm1.Overall_ACC
cm1.SOA1 # Landis and Koch benchmark
cm1.SOA2 # Fleiss’ benchmark
cm1.SOA3 # Altman’s benchmark
cm1.SOA4 # Cicchetti’s benchmark
cm1.save_html(os.path.join("Example1_files", "cm1"))
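The Kappa and Overall_ACC values above can be related to the confusion matrix by hand. The sketch below computes overall accuracy and Cohen's kappa from a toy matrix and maps kappa to the Landis and Koch labels. It is a pure-Python illustration, not pycm's implementation. The function names, the toy matrix, and the handling of values that fall exactly on a band boundary are assumptions.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (rows = actual, columns = predicted)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    p_o = overall_accuracy(matrix)  # observed agreement
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance from the marginal totals.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Landis and Koch strength-of-agreement label (boundary handling assumed)."""
    if kappa < 0:
        return "Poor"
    if kappa < 0.2:
        return "Slight"
    if kappa < 0.4:
        return "Fair"
    if kappa < 0.6:
        return "Moderate"
    if kappa < 0.8:
        return "Substantial"
    return "Almost Perfect"


# Toy 3-class matrix, not an output of the classifiers in this notebook.
toy_matrix = [[10, 0, 0],
              [0, 8, 2],
              [0, 1, 9]]
toy_accuracy = overall_accuracy(toy_matrix)
toy_kappa = cohen_kappa(toy_matrix)
print(round(toy_accuracy, 2), round(toy_kappa, 2), landis_koch(toy_kappa))
```

Kappa is lower than accuracy here because it discounts the agreement that would be expected by chance.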
Classifier 2 (Decision tree)
from sklearn.tree import DecisionTreeClassifier
classifier_2 = DecisionTreeClassifier(max_depth=5)
y_pred_2 = classifier_2.fit(X_train, y_train).predict(X_test)
cm2 = ConfusionMatrix(y_test, y_pred_2)
cm2.print_matrix()
cm2.print_normalized_matrix()
cm2.Kappa
cm2.Overall_ACC
cm2.SOA1 # Landis and Koch benchmark
cm2.SOA2 # Fleiss’ benchmark
cm2.SOA3 # Altman’s benchmark
cm2.SOA4 # Cicchetti’s benchmark
cm2.save_html(os.path.join("Example1_files", "cm2"))
Classifier 3 (AdaBoost)
from sklearn.ensemble import AdaBoostClassifier
classifier_3 = AdaBoostClassifier()
y_pred_3 = classifier_3.fit(X_train, y_train).predict(X_test)
cm3 = ConfusionMatrix(y_test, y_pred_3)
cm3.print_matrix()
cm3.print_normalized_matrix()
cm3.Kappa
cm3.Overall_ACC
cm3.SOA1 # Landis and Koch benchmark
cm3.SOA2 # Fleiss’ benchmark
cm3.SOA3 # Altman’s benchmark
cm3.SOA4 # Cicchetti’s benchmark
cm3.save_html(os.path.join("Example1_files", "cm3"))
How to compare classifiers?
from pycm import Compare
cp = Compare({"C-Support vector": cm1, "Decision tree": cm2, "AdaBoost": cm3})
print(cp)
cp.save_report(os.path.join("Example1_files", "cp"))
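pycm's Compare applies its own scoring scheme, which is not reproduced here. As a much simpler illustration of the general idea of ranking classifiers from their confusion matrices, the sketch below sorts three toy models by overall accuracy alone. The model names and matrices are made up for this example and do not correspond to the classifiers above.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (rows = actual, columns = predicted)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Hypothetical confusion matrices for three hypothetical models.
toy_matrices = {
    "model_a": [[9, 1], [2, 8]],
    "model_b": [[10, 0], [1, 9]],
    "model_c": [[7, 3], [3, 7]],
}

# Rank model names from highest to lowest overall accuracy.
ranking = sorted(
    toy_matrices,
    key=lambda name: overall_accuracy(toy_matrices[name]),
    reverse=True,
)
best_model = ranking[0]
print(ranking, best_model)
```

A single metric like this can be misleading on imbalanced data, which is one reason to prefer a comparison that combines several overall and class-based statistics.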