Please cite us if you use the software

PyCM Document

Version : 4.4

Overview

PyCM is a multi-class confusion matrix library written in Python that accepts both input data vectors and a direct matrix. It is a tool for post-classification model evaluation that supports most class-level and overall statistical parameters. PyCM is the Swiss Army knife of confusion matrices, aimed mainly at data scientists who need a broad array of metrics for predictive models and accurate evaluation of a wide variety of classifiers.

Fig1. ConfusionMatrix Block Diagram

Installation

⚠️ PyCM 4.3 is the last version to support Python 3.6

⚠️ PyCM 3.9 is the last version to support Python 3.5

⚠️ PyCM 2.4 is the last version to support Python 2.7 & Python 3.4

⚠️ Plotting capability requires Matplotlib (>= 3.0.0) or Seaborn (>= 0.9.1)

PyPI

Check Python Packaging User Guide
Run pip install pycm==4.4

Source code

Download Version 4.4 or Latest Source
Run pip install .

Conda

Check Conda Managing Package
Run conda install -c sepandhaghighi pycm

MATLAB

Download and install MATLAB (>=8.5, 64/32 bit)
Download and install Python 3.x (>=3.7, 64/32 bit)
- Select Add to PATH option
- Select Install pip option
Run pip install pycm or pip3 install pycm

Configure Python interpreter

>> pyversion PYTHON_EXECUTABLE_FULL_PATH

Visit MATLAB Examples

Usage

Environment check

Check whether the notebook is running on Google Colab and, if so, install PyCM.

import sys
try:
    import google.colab  # importable only inside Google Colab
    !{sys.executable} -m pip -q -q install pycm
except ImportError:
    pass

From vector

from pycm import *

y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]

cm = ConfusionMatrix(y_actu, y_pred, digit=5)

Notice : digit (the number of digits after the decimal point) is new in version 0.6 (default value : 5). It affects only printing and saving.

cm

pycm.ConfusionMatrix(classes: [0, 1, 2])

cm.actual_vector

[2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]

cm.predict_vector

[0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]

cm.classes

[0, 1, 2]

cm.class_stat

{'ACC': {0: 0.8333333333333334, 1: 0.75, 2: 0.5833333333333334},
 'AGF': {0: 0.9135962935560564, 1: 0.5399492471560389, 2: 0.5515973485146916},
 'AGM': {0: 0.837285964012303, 1: 0.6919986974962765, 2: 0.6071224016819726},
 'AM': {0: 2, 1: -1, 2: -1},
 'AUC': {0: 0.8888888888888888, 1: 0.611111111111111, 2: 0.5833333333333333},
 'AUCI': {0: 'Very Good', 1: 'Fair', 2: 'Poor'},
 'AUPR': {0: 0.8, 1: 0.41666666666666663, 2: 0.55},
 'BB': {0: 0.6, 1: 0.3333333333333333, 2: 0.5},
 'BCD': {0: 0.08333333333333333,
  1: 0.041666666666666664,
  2: 0.041666666666666664},
 'BM': {0: 0.7777777777777777, 1: 0.2222222222222221, 2: 0.16666666666666652},
 'CEN': {0: 0.25, 1: 0.49657842846620864, 2: 0.6044162769630221},
 'DOR': {0: 'None', 1: 3.999999999999998, 2: 1.9999999999999998},
 'DP': {0: 'None', 1: 0.331933069996499, 2: 0.16596653499824957},
 'DPI': {0: 'None', 1: 'Poor', 2: 'Poor'},
 'ERR': {0: 0.16666666666666663, 1: 0.25, 2: 0.41666666666666663},
 'F0.5': {0: 0.6521739130434783,
  1: 0.45454545454545453,
  2: 0.5769230769230769},
 'F1': {0: 0.75, 1: 0.4, 2: 0.5454545454545454},
 'F2': {0: 0.8823529411764706, 1: 0.35714285714285715, 2: 0.5172413793103449},
 'FDR': {0: 0.4, 1: 0.5, 2: 0.4},
 'FN': {0: 0, 1: 2, 2: 3},
 'FNR': {0: 0.0, 1: 0.6666666666666667, 2: 0.5},
 'FOR': {0: 0.0, 1: 0.19999999999999996, 2: 0.4285714285714286},
 'FP': {0: 2, 1: 1, 2: 2},
 'FPR': {0: 0.2222222222222222,
  1: 0.11111111111111116,
  2: 0.33333333333333337},
 'G': {0: 0.7745966692414834, 1: 0.408248290463863, 2: 0.5477225575051661},
 'GI': {0: 0.7777777777777777, 1: 0.2222222222222221, 2: 0.16666666666666652},
 'GM': {0: 0.8819171036881969, 1: 0.5443310539518174, 2: 0.5773502691896257},
 'HD': {0: 2, 1: 3, 2: 5},
 'IBA': {0: 0.9506172839506174, 1: 0.1316872427983539, 2: 0.2777777777777778},
 'ICSI': {0: 0.6000000000000001,
  1: -0.16666666666666674,
  2: 0.10000000000000009},
 'IS': {0: 1.263034405833794, 1: 1.0, 2: 0.2630344058337938},
 'J': {0: 0.6, 1: 0.25, 2: 0.375},
 'LS': {0: 2.4, 1: 2.0, 2: 1.2},
 'MCC': {0: 0.6831300510639732, 1: 0.25819888974716115, 2: 0.1690308509457033},
 'MCCI': {0: 'Moderate', 1: 'Negligible', 2: 'Negligible'},
 'MCEN': {0: 0.2643856189774724, 1: 0.5, 2: 0.6875},
 'MK': {0: 0.6000000000000001, 1: 0.30000000000000004, 2: 0.17142857142857126},
 'N': {0: 9, 1: 9, 2: 6},
 'NLR': {0: 0.0, 1: 0.7500000000000001, 2: 0.75},
 'NLRI': {0: 'Good', 1: 'Negligible', 2: 'Negligible'},
 'NPV': {0: 1.0, 1: 0.8, 2: 0.5714285714285714},
 'OC': {0: 1.0, 1: 0.5, 2: 0.6},
 'OOC': {0: 0.7745966692414834, 1: 0.4082482904638631, 2: 0.5477225575051661},
 'OP': {0: 0.7083333333333334, 1: 0.2954545454545454, 2: 0.4404761904761905},
 'P': {0: 3, 1: 3, 2: 6},
 'PLR': {0: 4.5, 1: 2.9999999999999987, 2: 1.4999999999999998},
 'PLRI': {0: 'Poor', 1: 'Poor', 2: 'Poor'},
 'POP': {0: 12, 1: 12, 2: 12},
 'PPV': {0: 0.6, 1: 0.5, 2: 0.6},
 'PR': {0: 0.25, 1: 0.25, 2: 0.5},
 'PRE': {0: 0.25, 1: 0.25, 2: 0.5},
 'Q': {0: 'None', 1: 0.6, 2: 0.3333333333333333},
 'QI': {0: 'None', 1: 'Moderate', 2: 'Weak'},
 'RACC': {0: 0.10416666666666667,
  1: 0.041666666666666664,
  2: 0.20833333333333334},
 'RACCU': {0: 0.1111111111111111,
  1: 0.04340277777777778,
  2: 0.21006944444444442},
 'TN': {0: 7, 1: 8, 2: 4},
 'TNR': {0: 0.7777777777777778, 1: 0.8888888888888888, 2: 0.6666666666666666},
 'TON': {0: 7, 1: 10, 2: 7},
 'TOP': {0: 5, 1: 2, 2: 5},
 'TOPR': {0: 0.4166666666666667,
  1: 0.16666666666666666,
  2: 0.4166666666666667},
 'TP': {0: 3, 1: 1, 2: 3},
 'TPR': {0: 1.0, 1: 0.3333333333333333, 2: 0.5},
 'Y': {0: 0.7777777777777777, 1: 0.2222222222222221, 2: 0.16666666666666652},
 'dInd': {0: 0.2222222222222222, 1: 0.6758625033664689, 2: 0.6009252125773316},
 'sInd': {0: 0.8428651597363228, 1: 0.5220930407198541, 2: 0.5750817072006014}}

Notice : in earlier versions (0.2 and below), class_stat was named cm.statistic_result
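Many of the class statistics above are simple functions of the per-class counts TP, FP, FN, and TN. As a minimal sketch (not PyCM's implementation), the code below derives those counts from this example's table, assuming table[actual][predicted] holds the counts as in cm.table, and recomputes ACC, TPR, PPV, and F1 with their textbook definitions. The helper names are hypothetical.

```python
# Confusion matrix from this example, indexed as table[actual][predicted]
table = {0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}


def class_counts(table):
    """Return per-class TP, FP, FN and TN counts."""
    classes = list(table)
    pop = sum(sum(row.values()) for row in table.values())
    counts = {}
    for c in classes:
        tp = table[c][c]
        fn = sum(table[c].values()) - tp             # actual c, predicted otherwise
        fp = sum(table[a][c] for a in classes) - tp  # predicted c, actually otherwise
        tn = pop - tp - fn - fp
        counts[c] = {"TP": tp, "FP": fp, "FN": fn, "TN": tn}
    return counts


def class_metrics(counts):
    """Compute a few standard per-class metrics from the counts."""
    result = {}
    for c, k in counts.items():
        pop = k["TP"] + k["FP"] + k["FN"] + k["TN"]
        result[c] = {
            "ACC": (k["TP"] + k["TN"]) / pop,
            "TPR": k["TP"] / (k["TP"] + k["FN"]),
            "PPV": k["TP"] / (k["TP"] + k["FP"]),
            "F1": 2 * k["TP"] / (2 * k["TP"] + k["FP"] + k["FN"]),
        }
    return result


counts = class_counts(table)
metrics = class_metrics(counts)
print(counts[1])                   # {'TP': 1, 'FP': 1, 'FN': 2, 'TN': 8}
print(round(metrics[2]["F1"], 5))  # 0.54545
```

The counts match the TP, FP, FN, and TN entries of cm.class_stat, and the metrics match ACC, TPR, PPV, and F1.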

cm.overall_stat

{'95% CI': (0.30438856248221097, 0.8622781041844558),
 'ACC Macro': 0.7222222222222223,
 'ARI': 0.09206349206349207,
 'AUNP': 0.6666666666666666,
 'AUNU': 0.6944444444444443,
 'Bangdiwala B': 0.37254901960784315,
 'Bennett S': 0.37500000000000006,
 'CBA': 0.4777777777777778,
 'CSI': 0.1777777777777778,
 'Chi-Squared': 6.6,
 'Chi-Squared DF': 4,
 'Conditional Entropy': 0.9591479170272448,
 'Cramer V': 0.5244044240850757,
 'Cross Entropy': 1.5935164295556343,
 'F1 Macro': 0.5651515151515151,
 'F1 Micro': 0.5833333333333334,
 'FNR Macro': 0.38888888888888895,
 'FNR Micro': 0.41666666666666663,
 'FPR Macro': 0.22222222222222232,
 'FPR Micro': 0.20833333333333337,
 'Gwet AC1': 0.3893129770992367,
 'Hamming Loss': 0.41666666666666663,
 'Joint Entropy': 2.4591479170272446,
 'KL Divergence': 0.09351642955563438,
 'Kappa': 0.35483870967741943,
 'Kappa 95% CI': (-0.07707577422109269, 0.7867531935759315),
 'Kappa No Prevalence': 0.16666666666666674,
 'Kappa Standard Error': 0.2203645326012817,
 'Kappa Unbiased': 0.34426229508196726,
 'Krippendorff Alpha': 0.3715846994535519,
 'Lambda A': 0.16666666666666666,
 'Lambda B': 0.42857142857142855,
 'Mutual Information': 0.5242078379544426,
 'NIR': 0.5,
 'NPV Macro': 0.7904761904761904,
 'NPV Micro': 0.7916666666666666,
 'Overall ACC': 0.5833333333333334,
 'Overall CEN': 0.4638112995385119,
 'Overall J': (1.225, 0.4083333333333334),
 'Overall MCC': 0.36666666666666664,
 'Overall MCEN': 0.5189369467580801,
 'Overall RACC': 0.3541666666666667,
 'Overall RACCU': 0.3645833333333333,
 'P-Value': 0.38720703125,
 'PPV Macro': 0.5666666666666668,
 'PPV Micro': 0.5833333333333334,
 'Pearson C': 0.5956833971812705,
 'Phi-Squared': 0.5499999999999999,
 'RCI': 0.3494718919696284,
 'RR': 4.0,
 'Reference Entropy': 1.5,
 'Response Entropy': 1.4833557549816874,
 'SOA1(Landis & Koch)': 'Fair',
 'SOA10(Pearson C)': 'Strong',
 'SOA2(Fleiss)': 'Poor',
 'SOA3(Altman)': 'Fair',
 'SOA4(Cicchetti)': 'Poor',
 'SOA5(Cramer)': 'Relatively Strong',
 'SOA6(Matthews)': 'Weak',
 'SOA7(Lambda A)': 'Very Weak',
 'SOA8(Lambda B)': 'Moderate',
 'SOA9(Krippendorff Alpha)': 'Low',
 'Scott PI': 0.34426229508196726,
 'Standard Error': 0.14231876063832777,
 'TNR Macro': 0.7777777777777777,
 'TNR Micro': 0.7916666666666666,
 'TPR Macro': 0.611111111111111,
 'TPR Micro': 0.5833333333333334,
 'Zero-one Loss': 5}

Notice : new in version 0.3

Notice : underscores (_) were removed from overall statistics names in version 1.6
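Several overall statistics can be cross-checked by hand. The sketch below applies the standard formulas for Overall ACC, F1 Macro, Overall RACC (agreement expected by chance), and Cohen's Kappa to the same table. It is an illustrative re-derivation, not PyCM's code, and its results agree with the values listed above.

```python
table = {0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}
classes = list(table)
pop = sum(sum(row.values()) for row in table.values())

# Row sums (actual counts, P), column sums (predicted counts, TOP), diagonal (TP)
p = {c: sum(table[c].values()) for c in classes}
top = {c: sum(table[a][c] for a in classes) for c in classes}
tp = {c: table[c][c] for c in classes}

overall_acc = sum(tp.values()) / pop

# F1 Macro: unweighted mean of per-class F1 = 2TP / (P + TOP)
f1 = {c: 2 * tp[c] / (p[c] + top[c]) for c in classes}
f1_macro = sum(f1.values()) / len(classes)

# Overall RACC: chance agreement computed from the marginals
overall_racc = sum(p[c] * top[c] for c in classes) / pop ** 2

# Cohen's Kappa: observed agreement corrected for chance agreement
kappa = (overall_acc - overall_racc) / (1 - overall_racc)

print(round(overall_acc, 5), round(f1_macro, 5), round(overall_racc, 5), round(kappa, 5))
# → 0.58333 0.56515 0.35417 0.35484
```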

cm.table

{0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}

cm.matrix

{0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}

cm.normalized_matrix

{0: {0: 1.0, 1: 0.0, 2: 0.0},
 1: {0: 0.0, 1: 0.33333, 2: 0.66667},
 2: {0: 0.33333, 1: 0.16667, 2: 0.5}}

cm.normalized_table

{0: {0: 1.0, 1: 0.0, 2: 0.0},
 1: {0: 0.0, 1: 0.33333, 2: 0.66667},
 2: {0: 0.33333, 1: 0.16667, 2: 0.5}}

Notice : matrix, normalized_matrix & normalized_table were added in version 1.5 (changed from the earlier print style)
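Conceptually, the table counts (actual, predicted) pairs, and the normalized matrix divides each row by its total and rounds to digit places. A minimal plain-Python sketch, assuming rows are actual classes and columns are predicted classes as in the output above; build_table and normalize_rows are hypothetical helpers, not PyCM functions.

```python
from collections import Counter

y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]


def build_table(actual, predicted):
    """Count (actual, predicted) pairs into a dict-of-dicts table."""
    classes = sorted(set(actual) | set(predicted))
    pairs = Counter(zip(actual, predicted))  # missing pairs count as 0
    return {a: {p: pairs[(a, p)] for p in classes} for a in classes}


def normalize_rows(table, digit=5):
    """Divide each row by its sum; rows that sum to zero are left as zeros."""
    result = {}
    for a, row in table.items():
        total = sum(row.values())
        result[a] = {p: round(v / total, digit) if total else 0.0 for p, v in row.items()}
    return result


table = build_table(y_actu, y_pred)
print(table)  # {0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}
print(normalize_rows(table))
```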

import numpy

y_actu = numpy.array([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2])
y_pred = numpy.array([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2])

cm = ConfusionMatrix(y_actu, y_pred, digit=5)

cm

pycm.ConfusionMatrix(classes: [0, 1, 2])

Notice : numpy.array input is supported in versions > 0.7

cm = ConfusionMatrix(y_actu, y_pred, classes=[1, 0, 2])

cm.print_matrix()

Predict 1       0       2       
Actual
1       1       0       2       

0       0       3       0       

2       1       2       3

cm = ConfusionMatrix(y_actu, y_pred, classes=[0, 1])

cm.print_matrix()

Predict 0       1       
Actual
0       3       0       

1       0       1

cm = ConfusionMatrix(y_actu, y_pred, classes=[1, 0, 4])

C:\Users\Sepkjaer\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pycm-4.4-py3.5.egg\pycm\utils.py:399: RuntimeWarning: Specified classes are not a subset of the classes in the actual and predicted vectors.

cm.print_matrix()

Predict 1       0       4       
Actual
1       1       0       0       

0       0       3       0       

4       0       0       0

Notice : classes added in version 3.2

1 in cm

True

10 in cm

False

cm[1][1]

1

Notice : __getitem__ and __contains__ methods added in version 3.8

Direct CM

cm2 = ConfusionMatrix(matrix={0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}, digit=5)

cm2

pycm.ConfusionMatrix(classes: [0, 1, 2])

cm2.actual_vector

cm2.predict_vector

cm2.classes

[0, 1, 2]

cm2.class_stat

{'ACC': {0: 0.8333333333333334, 1: 0.75, 2: 0.5833333333333334},
 'AGF': {0: 0.9135962935560564, 1: 0.5399492471560389, 2: 0.5515973485146916},
 'AGM': {0: 0.837285964012303, 1: 0.6919986974962765, 2: 0.6071224016819726},
 'AM': {0: 2, 1: -1, 2: -1},
 'AUC': {0: 0.8888888888888888, 1: 0.611111111111111, 2: 0.5833333333333333},
 'AUCI': {0: 'Very Good', 1: 'Fair', 2: 'Poor'},
 'AUPR': {0: 0.8, 1: 0.41666666666666663, 2: 0.55},
 'BB': {0: 0.6, 1: 0.3333333333333333, 2: 0.5},
 'BCD': {0: 0.08333333333333333,
  1: 0.041666666666666664,
  2: 0.041666666666666664},
 'BM': {0: 0.7777777777777777, 1: 0.2222222222222221, 2: 0.16666666666666652},
 'CEN': {0: 0.25, 1: 0.49657842846620864, 2: 0.6044162769630221},
 'DOR': {0: 'None', 1: 3.999999999999998, 2: 1.9999999999999998},
 'DP': {0: 'None', 1: 0.331933069996499, 2: 0.16596653499824957},
 'DPI': {0: 'None', 1: 'Poor', 2: 'Poor'},
 'ERR': {0: 0.16666666666666663, 1: 0.25, 2: 0.41666666666666663},
 'F0.5': {0: 0.6521739130434783,
  1: 0.45454545454545453,
  2: 0.5769230769230769},
 'F1': {0: 0.75, 1: 0.4, 2: 0.5454545454545454},
 'F2': {0: 0.8823529411764706, 1: 0.35714285714285715, 2: 0.5172413793103449},
 'FDR': {0: 0.4, 1: 0.5, 2: 0.4},
 'FN': {0: 0, 1: 2, 2: 3},
 'FNR': {0: 0.0, 1: 0.6666666666666667, 2: 0.5},
 'FOR': {0: 0.0, 1: 0.19999999999999996, 2: 0.4285714285714286},
 'FP': {0: 2, 1: 1, 2: 2},
 'FPR': {0: 0.2222222222222222,
  1: 0.11111111111111116,
  2: 0.33333333333333337},
 'G': {0: 0.7745966692414834, 1: 0.408248290463863, 2: 0.5477225575051661},
 'GI': {0: 0.7777777777777777, 1: 0.2222222222222221, 2: 0.16666666666666652},
 'GM': {0: 0.8819171036881969, 1: 0.5443310539518174, 2: 0.5773502691896257},
 'HD': {0: 2, 1: 3, 2: 5},
 'IBA': {0: 0.9506172839506174, 1: 0.1316872427983539, 2: 0.2777777777777778},
 'ICSI': {0: 0.6000000000000001,
  1: -0.16666666666666674,
  2: 0.10000000000000009},
 'IS': {0: 1.263034405833794, 1: 1.0, 2: 0.2630344058337938},
 'J': {0: 0.6, 1: 0.25, 2: 0.375},
 'LS': {0: 2.4, 1: 2.0, 2: 1.2},
 'MCC': {0: 0.6831300510639732, 1: 0.25819888974716115, 2: 0.1690308509457033},
 'MCCI': {0: 'Moderate', 1: 'Negligible', 2: 'Negligible'},
 'MCEN': {0: 0.2643856189774724, 1: 0.5, 2: 0.6875},
 'MK': {0: 0.6000000000000001, 1: 0.30000000000000004, 2: 0.17142857142857126},
 'N': {0: 9, 1: 9, 2: 6},
 'NLR': {0: 0.0, 1: 0.7500000000000001, 2: 0.75},
 'NLRI': {0: 'Good', 1: 'Negligible', 2: 'Negligible'},
 'NPV': {0: 1.0, 1: 0.8, 2: 0.5714285714285714},
 'OC': {0: 1.0, 1: 0.5, 2: 0.6},
 'OOC': {0: 0.7745966692414834, 1: 0.4082482904638631, 2: 0.5477225575051661},
 'OP': {0: 0.7083333333333334, 1: 0.2954545454545454, 2: 0.4404761904761905},
 'P': {0: 3, 1: 3, 2: 6},
 'PLR': {0: 4.5, 1: 2.9999999999999987, 2: 1.4999999999999998},
 'PLRI': {0: 'Poor', 1: 'Poor', 2: 'Poor'},
 'POP': {0: 12, 1: 12, 2: 12},
 'PPV': {0: 0.6, 1: 0.5, 2: 0.6},
 'PR': {0: 0.25, 1: 0.25, 2: 0.5},
 'PRE': {0: 0.25, 1: 0.25, 2: 0.5},
 'Q': {0: 'None', 1: 0.6, 2: 0.3333333333333333},
 'QI': {0: 'None', 1: 'Moderate', 2: 'Weak'},
 'RACC': {0: 0.10416666666666667,
  1: 0.041666666666666664,
  2: 0.20833333333333334},
 'RACCU': {0: 0.1111111111111111,
  1: 0.04340277777777778,
  2: 0.21006944444444442},
 'TN': {0: 7, 1: 8, 2: 4},
 'TNR': {0: 0.7777777777777778, 1: 0.8888888888888888, 2: 0.6666666666666666},
 'TON': {0: 7, 1: 10, 2: 7},
 'TOP': {0: 5, 1: 2, 2: 5},
 'TOPR': {0: 0.4166666666666667,
  1: 0.16666666666666666,
  2: 0.4166666666666667},
 'TP': {0: 3, 1: 1, 2: 3},
 'TPR': {0: 1.0, 1: 0.3333333333333333, 2: 0.5},
 'Y': {0: 0.7777777777777777, 1: 0.2222222222222221, 2: 0.16666666666666652},
 'dInd': {0: 0.2222222222222222, 1: 0.6758625033664689, 2: 0.6009252125773316},
 'sInd': {0: 0.8428651597363228, 1: 0.5220930407198541, 2: 0.5750817072006014}}

cm2.overall_stat

{'95% CI': (0.30438856248221097, 0.8622781041844558),
 'ACC Macro': 0.7222222222222223,
 'ARI': 0.09206349206349207,
 'AUNP': 0.6666666666666666,
 'AUNU': 0.6944444444444443,
 'Bangdiwala B': 0.37254901960784315,
 'Bennett S': 0.37500000000000006,
 'CBA': 0.4777777777777778,
 'CSI': 0.1777777777777778,
 'Chi-Squared': 6.6,
 'Chi-Squared DF': 4,
 'Conditional Entropy': 0.9591479170272448,
 'Cramer V': 0.5244044240850757,
 'Cross Entropy': 1.5935164295556343,
 'F1 Macro': 0.5651515151515151,
 'F1 Micro': 0.5833333333333334,
 'FNR Macro': 0.38888888888888895,
 'FNR Micro': 0.41666666666666663,
 'FPR Macro': 0.22222222222222232,
 'FPR Micro': 0.20833333333333337,
 'Gwet AC1': 0.3893129770992367,
 'Hamming Loss': 0.41666666666666663,
 'Joint Entropy': 2.4591479170272446,
 'KL Divergence': 0.09351642955563438,
 'Kappa': 0.35483870967741943,
 'Kappa 95% CI': (-0.07707577422109269, 0.7867531935759315),
 'Kappa No Prevalence': 0.16666666666666674,
 'Kappa Standard Error': 0.2203645326012817,
 'Kappa Unbiased': 0.34426229508196726,
 'Krippendorff Alpha': 0.3715846994535519,
 'Lambda A': 0.16666666666666666,
 'Lambda B': 0.42857142857142855,
 'Mutual Information': 0.5242078379544426,
 'NIR': 0.5,
 'NPV Macro': 0.7904761904761904,
 'NPV Micro': 0.7916666666666666,
 'Overall ACC': 0.5833333333333334,
 'Overall CEN': 0.4638112995385119,
 'Overall J': (1.225, 0.4083333333333334),
 'Overall MCC': 0.36666666666666664,
 'Overall MCEN': 0.5189369467580801,
 'Overall RACC': 0.3541666666666667,
 'Overall RACCU': 0.3645833333333333,
 'P-Value': 0.38720703125,
 'PPV Macro': 0.5666666666666668,
 'PPV Micro': 0.5833333333333334,
 'Pearson C': 0.5956833971812705,
 'Phi-Squared': 0.5499999999999999,
 'RCI': 0.3494718919696284,
 'RR': 4.0,
 'Reference Entropy': 1.5,
 'Response Entropy': 1.4833557549816874,
 'SOA1(Landis & Koch)': 'Fair',
 'SOA10(Pearson C)': 'Strong',
 'SOA2(Fleiss)': 'Poor',
 'SOA3(Altman)': 'Fair',
 'SOA4(Cicchetti)': 'Poor',
 'SOA5(Cramer)': 'Relatively Strong',
 'SOA6(Matthews)': 'Weak',
 'SOA7(Lambda A)': 'Very Weak',
 'SOA8(Lambda B)': 'Moderate',
 'SOA9(Krippendorff Alpha)': 'Low',
 'Scott PI': 0.34426229508196726,
 'Standard Error': 0.14231876063832777,
 'TNR Macro': 0.7777777777777777,
 'TNR Micro': 0.7916666666666666,
 'TPR Macro': 0.611111111111111,
 'TPR Micro': 0.5833333333333334,
 'Zero-one Loss': 5}

Notice : new in version 0.8.1
In direct matrix mode, actual_vector and predict_vector are empty.

array = [[1, 2, 3], [4, 6, 1], [1, 2, 3]]

cm = ConfusionMatrix(matrix=array)

cm

pycm.ConfusionMatrix(classes: [0, 1, 2])

cm.classes

[0, 1, 2]

Notice : confusion matrix input in array format is new in version 3.6
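With array input, the classes default to the integer indices 0 to n-1 (as the output above shows) unless classes is given. The conversion can be pictured with the hypothetical helper below, which assumes rows are actual classes and columns are predicted classes. It is a sketch, not PyCM's internal code.

```python
def array_to_table(array, classes=None):
    """Convert a square list-of-lists matrix into a dict-of-dicts table."""
    n = len(array)
    if any(len(row) != n for row in array):
        raise ValueError("matrix must be square")
    if classes is None:
        classes = list(range(n))  # default labels: 0, 1, ..., n-1
    if len(classes) != n:
        raise ValueError("number of classes must match the matrix size")
    return {
        actual: {predicted: array[i][j] for j, predicted in enumerate(classes)}
        for i, actual in enumerate(classes)
    }


array = [[1, 2, 3], [4, 6, 1], [1, 2, 3]]
print(array_to_table(array))
print(array_to_table(array, classes=["L1", "L2", "L3"]))
```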

cm.table

{0: {0: 1, 1: 2, 2: 3}, 1: {0: 4, 1: 6, 2: 1}, 2: {0: 1, 1: 2, 2: 3}}

cm.matrix

{0: {0: 1, 1: 2, 2: 3}, 1: {0: 4, 1: 6, 2: 1}, 2: {0: 1, 1: 2, 2: 3}}

cm.normalized_matrix

{0: {0: 0.16667, 1: 0.33333, 2: 0.5},
 1: {0: 0.36364, 1: 0.54545, 2: 0.09091},
 2: {0: 0.16667, 1: 0.33333, 2: 0.5}}

cm.normalized_table

{0: {0: 0.16667, 1: 0.33333, 2: 0.5},
 1: {0: 0.36364, 1: 0.54545, 2: 0.09091},
 2: {0: 0.16667, 1: 0.33333, 2: 0.5}}

cm.print_matrix()

Predict 0       1       2       
Actual
0       1       2       3       

1       4       6       1       

2       1       2       3

import numpy

array = numpy.array([[1, 2, 3], [4, 6, 1], [1, 2, 3]])

cm = ConfusionMatrix(matrix=array)

cm

pycm.ConfusionMatrix(classes: [0, 1, 2])

cm = ConfusionMatrix(matrix=array, classes=["L1", "L2", "L3"])

cm

pycm.ConfusionMatrix(classes: ['L1', 'L2', 'L3'])

Iterating and casting

From version 3.5, ConfusionMatrix is iterable: iterating over it yields (class, row) pairs, so it can be cast to a dict or a list.

for class_name, row in cm:
    print(class_name, row)

L1 {'L1': 1, 'L2': 2, 'L3': 3}
L2 {'L1': 4, 'L2': 6, 'L3': 1}
L3 {'L1': 1, 'L2': 2, 'L3': 3}

cm_iter = iter(cm)
next(cm_iter)

('L1', {'L1': 1, 'L2': 2, 'L3': 3})

cm_dict = dict(cm)
cm_dict

{'L1': {'L1': 1, 'L2': 2, 'L3': 3},
 'L2': {'L1': 4, 'L2': 6, 'L3': 1},
 'L3': {'L1': 1, 'L2': 2, 'L3': 3}}

cm_list = list(cm)
cm_list

[('L1', {'L1': 1, 'L2': 2, 'L3': 3}),
 ('L2', {'L1': 4, 'L2': 6, 'L3': 1}),
 ('L3', {'L1': 1, 'L2': 2, 'L3': 3})]

Notice : new in version 3.5
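Iteration, membership tests (1 in cm), and indexing (cm[1][1]) all behave like those of a dict of rows. The toy class below is a hypothetical sketch, not PyCM's implementation, showing how __iter__, __contains__, and __getitem__ can produce this behaviour.

```python
class ToyMatrix:
    """Minimal dict-of-dicts wrapper that mimics the behaviour shown above."""

    def __init__(self, table):
        self.table = table
        self.classes = list(table)

    def __iter__(self):
        # Yield (class, row) pairs, so dict(obj) and list(obj) both work
        for c in self.classes:
            yield c, self.table[c]

    def __contains__(self, item):
        return item in self.classes

    def __getitem__(self, class_name):
        return self.table[class_name]


m = ToyMatrix({"L1": {"L1": 1, "L2": 2, "L3": 3},
               "L2": {"L1": 4, "L2": 6, "L3": 1},
               "L3": {"L1": 1, "L2": 2, "L3": 3}})
print(next(iter(m)))             # ('L1', {'L1': 1, 'L2': 2, 'L3': 3})
print("L2" in m, m["L2"]["L2"])  # True 6
```

Because the class defines no keys method, dict(m) falls back to consuming the (key, value) pairs produced by __iter__.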

Activation threshold

The threshold parameter was added in version 0.9 to support real-valued predictions.

For more information visit Example 3

Notice : new in version 0.9
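An activation threshold turns real-valued scores into class labels before the confusion matrix is counted. The sketch below assumes that threshold is a function applied to each predicted value (see Example 3 for the library's actual usage); the 0.5 cut-off and the helpers activation and apply_threshold are hypothetical choices for illustration only.

```python
def activation(score, cutoff=0.5):
    """Hypothetical activation: label scores at or above cutoff as 1, else 0."""
    return 1 if score >= cutoff else 0


def apply_threshold(scores, func):
    """Apply the activation function to every real-valued prediction."""
    return [func(s) for s in scores]


y_scores = [0.87, 0.34, 0.9, 0.12, 0.45]
y_labels = apply_threshold(y_scores, activation)
print(y_labels)  # [1, 0, 1, 0, 0]
```

The resulting integer labels can then be paired with the actual labels like any predicted vector in the examples above.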

Load from file

The file parameter was added in version 0.9.5 to load a confusion matrix saved in .obj format by the save_obj method.

For more information visit Example 4

Notice : new in version 0.9.5

Sample weights

The sample_weight parameter was added in version 1.2.

For more information visit Example 5

Notice : new in version 1.2
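With sample weights, each observation contributes its weight to its (actual, predicted) cell instead of 1. A plain-Python sketch of that idea with hypothetical weights; weighted_table is an illustrative helper, and Example 5 shows PyCM's own usage.

```python
from collections import defaultdict


def weighted_table(actual, predicted, sample_weight=None):
    """Accumulate each sample's weight (default 1) into its (actual, predicted) cell."""
    if sample_weight is None:
        sample_weight = [1] * len(actual)
    if not len(actual) == len(predicted) == len(sample_weight):
        raise ValueError("inputs must have the same length")
    classes = sorted(set(actual) | set(predicted))
    cells = defaultdict(float)
    for a, p, w in zip(actual, predicted, sample_weight):
        cells[(a, p)] += w
    return {a: {p: cells[(a, p)] for p in classes} for a in classes}


y_actu = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 1, 0, 2]
weights = [2, 1, 1, 1, 3, 0.5]
print(weighted_table(y_actu, y_pred, weights))
```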

Transpose

The transpose parameter was added in version 1.2 to transpose the input matrix (Direct CM mode only).

cm = ConfusionMatrix(matrix={0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}, digit=5, transpose=True)

cm.print_matrix()

Predict 0       1       2       
Actual
0       3       0       2       

1       0       1       1       

2       0       2       3

Notice : new in version 1.2
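Transposing swaps rows and columns, so cell [i][j] of the input becomes cell [j][i]. This helps, for example, when a matrix was recorded with predicted classes as rows. The short sketch below reproduces the printed result above; transpose_table is a hypothetical helper.

```python
def transpose_table(table):
    """Return a dict-of-dicts table with rows and columns swapped."""
    classes = list(table)
    return {j: {i: table[i][j] for i in classes} for j in classes}


matrix = {0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}
print(transpose_table(matrix))
# {0: {0: 3, 1: 0, 2: 2}, 1: {0: 0, 1: 1, 2: 1}, 2: {0: 0, 1: 2, 2: 3}}
```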

Metrics off

The metrics_off parameter was added in version 3.9 to bypass metric calculation.

cm3 = ConfusionMatrix(y_actu, y_pred, metrics_off=True)

cm3.class_stat

{'ACC': {0: 'None', 1: 'None', 2: 'None'},
 'AGF': {0: 'None', 1: 'None', 2: 'None'},
 'AGM': {0: 'None', 1: 'None', 2: 'None'},
 'AM': {0: 'None', 1: 'None', 2: 'None'},
 'AUC': {0: 'None', 1: 'None', 2: 'None'},
 'AUCI': {0: 'None', 1: 'None', 2: 'None'},
 'AUPR': {0: 'None', 1: 'None', 2: 'None'},
 'BB': {0: 'None', 1: 'None', 2: 'None'},
 'BCD': {0: 'None', 1: 'None', 2: 'None'},
 'BM': {0: 'None', 1: 'None', 2: 'None'},
 'CEN': {0: 'None', 1: 'None', 2: 'None'},
 'DOR': {0: 'None', 1: 'None', 2: 'None'},
 'DP': {0: 'None', 1: 'None', 2: 'None'},
 'DPI': {0: 'None', 1: 'None', 2: 'None'},
 'ERR': {0: 'None', 1: 'None', 2: 'None'},
 'F0.5': {0: 'None', 1: 'None', 2: 'None'},
 'F1': {0: 'None', 1: 'None', 2: 'None'},
 'F2': {0: 'None', 1: 'None', 2: 'None'},
 'FDR': {0: 'None', 1: 'None', 2: 'None'},
 'FN': {0: 'None', 1: 'None', 2: 'None'},
 'FNR': {0: 'None', 1: 'None', 2: 'None'},
 'FOR': {0: 'None', 1: 'None', 2: 'None'},
 'FP': {0: 'None', 1: 'None', 2: 'None'},
 'FPR': {0: 'None', 1: 'None', 2: 'None'},
 'G': {0: 'None', 1: 'None', 2: 'None'},
 'GI': {0: 'None', 1: 'None', 2: 'None'},
 'GM': {0: 'None', 1: 'None', 2: 'None'},
 'HD': {0: 'None', 1: 'None', 2: 'None'},
 'IBA': {0: 'None', 1: 'None', 2: 'None'},
 'ICSI': {0: 'None', 1: 'None', 2: 'None'},
 'IS': {0: 'None', 1: 'None', 2: 'None'},
 'J': {0: 'None', 1: 'None', 2: 'None'},
 'LS': {0: 'None', 1: 'None', 2: 'None'},
 'MCC': {0: 'None', 1: 'None', 2: 'None'},
 'MCCI': {0: 'None', 1: 'None', 2: 'None'},
 'MCEN': {0: 'None', 1: 'None', 2: 'None'},
 'MK': {0: 'None', 1: 'None', 2: 'None'},
 'N': {0: 'None', 1: 'None', 2: 'None'},
 'NLR': {0: 'None', 1: 'None', 2: 'None'},
 'NLRI': {0: 'None', 1: 'None', 2: 'None'},
 'NPV': {0: 'None', 1: 'None', 2: 'None'},
 'OC': {0: 'None', 1: 'None', 2: 'None'},
 'OOC': {0: 'None', 1: 'None', 2: 'None'},
 'OP': {0: 'None', 1: 'None', 2: 'None'},
 'P': {0: 'None', 1: 'None', 2: 'None'},
 'PLR': {0: 'None', 1: 'None', 2: 'None'},
 'PLRI': {0: 'None', 1: 'None', 2: 'None'},
 'POP': {0: 'None', 1: 'None', 2: 'None'},
 'PPV': {0: 'None', 1: 'None', 2: 'None'},
 'PR': {0: 'None', 1: 'None', 2: 'None'},
 'PRE': {0: 'None', 1: 'None', 2: 'None'},
 'Q': {0: 'None', 1: 'None', 2: 'None'},
 'QI': {0: 'None', 1: 'None', 2: 'None'},
 'RACC': {0: 'None', 1: 'None', 2: 'None'},
 'RACCU': {0: 'None', 1: 'None', 2: 'None'},
 'TN': {0: 'None', 1: 'None', 2: 'None'},
 'TNR': {0: 'None', 1: 'None', 2: 'None'},
 'TON': {0: 'None', 1: 'None', 2: 'None'},
 'TOP': {0: 'None', 1: 'None', 2: 'None'},
 'TOPR': {0: 'None', 1: 'None', 2: 'None'},
 'TP': {0: 'None', 1: 'None', 2: 'None'},
 'TPR': {0: 'None', 1: 'None', 2: 'None'},
 'Y': {0: 'None', 1: 'None', 2: 'None'},
 'dInd': {0: 'None', 1: 'None', 2: 'None'},
 'sInd': {0: 'None', 1: 'None', 2: 'None'}}

cm3.overall_stat

{'95% CI': 'None',
 'ACC Macro': 'None',
 'ARI': 'None',
 'AUNP': 'None',
 'AUNU': 'None',
 'Bangdiwala B': 'None',
 'Bennett S': 'None',
 'CBA': 'None',
 'CSI': 'None',
 'Chi-Squared': 'None',
 'Chi-Squared DF': 'None',
 'Conditional Entropy': 'None',
 'Cramer V': 'None',
 'Cross Entropy': 'None',
 'F1 Macro': 'None',
 'F1 Micro': 'None',
 'FNR Macro': 'None',
 'FNR Micro': 'None',
 'FPR Macro': 'None',
 'FPR Micro': 'None',
 'Gwet AC1': 'None',
 'Hamming Loss': 'None',
 'Joint Entropy': 'None',
 'KL Divergence': 'None',
 'Kappa': 'None',
 'Kappa 95% CI': 'None',
 'Kappa No Prevalence': 'None',
 'Kappa Standard Error': 'None',
 'Kappa Unbiased': 'None',
 'Krippendorff Alpha': 'None',
 'Lambda A': 'None',
 'Lambda B': 'None',
 'Mutual Information': 'None',
 'NIR': 'None',
 'NPV Macro': 'None',
 'NPV Micro': 'None',
 'Overall ACC': 'None',
 'Overall CEN': 'None',
 'Overall J': 'None',
 'Overall MCC': 'None',
 'Overall MCEN': 'None',
 'Overall RACC': 'None',
 'Overall RACCU': 'None',
 'P-Value': 'None',
 'PPV Macro': 'None',
 'PPV Micro': 'None',
 'Pearson C': 'None',
 'Phi-Squared': 'None',
 'RCI': 'None',
 'RR': 'None',
 'Reference Entropy': 'None',
 'Response Entropy': 'None',
 'SOA1(Landis & Koch)': 'None',
 'SOA10(Pearson C)': 'None',
 'SOA2(Fleiss)': 'None',
 'SOA3(Altman)': 'None',
 'SOA4(Cicchetti)': 'None',
 'SOA5(Cramer)': 'None',
 'SOA6(Matthews)': 'None',
 'SOA7(Lambda A)': 'None',
 'SOA8(Lambda B)': 'None',
 'SOA9(Krippendorff Alpha)': 'None',
 'Scott PI': 'None',
 'Standard Error': 'None',
 'TNR Macro': 'None',
 'TNR Micro': 'None',
 'TPR Macro': 'None',
 'TPR Micro': 'None',
 'Zero-one Loss': 'None'}

Notice : new in version 3.9

Relabel

The relabel method was added in version 1.5 to change the class names of a ConfusionMatrix.

cm.relabel(mapping={0:"L1",1:"L2",2:"L3"}, sort=True)

cm

pycm.ConfusionMatrix(classes: ['L1', 'L2', 'L3'])

Parameters

mapping : mapping dictionary (type : dict)
sort : flag for sorting new classes (type : bool, default : False)

Notice : new in version 1.5

Notice : sort added in version 3.9
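Relabeling renames both the row keys and the column keys of the table through the mapping dictionary, and with sort=True the new class names are sorted. A rough sketch of that behaviour on a dict-of-dicts table, assuming every existing class appears in mapping and that mapping is one-to-one; relabel_table is a hypothetical helper.

```python
def relabel_table(table, mapping, sort=False):
    """Rename the classes of a dict-of-dicts table using mapping."""
    missing = set(table) - set(mapping)
    if missing:
        raise ValueError(f"mapping has no entry for: {missing}")
    new_classes = [mapping[c] for c in table]
    if len(set(new_classes)) != len(new_classes):
        raise ValueError("mapping must assign distinct names")
    if sort:
        new_classes = sorted(new_classes)
    inverse = {mapping[c]: c for c in table}  # new name -> old class
    return {
        new_a: {new_p: table[inverse[new_a]][inverse[new_p]] for new_p in new_classes}
        for new_a in new_classes
    }


table = {0: {0: 3, 1: 0, 2: 2}, 1: {0: 0, 1: 1, 2: 1}, 2: {0: 0, 1: 2, 2: 3}}
print(relabel_table(table, {0: "L1", 1: "L2", 2: "L3"}, sort=True))
```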

Position

The position method was added in version 2.8 to find, for each class, the indices of the observations in predict_vector that are TP, TN, FP, or FN.

cm3 = ConfusionMatrix(y_actu, y_pred, digit=5)
cm3.position()

{0: {'FN': [], 'FP': [0, 7], 'TN': [2, 3, 5, 6, 8, 10, 11], 'TP': [1, 4, 9]},
 1: {'FN': [5, 10], 'FP': [3], 'TN': [0, 1, 2, 4, 7, 8, 9, 11], 'TP': [6]},
 2: {'FN': [0, 3, 7], 'FP': [5, 10], 'TN': [1, 4, 6, 9], 'TP': [2, 8, 11]}}

Notice : new in version 2.8

Notice : only works in vector mode
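The position output can be reproduced by scanning the two vectors once per class: an index is TP when both the actual and the predicted value equal the class, FN when only the actual value does, FP when only the prediction does, and TN otherwise. A plain-Python sketch; positions is a hypothetical helper, not the library method.

```python
def positions(actual, predicted):
    """Group sample indices into TP/FP/FN/TN lists for every class."""
    classes = sorted(set(actual) | set(predicted))
    result = {c: {"TP": [], "FP": [], "FN": [], "TN": []} for c in classes}
    for idx, (a, p) in enumerate(zip(actual, predicted)):
        for c in classes:
            if a == c and p == c:
                key = "TP"
            elif a == c:
                key = "FN"
            elif p == c:
                key = "FP"
            else:
                key = "TN"
            result[c][key].append(idx)
    return result


y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]
print(positions(y_actu, y_pred)[1])
# {'TP': [6], 'FP': [3], 'FN': [5, 10], 'TN': [0, 1, 2, 4, 7, 8, 9, 11]}
```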

To array

The to_array method was added in version 2.9 to return the confusion matrix as a NumPy array. This is helpful for applying operations such as aggregation, normalization, and combination to the confusion matrix.

cm.to_array()

array([[3, 0, 2],
       [0, 1, 1],
       [0, 2, 3]])

cm.to_array(normalized=True)

array([[0.6, 0. , 0.4],
       [0. , 0.5, 0.5],
       [0. , 0.4, 0.6]])

cm.to_array(normalized=True, one_vs_all=True, class_name="L1")

array([[0.6, 0.4],
       [0. , 1. ]])

Parameters

normalized : a flag for getting normalized confusion matrix (type : bool, default : False)
one_vs_all : one-vs-all mode flag (type : bool, default : False)
class_name : target class name for one-vs-all mode (type : any valid type, default : None)

Output

Confusion Matrix in NumPy array format

Notice : new in version 2.9
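Both options of to_array can be expressed with plain list arithmetic. The sketch below works on a list-of-lists matrix (the real method returns a NumPy array) and reproduces the outputs above. It assumes that normalization divides each row by its sum and that one-vs-all mode collapses the matrix into [[TP, FN], [FP, TN]] for the target class, which is consistent with the printed 2x2 result. The helpers are hypothetical.

```python
def normalize(matrix):
    """Divide every row by its sum (all-zero rows stay zero)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def one_vs_all(matrix, index):
    """Collapse a square matrix to [[TP, FN], [FP, TN]] for the class at index."""
    n = len(matrix)
    tp = matrix[index][index]
    fn = sum(matrix[index]) - tp
    fp = sum(matrix[i][index] for i in range(n)) - tp
    tn = sum(map(sum, matrix)) - tp - fn - fp
    return [[tp, fn], [fp, tn]]


matrix = [[3, 0, 2], [0, 1, 1], [0, 2, 3]]  # rows/columns: L1, L2, L3
print(normalize(matrix))                    # [[0.6, 0.0, 0.4], [0.0, 0.5, 0.5], [0.0, 0.4, 0.6]]
print(normalize(one_vs_all(matrix, 0)))     # [[0.6, 0.4], [0.0, 1.0]]
```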

Combine

The combine method was added in version 3.0 to merge two confusion matrices. This is useful, for example, in mini-batch learning.

cm_combined = cm2.combine(cm3)
cm_combined.print_matrix()

Predict 0       1       2       
Actual
0       6       0       0       

1       0       2       4       

2       4       2       6

Parameters

other : the other matrix that is going to be combined (type : ConfusionMatrix)

Output

New ConfusionMatrix

Notice : new in version 3.0
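Combining amounts to a cell-wise sum of the two matrices, which is why it suits mini-batch learning: the counts from each batch simply add up. The sketch below reproduces the combined matrix above. It takes the union of the class sets and treats missing cells as zero; that handling is an assumption of this sketch, and how PyCM treats mismatched classes is not shown here. combine_tables is a hypothetical helper.

```python
def combine_tables(first, second):
    """Cell-wise sum of two dict-of-dicts tables; missing cells count as 0."""
    classes = sorted(set(first) | set(second))
    return {
        a: {p: first.get(a, {}).get(p, 0) + second.get(a, {}).get(p, 0) for p in classes}
        for a in classes
    }


batch_1 = {0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}
batch_2 = {0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}
print(combine_tables(batch_1, batch_2))
# {0: {0: 6, 1: 0, 2: 0}, 1: {0: 0, 1: 2, 2: 4}, 2: {0: 4, 1: 2, 2: 6}}
```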

Plot

The plot method was added in version 3.0 to plot a confusion matrix using Matplotlib or Seaborn.

import sys
!{sys.executable} -m pip -q -q install matplotlib
!{sys.executable} -m pip -q -q install seaborn
import matplotlib.pyplot as plt

cm.plot()

<matplotlib.axes._subplots.AxesSubplot at 0x4744310>

cm.plot(cmap=plt.cm.Greens, number_label=True, normalized=True)

<matplotlib.axes._subplots.AxesSubplot at 0x12812e90>

cm.plot(plot_lib="seaborn", number_label=True)

<matplotlib.axes._subplots.AxesSubplot at 0x1284e4b0>

cm.plot(cmap=plt.cm.Blues, number_label=True, one_vs_all=True, class_name="L1")

<matplotlib.axes._subplots.AxesSubplot at 0x1bf28c10>

cm.plot(cmap=plt.cm.Reds, number_label=True, normalized=True, one_vs_all=True, class_name="L3")

<matplotlib.axes._subplots.AxesSubplot at 0x12823210>

Parameters

normalized : normalized flag for matrix (type : bool, default : False)
one_vs_all : one-vs-all mode flag (type : bool, default : False)
class_name : target class name for one-vs-all mode (type : any valid type, default : None)
title : plot title (type : str, default : Confusion Matrix)
number_label : number label flag (type : bool, default : False)
cmap : color map (type : matplotlib.colors.ListedColormap, default : None)
plot_lib : plotting library (type : str, default : matplotlib)

Output

Plot axes

Notice : new in version 3.0

Distance/Similarity

Notice : new in version 3.8

Dissimilarity matrix

A dissimilarity matrix, also known as a distance matrix, is a square, symmetric matrix that represents the pairwise dissimilarities (or distances) between a set of objects. Dissimilarity matrices are used to quantify how different or dissimilar objects are from each other [87].

The dissimilarity_matrix method was added in version 4.3 to calculate a dissimilarity matrix from the input confusion matrix.

cm.dissimilarity_matrix()

{'L1': {'L1': 0, 'L2': 5, 'L3': 6},
 'L2': {'L1': 5, 'L2': 0, 'L3': 3},
 'L3': {'L1': 6, 'L2': 3, 'L3': 0}}

Notice : new in version 4.3
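The values above are consistent with taking, for each pair of classes, the sum of absolute differences between their rows (a Manhattan distance). The sketch below computes exactly that. Treat it as a reading of the example output rather than a statement of PyCM's definition; row_manhattan_dissimilarity is a hypothetical helper.

```python
def row_manhattan_dissimilarity(table):
    """Pairwise sum of absolute row differences for a dict-of-dicts table."""
    classes = list(table)
    return {
        c1: {
            c2: sum(abs(table[c1][k] - table[c2][k]) for k in classes)
            for c2 in classes
        }
        for c1 in classes
    }


table = {"L1": {"L1": 3, "L2": 0, "L3": 2},
         "L2": {"L1": 0, "L2": 1, "L3": 1},
         "L3": {"L1": 0, "L2": 2, "L3": 3}}
print(row_manhattan_dissimilarity(table))
# {'L1': {'L1': 0, 'L2': 5, 'L3': 6}, 'L2': {'L1': 5, 'L2': 0, 'L3': 3}, 'L3': {'L1': 6, 'L2': 3, 'L3': 0}}
```

The result is square and symmetric with a zero diagonal, as the definition of a dissimilarity matrix requires.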

Parameter recommender

This option was added in version 1.9 to recommend the most relevant parameters given the characteristics of the input dataset. The suggested parameters are selected according to characteristics of the input, such as whether it is balanced or imbalanced and whether it is binary or multi-class. All suggestions fall into three main groups: imbalanced dataset, binary classification on a balanced dataset, and multi-class classification on a balanced dataset. The recommendation lists were compiled from the original paper of each parameter and the capabilities claimed in that paper.

Fig2. Parameter Recommender Block Diagram

To determine whether the dataset is imbalanced, the following rule is used, where $P$ is the number of actual instances (condition positives) of each class:

$$R=\frac{Max(P)}{Min(P)}$$

$$State=\begin{cases}Balance & R\leq 3\\Imbalance & R > 3\end{cases}$$
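The rule translates directly into code. In this sketch, P is taken to be the number of actual samples per class (the row sums of the table), and the limit of 3 comes from the formula above. The function name is hypothetical, and rejecting classes with zero samples is a choice of this sketch.

```python
def is_imbalanced(p, ratio_limit=3):
    """Return True when max(P) / min(P) exceeds ratio_limit."""
    counts = list(p.values())
    if min(counts) == 0:
        raise ValueError("every class needs at least one actual sample")
    return max(counts) / min(counts) > ratio_limit


# Row sums of the relabeled matrix used in this section
print(is_imbalanced({"L1": 5, "L2": 2, "L3": 5}))  # False (R = 2.5)
print(is_imbalanced({0: 40, 1: 10, 2: 5}))         # True  (R = 8.0)
```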

cm.imbalance

False

cm.binary

False

cm.recommended_list

['ERR',
 'TPR Micro',
 'TPR Macro',
 'F1 Macro',
 'PPV Macro',
 'NPV Macro',
 'ACC',
 'Overall ACC',
 'MCC',
 'MCCI',
 'Overall MCC',
 'SOA6(Matthews)',
 'BCD',
 'Hamming Loss',
 'Zero-one Loss']

The is_imbalanced parameter was added in version 3.3 so that the user can indicate whether the dataset is imbalanced. If the user does not provide this information, the automatic detection algorithm is used.

cm4 = ConfusionMatrix(y_actu, y_pred, is_imbalanced=True)
cm4.imbalance

True

cm4 = ConfusionMatrix(y_actu, y_pred, is_imbalanced=False)
cm4.imbalance

False

Notice : also available in HTML report

Notice : The recommender system assumes that the input is the result of classification over the whole data rather than just a part of it. If the confusion matrix is the result of test data classification, the recommendation is not valid.

Notice : is_imbalanced is new in version 3.3

Compare

Version 2.0 introduced a method for comparing several confusion matrices. It combines several overall and class-based benchmarks. Each benchmark rates the performance of the classification algorithm on a scale from good to poor and assigns it a numeric score: good performance scores 1 and poor performance scores 0.

Two scores are then calculated for each confusion matrix: an overall score and a class-based score. The overall score is the average score of seven overall benchmarks: Landis & Koch, Cramer, Matthews, Goodman-Kruskal's Lambda A, Goodman-Kruskal's Lambda B, Krippendorff's Alpha, and Pearson's C. Likewise, the class-based score is the average score of six class-based benchmarks: Positive Likelihood Ratio Interpretation, Negative Likelihood Ratio Interpretation, Discriminant Power Interpretation, AUC value Interpretation, Matthews Correlation Coefficient Interpretation, and Yule's Q Interpretation. Note that if a benchmark returns None for any class, that benchmark is excluded from the total average. If the user sets class weights, the class-based benchmark scores are averaged using those weights.

If the user sets the boolean input by_class to True, the best confusion matrix is the one with the highest class-based score. Otherwise, a confusion matrix is reported as the best only if it obtains both the highest overall score and the highest class-based score; in any other case, the Compare object does not select a best confusion matrix.

Fig3. Compare Block Diagram

The overall and class-based scores of each confusion matrix are computed as follows. Here $|Set|$ denotes the cardinality of a set, and the cardinality of each benchmark equals the maximum possible score for that benchmark.

$$S_{Overall}=\sum_{i=1}^{|B_O|}\frac{W_{OB}(i)}{\sum W_{OB}}\times\frac{R_O(i)}{|B_O(i)|}$$

$$S_{Class}=\sum_{i=1}^{|B_C|}\sum_{j=1}^{|C|}\frac{W_{CB}(i)}{\sum W_{CB}}\times\frac{W_C(j)}{\sum W_{C}}\times\frac{R_C(i,j)}{|B_C(i)|}$$

$$0\leq S_{Overall},S_{Class}\leq1$$

$B_C$ : Class benchmarks
$B_O$ : Overall benchmarks
$C$ : Classes
$W_{CB}$ : Class benchmark weights
$W_{OB}$ : Overall benchmark weights
$W_{C}$ : Class weights
$R_C$ : Class benchmark result
$R_O$ : Overall benchmark result
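The two scoring formulas can be illustrated with the minimal Python sketch below. It is not PyCM's implementation: the helper names overall_score and class_score, the dict layout, and the benchmark results in the usage lines are all hypothetical. Dropping a benchmark that is undefined for any class, then renormalizing the remaining benchmark weights, is our reading of the exclusion rule described above.

```python
def overall_score(results, max_scores, weights):
    """S_Overall: weighted mean of each overall benchmark's result divided by its maximum."""
    total_weight = sum(weights[b] for b in results)
    return sum(weights[b] / total_weight * results[b] / max_scores[b] for b in results)


def class_score(results, max_scores, bench_weights, class_weights):
    """S_Class: results[b][c] is class benchmark b's score for class c (None if undefined)."""
    # Hypothetical handling: a benchmark that is undefined for any class is dropped.
    # Assumes at least one remaining benchmark has a non-zero weight.
    usable = [b for b in results if None not in results[b].values()]
    total_bench = sum(bench_weights[b] for b in usable)
    total_class = sum(class_weights.values())
    score = 0.0
    for b in usable:
        for c, r in results[b].items():
            score += (bench_weights[b] / total_bench) * (class_weights[c] / total_class) * (r / max_scores[b])
    return score


# Made-up benchmark results, only to exercise the formulas.
print(overall_score({"A": 2, "B": 1}, {"A": 4, "B": 2}, {"A": 1, "B": 1}))  # 0.5
print(class_score(
    {"X": {"c1": 1, "c2": 0}, "Y": {"c1": 2, "c2": None}},
    {"X": 1, "Y": 2},
    {"X": 1, "Y": 1},
    {"c1": 3, "c2": 1},
))  # 0.75: Y is dropped, and c1 carries 3/4 of the class weight
```

Dividing each result by the benchmark's maximum keeps both scores in $[0, 1]$, as stated by the bound above.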

cm2 = ConfusionMatrix(matrix={0: {0:2, 1:50, 2:6}, 1:{0:5, 1:50, 2:3}, 2:{0:1, 1:7, 2:50}})
cm3 = ConfusionMatrix(matrix={0: {0:50, 1:2, 2:6}, 1:{0:50, 1:5, 2:3}, 2:{0:1, 1:55, 2:2}})

cp = Compare({"cm2":cm2, "cm3":cm3})

print(cp)

Best : cm2

Rank  Name   Class-Score       Overall-Score
1     cm2    0.50278           0.58095
2     cm3    0.33611           0.52857

cp.scores

{'cm2': {'class': 0.50278, 'overall': 0.58095},
 'cm3': {'class': 0.33611, 'overall': 0.52857}}

cp.sorted

['cm2', 'cm3']

cp.best

pycm.ConfusionMatrix(classes: [0, 1, 2])

cp.best_name

'cm2'

cp2 = Compare({"cm2":cm2, "cm3":cm3}, by_class=True, class_weight={0:5, 1:1, 2:1})

print(cp2)

Best : cm3

Rank  Name   Class-Score       Overall-Score
1     cm3    0.45357           0.52857
2     cm2    0.34881           0.58095

cp3 = Compare({"cm2":cm2, "cm3":cm3}, class_benchmark_weight={"PLRI":0, "NLRI":0, "DPI":0, "AUCI":1, "MCCI":0, "QI":0})

print(cp3)

Best : cm2

Rank  Name   Class-Score       Overall-Score
1     cm2    0.46667           0.58095
2     cm3    0.33333           0.52857

cp4 = Compare(
    {"cm2":cm2, "cm3":cm3},
    overall_benchmark_weight={"SOA1":1, "SOA2":0, "SOA3":0, "SOA4":0, "SOA5":0, "SOA6":1, "SOA7":0, "SOA8":0, "SOA9":0, "SOA10":0})

print(cp4)

Best : cm2

Rank  Name   Class-Score       Overall-Score
1     cm2    0.50278           0.45
2     cm3    0.33611           0.18333

Class and overall benchmark lists are available in CLASS_BENCHMARK_LIST and OVERALL_BENCHMARK_LIST, respectively.

from pycm import CLASS_BENCHMARK_LIST, OVERALL_BENCHMARK_LIST
print(CLASS_BENCHMARK_LIST)
print(OVERALL_BENCHMARK_LIST)

['AUCI', 'DPI', 'MCCI', 'NLRI', 'PLRI', 'QI']
['SOA1', 'SOA10', 'SOA2', 'SOA3', 'SOA4', 'SOA5', 'SOA6', 'SOA7', 'SOA8', 'SOA9']

Notice : overall_benchmark_weight and class_benchmark_weight, new in version 3.3

Notice : From version 3.8, Goodman-Kruskal's Lambda A, Goodman-Kruskal's Lambda B, Krippendorff's Alpha, and Pearson's C benchmarks are considered in the overall score and default weights of the overall benchmarks are modified accordingly.

ROC curve¶

ROCCurve, added in version 3.7, computes the Receiver Operating Characteristic (ROC) curve, a graphical representation of a binary classifier's performance. In an ROC curve, the Y axis represents the True Positive Rate and the X axis represents the False Positive Rate, so the ideal point is at the top left of the plot, and a larger area under the curve indicates better performance. To extend ROC to multi-class classifiers, ROCCurve binarizes the output using the "One vs. Rest" strategy. Given the vector of actual labels and the probability estimates for each class, it computes and plots TPR-FPR pairs for different discrimination thresholds and computes the area under the ROC curve. The thresholds at which the TPR-FPR pairs are calculated can either be specified by the user (via the thresholds input) or calculated automatically. Sample weights can be set via the sample_weight input; otherwise, they are all assumed to be 1. ROCCurve has two methods, area() and plot(). area() returns the area under the curve, calculated with either the trapezoidal rule (the default) or the midpoint rule of numerical integration; plot() draws the curve.

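import numpy
from pycm import ROCCurve

# Column i of probs holds the probability of classes[i] (class 2, then class 1).
crv = ROCCurve(
    actual_vector=numpy.array([1, 1, 2, 2]),
    probs=numpy.array([[0.1, 0.9], [0.4, 0.6], [0.35, 0.65], [0.8, 0.2]]),
    classes=[2, 1])
crv.thresholds
auc_trp = crv.area()  # trapezoidal rule by default; indexed by class label
auc_trp[1]
auc_trp[2]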

0.75

crv.plot(area=True, classes=[2])

<matplotlib.axes._subplots.AxesSubplot at 0x1cf668f0>

Notice : new in version 3.7
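The one-vs-rest ROC construction and the trapezoidal area can be reproduced by hand for the small example above. This is a minimal sketch, not ROCCurve's implementation: the helpers roc_points and trapezoid_area are hypothetical, the thresholds are assumed to be the distinct scores of each class column (ROCCurve may choose a different threshold set), and a sample is counted as positive when its score is greater than or equal to the threshold.

```python
def roc_points(actual, scores, positive):
    """(FPR, TPR) pairs, one per distinct score used as a threshold (score >= t is positive)."""
    p = sum(1 for a in actual if a == positive)  # assumes at least one positive sample
    n = len(actual) - p                          # assumes at least one negative sample
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):  # decreasing thresholds -> increasing FPR/TPR
        tp = sum(1 for a, s in zip(actual, scores) if s >= t and a == positive)
        fp = sum(1 for a, s in zip(actual, scores) if s >= t and a != positive)
        points.append((fp / n, tp / p))
    return points


def trapezoid_area(points):
    """Trapezoidal rule over consecutive (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


actual = [1, 1, 2, 2]
probs = [[0.1, 0.9], [0.4, 0.6], [0.35, 0.65], [0.8, 0.2]]
classes = [2, 1]  # column i of probs is read as the probability of classes[i]

areas = {c: trapezoid_area(roc_points(actual, [row[i] for row in probs], c))
         for i, c in enumerate(classes)}
print(areas)  # {2: 0.75, 1: 0.75}
```

For both classes, one of the four positive-negative pairs is ranked in the wrong order, which is why each area comes out as 3/4, the 0.75 printed above.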

Precision-Recall curve¶

PRCurve, added in version 3.7, computes the Precision-Recall curve, a graphical representation of a binary classifier's performance in which the Y axis represents the Precision and the X axis represents the Recall. The ideal point is therefore at the top right of the plot, and a larger area under the curve indicates better performance. To extend this curve to multi-class classifiers, PRCurve binarizes the output using the "One vs. Rest" strategy. Given the vector of actual labels and the probability estimates for each class, it computes and plots Precision-Recall pairs for different discrimination thresholds and computes the area under the curve. The thresholds at which the Precision-Recall pairs are calculated can either be specified by the user (via the thresholds input) or calculated automatically. Sample weights can be set via the sample_weight input; otherwise, they are all assumed to be 1. PRCurve has two methods, area() and plot(). area() returns the area under the curve, calculated with either the trapezoidal rule (the default) or the midpoint rule of numerical integration; plot() draws the curve.

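import numpy
from pycm import PRCurve

# Column i of probs holds the probability of classes[i] (class 2, then class 1).
crv = PRCurve(
    actual_vector=numpy.array([1, 1, 2, 2]),
    probs=numpy.array([[0.1, 0.9], [0.4, 0.6], [0.35, 0.65], [0.8, 0.2]]),
    classes=[2, 1])
crv.thresholds
auc_trp = crv.area()  # trapezoidal rule by default; indexed by class label
auc_trp[1]
auc_trp[2]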

C:\Users\Sepkjaer\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pycm-4.4-py3.5.egg\pycm\curve.py:382: RuntimeWarning: The curve contains non-numerical value(s).

0.29166666666666663

crv.plot(area=True, classes=[2])

<matplotlib.axes._subplots.AxesSubplot at 0x1cf9dbd0>

Notice : new in version 3.7
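A Precision-Recall curve can be traced by hand in the same way. This is a minimal sketch, not PRCurve's implementation: pr_points and trapezoid_area are hypothetical helpers, the default thresholds are assumed to be the distinct scores in decreasing order, and no anchor point at recall 0 is added. Thresholds at which nothing is predicted positive are skipped because precision is then 0/0. Undefined values of this kind are the likely source of the RuntimeWarning shown above.

```python
def pr_points(actual, scores, positive, thresholds=None):
    """(recall, precision) pairs; a sample is predicted positive when score >= threshold."""
    if thresholds is None:
        # Hypothetical default: distinct scores, decreasing, so recall never decreases.
        thresholds = sorted(set(scores), reverse=True)
    p = sum(1 for a in actual if a == positive)  # assumes at least one positive sample
    points = []
    for t in thresholds:
        tp = sum(1 for a, s in zip(actual, scores) if s >= t and a == positive)
        fp = sum(1 for a, s in zip(actual, scores) if s >= t and a != positive)
        if tp + fp == 0:
            continue  # precision is 0/0 when nothing is predicted positive
        points.append((tp / p, tp / (tp + fp)))
    return points


def trapezoid_area(points):
    """Trapezoidal rule over consecutive (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


actual = [1, 1, 2, 2]
probs = [[0.1, 0.9], [0.4, 0.6], [0.35, 0.65], [0.8, 0.2]]
scores_2 = [row[0] for row in probs]  # column 0 is read as class 2, as in classes=[2, 1]

points = pr_points(actual, scores_2, positive=2)
print(points)                  # [(0.5, 1.0), (0.5, 0.5), (1.0, 0.666...), (1.0, 0.5)]
print(trapezoid_area(points))  # about 0.2917
```

For class 2, this sketch gives about 0.2917, numerically the same as the value printed above. That agreement does not show that PRCurve uses the same thresholds.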

Multilabel confusion matrix¶

MultiLabelCM, added in version 4.0, calculates class-wise or sample-wise multilabel confusion matrices. In class-wise mode, a confusion matrix is calculated for each class; in sample-wise mode, one is generated for each sample. All generated confusion matrices are binarized with a one-vs-rest transformation.

from pycm import MultiLabelCM

mlcm = MultiLabelCM(actual_vector=[{"cat", "bird"}, {"dog"}],
                    predict_vector=[{"cat"}, {"dog", "bird"}],
                    classes=["cat", "dog", "bird"])
print(mlcm.actual_vector_multihot)
print(mlcm.predict_vector_multihot)
mlcm.get_cm_by_class("cat").print_matrix()  # binary matrix for class "cat"
mlcm.get_cm_by_sample(0).print_matrix()     # binary matrix for the first sample

[[1, 0, 1], [0, 1, 0]]
[[1, 0, 0], [0, 1, 1]]
Predict 0       1       
Actual
0       1       0       

1       0       1       


Predict 0       1       
Actual
0       1       0       

1       1       1

Notice : new in version 4.0
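The multi-hot encoding and the two binarization modes can be mimicked in a few lines of plain Python. This is a minimal sketch, not MultiLabelCM's code: multihot and binary_cm are hypothetical helpers, and each 2x2 matrix is stored as {actual: {predicted: count}}. A class-wise matrix takes one column of the multi-hot vectors across all samples. A sample-wise matrix takes one row across all classes.

```python
def multihot(label_sets, classes):
    """Encode each set of labels as a 0/1 list in the order given by classes."""
    return [[1 if c in labels else 0 for c in classes] for labels in label_sets]


def binary_cm(actual_bits, predict_bits):
    """2x2 counts laid out as {actual: {predicted: count}}."""
    cm = {0: {0: 0, 1: 0}, 1: {0: 0, 1: 0}}
    for a, p in zip(actual_bits, predict_bits):
        cm[a][p] += 1
    return cm


classes = ["cat", "dog", "bird"]
actual = multihot([{"cat", "bird"}, {"dog"}], classes)
predict = multihot([{"cat"}, {"dog", "bird"}], classes)

# Class-wise: one column of the multi-hot matrices, taken across all samples.
j = classes.index("cat")
by_class = binary_cm([row[j] for row in actual], [row[j] for row in predict])
# Sample-wise: one row of the multi-hot matrices, taken across all classes.
by_sample = binary_cm(actual[0], predict[0])
print(by_class)   # {0: {0: 1, 1: 0}, 1: {0: 0, 1: 1}}
print(by_sample)  # {0: {0: 1, 1: 0}, 1: {0: 1, 1: 1}}
```

These counts agree with the two matrices printed above.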

Online help¶

The online_help function, added in version 1.1, opens the definition of each statistic in a web browser.

>>> from pycm import online_help
>>> online_help("J")
>>> online_help("J", alt_link=True)
>>> online_help("SOA1(Landis & Koch)")
>>> online_help(2)

The list of available items is shown by calling online_help() without an argument.

If the PyCM website is not available, set alt_link=True.

online_help()

Please choose one parameter : 

Example : online_help("J") or online_help(2)

1-95% CI
2-ACC
3-ACC Macro
4-AGF
5-AGM
6-AM
7-ARI
8-AUC
9-AUCI
10-AUNP
11-AUNU
12-AUPR
13-BB
14-BCD
15-BM
16-Bangdiwala B
17-Bennett S
18-CBA
19-CEN
20-CSI
21-Chi-Squared
22-Chi-Squared DF
23-Conditional Entropy
24-Cramer V
25-Cross Entropy
26-DOR
27-DP
28-DPI
29-ERR
30-F0.5
31-F1
32-F1 Macro
33-F1 Micro
34-F2
35-FDR
36-FN
37-FNR
38-FNR Macro
39-FNR Micro
40-FOR
41-FP
42-FPR
43-FPR Macro
44-FPR Micro
45-G
46-GI
47-GM
48-Gwet AC1
49-HD
50-Hamming Loss
51-IBA
52-ICSI
53-IS
54-J
55-Joint Entropy
56-KL Divergence
57-Kappa
58-Kappa 95% CI
59-Kappa No Prevalence
60-Kappa Standard Error
61-Kappa Unbiased
62-Krippendorff Alpha
63-LS
64-Lambda A
65-Lambda B
66-MCC
67-MCCI
68-MCEN
69-MK
70-Mutual Information
71-N
72-NIR
73-NLR
74-NLRI
75-NPV
76-NPV Macro
77-NPV Micro
78-OC
79-OOC
80-OP
81-Overall ACC
82-Overall CEN
83-Overall J
84-Overall MCC
85-Overall MCEN
86-Overall RACC
87-Overall RACCU
88-P
89-P-Value
90-PLR
91-PLRI
92-POP
93-PPV
94-PPV Macro
95-PPV Micro
96-PR
97-PRE
98-Pearson C
99-Phi-Squared
100-Q
101-QI
102-RACC
103-RACCU
104-RCI
105-RR
106-Reference Entropy
107-Response Entropy
108-SOA1(Landis & Koch)
109-SOA2(Fleiss)
110-SOA3(Altman)
111-SOA4(Cicchetti)
112-SOA5(Cramer)
113-SOA6(Matthews)
114-SOA7(Lambda A)
115-SOA8(Lambda B)
116-SOA9(Krippendorff Alpha)
117-SOA10(Pearson C)
118-Scott PI
119-Standard Error
120-TN
121-TNR
122-TNR Macro
123-TNR Micro
124-TON
125-TOP
126-TOPR
127-TP
128-TPR
129-TPR Macro
130-TPR Micro
131-Y
132-Zero-one Loss
133-dInd
134-sInd

Parameters¶

param : input parameter (type : int or str, default : None)
alt_link : alternative link for document flag (type : bool, default : False)

Notice : alt_link , new in version 2.4

Acceptable data types¶

ConfusionMatrix

actual_vector : python list or numpy array of any stringable objects
predict_vector : python list or numpy array of any stringable objects
matrix : dict
digit : int
threshold : FunctionType (function or lambda)
file : File object
sample_weight : python list or numpy array of numbers
transpose : bool
classes : python list
is_imbalanced : bool
metrics_off : bool

run help(ConfusionMatrix) for more information

Notice : metrics_off, new in version 3.9

Compare

cm_dict : python dict of ConfusionMatrix objects (str : ConfusionMatrix)
by_class : bool
class_weight : python dict of class weights (class_name : float)
class_benchmark_weight : python dict of class benchmark weights (class_benchmark_name : float)
overall_benchmark_weight : python dict of overall benchmark weights (overall_benchmark_name : float)
digit : int

run help(Compare) for more information

Notice : weight renamed to class_weight in version 3.3

Notice : overall_benchmark_weight and class_benchmark_weight, new in version 3.3

ROCCurve

actual_vector : python list or numpy array of any stringable objects
probs : python list or numpy array
classes : python list
thresholds : python list or numpy array
sample_weight : python list or numpy array

run help(ROCCurve) for more information

PRCurve

actual_vector : python list or numpy array of any stringable objects
probs : python list or numpy array
classes : python list
thresholds : python list or numpy array
sample_weight : python list or numpy array

run help(PRCurve) for more information

MultiLabelCM

actual_vector : python list or numpy array of sets
predict_vector : python list or numpy array of sets
sample_weight : python list or numpy array
classes : python list

run help(MultiLabelCM) for more information

Basic parameters¶

TP (True positive)¶

A true positive test result is one that detects the condition when the condition is present (correctly identified) [3].

cm.TP

{'L1': 3, 'L2': 1, 'L3': 3}

TN (True negative)¶

A true negative test result is one that does not detect the condition when the condition is absent (correctly rejected) [3].

cm.TN

{'L1': 7, 'L2': 8, 'L3': 4}

FP (False positive)¶

A false positive test result is one that detects the condition when the condition is absent (incorrectly identified) [3].

cm.FP

{'L1': 0, 'L2': 2, 'L3': 3}

FN (False negative)¶

A false negative test result is one that does not detect the condition when the condition is present (incorrectly rejected) [3].

cm.FN

{'L1': 2, 'L2': 1, 'L3': 2}

P (Condition positive)¶

Number of positive samples, also known as support (the number of occurrences of each class in the actual vector) [3].

$$P=TP+FN$$

cm.P

{'L1': 5, 'L2': 2, 'L3': 5}

N (Condition negative)¶

Number of negative samples [3].

$$N=TN+FP$$

cm.N

{'L1': 7, 'L2': 10, 'L3': 7}

TOP (Test outcome positive)¶

Number of positive outcomes [3].

$$TOP=TP+FP$$

cm.TOP

{'L1': 3, 'L2': 3, 'L3': 6}

TON (Test outcome negative)¶

Number of negative outcomes [3].

$$TON=TN+FN$$

cm.TON

{'L1': 9, 'L2': 9, 'L3': 6}

POP (Population)¶

Total sample size [3].

$$POP=TP+TN+FN+FP$$

cm.POP

{'L1': 12, 'L2': 12, 'L3': 12}

Wikipedia page
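The outputs in this part of the document use the class names L1, L2, and L3, which differ from the classes of the cm built at the top. The sketch below uses a hypothetical 3-class matrix that has been reconstructed so that it reproduces the TP, TN, FP, and FN values shown above. It derives every basic count with the one-vs-rest rule: TP is the diagonal cell, FN is the rest of the class's row, FP is the rest of its column, and TN is everything else. The helper basic_counts is illustrative only, and it assumes rows index actual classes and columns index predicted classes.

```python
# Hypothetical 3-class matrix, reconstructed so that it reproduces the counts
# shown above. Rows are actual classes, columns are predicted classes.
matrix = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def basic_counts(matrix):
    """One-vs-rest TP, TN, FP, FN and their sums for every class."""
    classes = list(matrix)
    pop = sum(sum(row.values()) for row in matrix.values())
    counts = {}
    for c in classes:
        tp = matrix[c][c]
        fn = sum(matrix[c].values()) - tp             # actual c, predicted as another class
        fp = sum(matrix[a][c] for a in classes) - tp  # predicted c, actually another class
        tn = pop - tp - fn - fp                       # everything else
        counts[c] = {"TP": tp, "TN": tn, "FP": fp, "FN": fn,
                     "P": tp + fn, "N": tn + fp, "TOP": tp + fp,
                     "TON": tn + fn, "POP": pop}
    return counts


counts = basic_counts(matrix)
print({c: counts[c]["TN"] for c in counts})  # {'L1': 7, 'L2': 8, 'L3': 4}
```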

Class statistics¶

TPR (True positive rate)¶

Sensitivity (also called the true positive rate, the recall, or probability of detection in some fields) measures the proportion of positives that are correctly identified as such (e.g. the percentage of sick people who are correctly identified as having the condition) [3].

Wikipedia page

$$TPR=\frac{TP}{P}=\frac{TP}{TP+FN}$$

cm.TPR

{'L1': 0.6, 'L2': 0.5, 'L3': 0.6}

TNR (True negative rate)¶

Specificity (also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g. the percentage of healthy people who are correctly identified as not having the condition) [3].

Wikipedia page

$$TNR=\frac{TN}{N}=\frac{TN}{TN+FP}$$

cm.TNR

{'L1': 1.0, 'L2': 0.8, 'L3': 0.5714285714285714}

PPV (Positive predictive value)¶

Positive predictive value (PPV), also known as precision, is the proportion of positive test outcomes that are true positives [3].

Wikipedia page

$$PPV=\frac{TP}{TP+FP}$$

cm.PPV

{'L1': 1.0, 'L2': 0.3333333333333333, 'L3': 0.5}

NPV (Negative predictive value)¶

Negative predictive value (NPV) is the proportion of negative test outcomes that are true negatives [3].

Wikipedia page

$$NPV=\frac{TN}{TN+FN}$$

cm.NPV

{'L1': 0.7777777777777778, 'L2': 0.8888888888888888, 'L3': 0.6666666666666666}

FNR (False negative rate)¶

The false negative rate is the proportion of positives which yield negative test outcomes with the test, i.e., the conditional probability of a negative test result given that the condition being looked for is present [3].

Wikipedia page

$$FNR=\frac{FN}{P}=\frac{FN}{FN+TP}=1-TPR$$

cm.FNR

{'L1': 0.4, 'L2': 0.5, 'L3': 0.4}

FPR (False positive rate)¶

The false positive rate is the proportion of all negatives that still yield positive test outcomes, i.e., the conditional probability of a positive test result given an event that was not present [3].

The false positive rate is equal to the significance level. The specificity of the test is equal to $ 1 $ minus the false positive rate.

Wikipedia page

$$FPR=\frac{FP}{N}=\frac{FP}{FP+TN}=1-TNR$$

cm.FPR

{'L1': 0.0, 'L2': 0.19999999999999996, 'L3': 0.4285714285714286}

FDR (False discovery rate)¶

The false discovery rate (FDR) is a method of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the expected proportion of "discoveries" (rejected null hypotheses) that are false (incorrect rejections) [3].

Wikipedia page

$$FDR=\frac{FP}{FP+TP}=1-PPV$$

cm.FDR

{'L1': 0.0, 'L2': 0.6666666666666667, 'L3': 0.5}

FOR (False omission rate)¶

The false omission rate (FOR) is the complement of the negative predictive value: it measures the proportion of negative test outcomes that are false negatives [3].

Wikipedia page

$$FOR=\frac{FN}{FN+TN}=1-NPV$$

cm.FOR

{'L1': 0.2222222222222222,
 'L2': 0.11111111111111116,
 'L3': 0.33333333333333337}
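The eight rates above are ratios of the four basic counts, so they can be recomputed directly from cm.TP, cm.TN, cm.FP, and cm.FN. This is a minimal sketch with hypothetical helpers ratio and class_rates. Returning None for a zero denominator is our own choice here. Some values may differ from the printed ones in the last floating-point digits, for example 0.2 versus 0.19999999999999996 for the FPR of L2, because of rounding.

```python
# Per-class counts reported by cm.TP, cm.TN, cm.FP and cm.FN above.
TP = {"L1": 3, "L2": 1, "L3": 3}
TN = {"L1": 7, "L2": 8, "L3": 4}
FP = {"L1": 0, "L2": 2, "L3": 3}
FN = {"L1": 2, "L2": 1, "L3": 2}


def ratio(num, den):
    """Divide, returning None when the denominator is zero (undefined rate)."""
    return num / den if den else None


def class_rates(tp, tn, fp, fn):
    """The eight basic rates for one class."""
    return {
        "TPR": ratio(tp, tp + fn),  # sensitivity / recall
        "TNR": ratio(tn, tn + fp),  # specificity
        "PPV": ratio(tp, tp + fp),  # precision
        "NPV": ratio(tn, tn + fn),
        "FNR": ratio(fn, tp + fn),  # equals 1 - TPR
        "FPR": ratio(fp, tn + fp),  # equals 1 - TNR
        "FDR": ratio(fp, tp + fp),  # equals 1 - PPV
        "FOR": ratio(fn, tn + fn),  # equals 1 - NPV
    }


rates = {c: class_rates(TP[c], TN[c], FP[c], FN[c]) for c in TP}
print(rates["L3"])
```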

ACC (Accuracy)¶

Accuracy is the proportion of correct predictions among all predictions made [3].

Wikipedia page

$$ACC=\frac{TP+TN}{P+N}=\frac{TP+TN}{TP+TN+FP+FN}$$

cm.ACC

{'L1': 0.8333333333333334, 'L2': 0.75, 'L3': 0.5833333333333334}

ERR (Error rate)¶

The error rate is the proportion of incorrect predictions among all predictions made [3].

$$ERR=\frac{FP+FN}{P+N}=\frac{FP+FN}{TP+TN+FP+FN}=1-ACC$$

cm.ERR

{'L1': 0.16666666666666663, 'L2': 0.25, 'L3': 0.41666666666666663}

Notice : new in version 0.4

FBeta-Score¶

In statistical analysis of classification, the F1 score (also F-score or F-measure) is a measure of a test's accuracy. It considers both the precision $ p $ and the recall $ r $ of the test to compute the score. The F1 score is the harmonic mean of precision and recall; it reaches its best value at $ 1 $ (perfect precision and recall) and its worst at $ 0 $ [3]. The general $ F_{\beta} $ score treats recall as $ \beta $ times as important as precision, so F1, F0.5, and F2 are the cases $ \beta=1 $, $ 0.5 $, and $ 2 $.

Wikipedia page

$$F_{\beta}=(1+\beta^2)\times \frac{PPV\times TPR}{(\beta^2 \times PPV)+TPR}=\frac{(1+\beta^2) \times TP}{(1+\beta^2)\times TP+FP+\beta^2 \times FN}$$

cm.F1

{'L1': 0.75, 'L2': 0.4, 'L3': 0.5454545454545454}

cm.F05

{'L1': 0.8823529411764706, 'L2': 0.35714285714285715, 'L3': 0.5172413793103449}

cm.F2

{'L1': 0.6521739130434783, 'L2': 0.45454545454545453, 'L3': 0.5769230769230769}

cm.F_beta(beta=4)

{'L1': 0.6144578313253012, 'L2': 0.4857142857142857, 'L3': 0.5930232558139535}

Parameters¶

beta : beta parameter (type : float)

Output¶

{class1: FBeta-Score1, class2: FBeta-Score2, ...}

Notice : new in version 0.4
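The count form of the $F_{\beta}$ formula above can be evaluated directly. In this minimal sketch, f_beta is a hypothetical helper and not PyCM's cm.F_beta. It takes the TP, FP, and FN values shown earlier and returns None when the denominator is zero. For β = 1, 0.5, 2, and 4 it reproduces cm.F1, cm.F05, cm.F2, and cm.F_beta(beta=4).

```python
TP = {"L1": 3, "L2": 1, "L3": 3}
FP = {"L1": 0, "L2": 2, "L3": 3}
FN = {"L1": 2, "L2": 1, "L3": 2}


def f_beta(tp, fp, fn, beta):
    """Count form of F_beta; returns None when the score is undefined (0/0)."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + fp + b2 * fn
    return (1 + b2) * tp / denom if denom else None


scores = {beta: {c: f_beta(TP[c], FP[c], FN[c], beta) for c in TP}
          for beta in (1, 0.5, 2, 4)}
for beta, per_class in scores.items():
    print(beta, per_class)
```

Raising β increases the weight of FN relative to FP. This is why F2 and F4 favor L3 and L2 (higher recall) over L1, while F0.5 favors L1 (perfect precision).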

MCC (Matthews correlation coefficient)¶

The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary (two-class) classifications, introduced by biochemist Brian W. Matthews in 1975. It takes into account true and false positives and negatives and is generally regarded as a balanced measure that can be used even if the classes are of very different sizes. The MCC is, in essence, a correlation coefficient between the observed and predicted binary classifications; it returns a value between $ −1 $ and $ +1 $. A coefficient of $ +1 $ represents a perfect prediction, $ 0 $ no better than random prediction and $ −1 $ indicates total disagreement between prediction and observation [27].

Interpretation

Wikipedia page

$$MCC=\frac{TP \times TN-FP \times FN}{\sqrt{(TP+FP)\times (TP+FN)\times (TN+FP)\times (TN+FN)}}$$

cm.MCC

{'L1': 0.6831300510639732, 'L2': 0.25819888974716115, 'L3': 0.1690308509457033}
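The MCC formula can be evaluated per class from the one-vs-rest counts. In this minimal sketch, mcc is a hypothetical helper, and returning None when any marginal is zero (zero denominator) is our own convention.

```python
import math

TP = {"L1": 3, "L2": 1, "L3": 3}
TN = {"L1": 7, "L2": 8, "L3": 4}
FP = {"L1": 0, "L2": 2, "L3": 3}
FN = {"L1": 2, "L2": 1, "L3": 2}


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient of one binary (one-vs-rest) table."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else None


MCC = {c: mcc(TP[c], TN[c], FP[c], FN[c]) for c in TP}
print(MCC)
```

For L1, for example, the numerator is 3 x 7 - 0 x 2 = 21 and the denominator is sqrt(3 x 5 x 7 x 9) = sqrt(945), giving about 0.6831.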

BM (Bookmaker informedness)¶

The informedness of a prediction method as captured by a contingency matrix is defined as the probability that the prediction method will make a correct decision as opposed to guessing and is calculated using the bookmaker algorithm [2].

Equal to the Youden index.

$$BM=TPR+TNR-1$$

cm.BM

{'L1': 0.6000000000000001,
 'L2': 0.30000000000000004,
 'L3': 0.17142857142857126}

MK (Markedness)¶

In statistics and psychology, the social science concept of markedness is quantified as a measure of how much one variable is marked as a predictor or possible cause of another and is also known as $ \triangle P $ in simple two-choice cases [2].

$$MK=PPV+NPV-1$$

cm.MK

{'L1': 0.7777777777777777, 'L2': 0.2222222222222221, 'L3': 0.16666666666666652}
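BM and MK follow directly from the rates printed earlier. For a single binary table, both share the numerator $TP \times TN-FP \times FN$ of MCC, and $MCC^2=BM \times MK$. The sketch below uses that identity as a cross-check against cm.MCC. The variable names are illustrative only.

```python
import math

TPR = {"L1": 0.6, "L2": 0.5, "L3": 0.6}
TNR = {"L1": 1.0, "L2": 0.8, "L3": 4 / 7}
PPV = {"L1": 1.0, "L2": 1 / 3, "L3": 0.5}
NPV = {"L1": 7 / 9, "L2": 8 / 9, "L3": 4 / 6}

BM = {c: TPR[c] + TNR[c] - 1 for c in TPR}  # bookmaker informedness
MK = {c: PPV[c] + NPV[c] - 1 for c in PPV}  # markedness

# Cross-check: BM and MK always share a sign for a binary table, and MCC^2 = BM * MK.
mcc_check = {c: math.copysign(math.sqrt(BM[c] * MK[c]), BM[c]) for c in BM}
print(mcc_check)  # close to the cm.MCC values shown earlier
```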

PLR (Positive likelihood ratio)¶

Likelihood ratios are used for assessing the value of performing a diagnostic test. They use the sensitivity and specificity of the test to determine whether a test result usefully changes the probability that a condition (such as a disease state) exists. The first description of the use of likelihood ratios for decision rules was made at a symposium on information theory in 1954 [28].

Interpretation

Wikipedia page

$$LR_+=PLR=\frac{TPR}{FPR}$$

cm.PLR

{'L1': 'None', 'L2': 2.5000000000000004, 'L3': 1.4}

Notice : LR+ renamed to PLR in version 1.5

NLR (Negative likelihood ratio)¶

Likelihood ratios are used for assessing the value of performing a diagnostic test. They use the sensitivity and specificity of the test to determine whether a test result usefully changes the probability that a condition (such as a disease state) exists. The first description of the use of likelihood ratios for decision rules was made at a symposium on information theory in 1954 [28].

Interpretation

Wikipedia page

$$LR_-=NLR=\frac{FNR}{TNR}$$

cm.NLR

{'L1': 0.4, 'L2': 0.625, 'L3': 0.7000000000000001}

Notice : LR- renamed to NLR in version 1.5

DOR (Diagnostic odds ratio)¶

The diagnostic odds ratio is a measure of the effectiveness of a diagnostic test. It is defined as the ratio of the odds of the test being positive if the subject has a disease relative to the odds of the test being positive if the subject does not have the disease [28].

Wikipedia page

$$DOR=\frac{LR_+}{LR_-}$$

cm.DOR

{'L1': 'None', 'L2': 4.000000000000001, 'L3': 1.9999999999999998}
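PLR, NLR, and DOR chain together, and an undefined value propagates. L1 has FP = 0, so FPR = 0 and PLR is undefined, which in turn makes DOR undefined. In this minimal sketch, likelihood_ratios is a hypothetical helper that uses None where PyCM prints 'None'. It assumes every class has at least one positive and one negative sample.

```python
TP = {"L1": 3, "L2": 1, "L3": 3}
TN = {"L1": 7, "L2": 8, "L3": 4}
FP = {"L1": 0, "L2": 2, "L3": 3}
FN = {"L1": 2, "L2": 1, "L3": 2}


def likelihood_ratios(tp, tn, fp, fn):
    """Return (PLR, NLR, DOR) for one class; None marks an undefined value."""
    tpr, tnr = tp / (tp + fn), tn / (tn + fp)  # assumes P > 0 and N > 0
    fpr, fnr = fp / (tn + fp), fn / (tp + fn)
    plr = tpr / fpr if fpr else None
    nlr = fnr / tnr if tnr else None
    # NLR == 0 would make DOR infinite, so it is also reported as None.
    dor = plr / nlr if plr is not None and nlr else None
    return plr, nlr, dor


lr = {c: likelihood_ratios(TP[c], TN[c], FP[c], FN[c]) for c in TP}
print(lr)
```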

PRE (Prevalence)¶

Prevalence is a statistical concept referring to the proportion of cases of a disease present in a particular population at a given time. For a class, it is the fraction of all samples that actually belong to that class (reference likelihood) [14].

Wikipedia page

$$Prevalence=\frac{P}{POP}$$

cm.PRE

{'L1': 0.4166666666666667, 'L2': 0.16666666666666666, 'L3': 0.4166666666666667}

G (G-measure)¶

The geometric mean of precision and sensitivity, also known as the Fowlkes–Mallows index [3].

Wikipedia page

$$G=\sqrt{PPV\times TPR}$$

cm.G

{'L1': 0.7745966692414834, 'L2': 0.408248290463863, 'L3': 0.5477225575051661}

RACC (Random accuracy)¶

The expected accuracy from a strategy of randomly guessing categories according to reference and response distributions [24].

$$RACC=\frac{TOP \times P}{POP^2}$$

cm.RACC

{'L1': 0.10416666666666667,
 'L2': 0.041666666666666664,
 'L3': 0.20833333333333334}

Notice : new in version 0.3

RACCU (Random accuracy unbiased)¶

The expected accuracy from a strategy of randomly guessing categories according to the average of the reference and response distributions [25].

$$RACCU=(\frac{TOP+P}{2 \times POP})^2$$

cm.RACCU

{'L1': 0.1111111111111111,
 'L2': 0.04340277777777778,
 'L3': 0.21006944444444442}

Notice : new in version 0.8.1
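Both random accuracies depend only on the marginals of the matrix. RACC multiplies the predicted share TOP/POP by the actual share P/POP. RACCU squares the average of the two shares. The sketch below recomputes both from the P, TOP, and POP values shown in the basic parameters. The variable names are illustrative only.

```python
P = {"L1": 5, "L2": 2, "L3": 5}    # cm.P
TOP = {"L1": 3, "L2": 3, "L3": 6}  # cm.TOP
POP = 12                           # cm.POP (same for every class)

RACC = {c: TOP[c] * P[c] / POP ** 2 for c in P}
RACCU = {c: ((TOP[c] + P[c]) / (2 * POP)) ** 2 for c in P}
print(RACC)
print(RACCU)
```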

J (Jaccard index)¶

The Jaccard index, also known as Intersection over Union and the Jaccard similarity coefficient (originally coined coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets [29].

Wikipedia page

Some articles also call it F* (An Interpretable Transformation of the F-measure) [77].

$$J=\frac{TP}{TOP+P-TP}$$

cm.J

{'L1': 0.6, 'L2': 0.25, 'L3': 0.375}

Notice : new in version 0.9

IS (Information score)¶

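The amount of information needed to correctly classify an example into class C, whose prior probability is $ p(C) $, is defined as $ -\log_2(p(C)) $. The information score subtracts from this the information still needed after a positive prediction, $ -\log_2(PPV) $ [18] [39].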

$$IS=-log_2(\frac{TP+FN}{POP})+log_2(\frac{TP}{TP+FP})$$

cm.IS

{'L1': 1.2630344058337937, 'L2': 0.9999999999999998, 'L3': 0.26303440583379367}

Notice : new in version 1.3
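J and IS can be recomputed from TP, TOP, P, and POP. In this minimal sketch, jaccard and information_score are hypothetical helpers. Returning None when TP = 0, where $log_2(0)$ is undefined, is our own convention.

```python
import math

TP = {"L1": 3, "L2": 1, "L3": 3}
TOP = {"L1": 3, "L2": 3, "L3": 6}
P = {"L1": 5, "L2": 2, "L3": 5}
POP = 12


def jaccard(tp, top, p):
    """Intersection (TP) over union (TOP + P - TP) of predicted and actual positives."""
    union = top + p - tp
    return tp / union if union else None


def information_score(tp, top, p, pop):
    """-log2(prior of the class) + log2(precision); None when TP == 0."""
    if tp == 0:
        return None
    return -math.log2(p / pop) + math.log2(tp / top)


J = {c: jaccard(TP[c], TOP[c], P[c]) for c in TP}
IS = {c: information_score(TP[c], TOP[c], P[c], POP) for c in TP}
print(J)
print(IS)
```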

CEN (Confusion entropy)¶

CEN is based on the concept of entropy for evaluating classifier performance. By exploiting the misclassification information in confusion matrices, it measures the confusion level of the class distribution of misclassified samples. According to its authors, both theoretical analysis and statistical results show that CEN is more discriminating than accuracy and RCI while remaining relatively consistent with both. It is also better at measuring how well the samples of different classes have been separated from each other, so it can be used in place of these two measures when evaluating classifiers [17].

$$P_{i,j}^{j}=\frac{Matrix(i,j)}{\sum_{k=1}^{|C|}\Big(Matrix(j,k)+Matrix(k,j)\Big)}$$

$$P_{i,j}^{i}=\frac{Matrix(i,j)}{\sum_{k=1}^{|C|}\Big(Matrix(i,k)+Matrix(k,i)\Big)}$$

$$CEN_j=-\sum_{k=1,k\neq j}^{|C|}\Bigg(P_{j,k}^jlog_{2(|C|-1)}\Big(P_{j,k}^j\Big)+P_{k,j}^jlog_{2(|C|-1)}\Big(P_{k,j}^j\Big)\Bigg)$$

cm.CEN

{'L1': 0.25, 'L2': 0.49657842846620864, 'L3': 0.6044162769630221}

Notice : $ |C| $ is the number of classes

Notice : new in version 1.3

MCEN (Modified confusion entropy)¶

Modified version of CEN [19].

$$P_{i,j}^{j}=\frac{Matrix(i,j)}{\sum_{k=1}^{|C|}\Big(Matrix(j,k)+Matrix(k,j)\Big)-Matrix(j,j)}$$

$$P_{i,j}^{i}=\frac{Matrix(i,j)}{\sum_{k=1}^{|C|}\Big(Matrix(i,k)+Matrix(k,i)\Big)-Matrix(i,i)}$$

$$MCEN_j=-\sum_{k=1,k\neq j}^{|C|}\Bigg(P_{j,k}^jlog_{2(|C|-1)}\Big(P_{j,k}^j\Big)+P_{k,j}^jlog_{2(|C|-1)}\Big(P_{k,j}^j\Big)\Bigg)$$

cm.MCEN

{'L1': 0.2643856189774724, 'L2': 0.5, 'L3': 0.6875}

Notice : new in version 1.3
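The CEN and MCEN formulas differ only in their denominator. MCEN subtracts the diagonal cell $Matrix(j,j)$. A single function can therefore cover both. The sketch below is illustrative, not PyCM's code. It uses the hypothetical reconstructed matrix that reproduces the TP/FP/FN counts of classes L1, L2, and L3 (rows are actual classes, columns are predicted classes), treats $0 \times log(0)$ as 0, and uses the logarithm base $2(|C|-1)$ from the formulas.

```python
import math

# Hypothetical reconstructed matrix (rows: actual L1, L2, L3; columns: predicted).
matrix = [
    [3, 0, 2],
    [0, 1, 1],
    [0, 2, 3],
]


def confusion_entropy(matrix, j, modified=False):
    """CEN_j (or MCEN_j when modified=True) for the class at index j."""
    n = len(matrix)
    denom = sum(matrix[j][k] + matrix[k][j] for k in range(n))
    if modified:
        denom -= matrix[j][j]  # MCEN removes the correctly classified samples
    base = 2 * (n - 1)
    total = 0.0
    for k in range(n):
        if k == j:
            continue
        for count in (matrix[j][k], matrix[k][j]):  # P^j_{j,k} and P^j_{k,j}
            if count:  # 0 * log(0) is taken as 0
                p = count / denom
                total -= p * math.log(p, base)
    return total


print([confusion_entropy(matrix, j) for j in range(3)])                 # CEN
print([confusion_entropy(matrix, j, modified=True) for j in range(3)])  # MCEN
```

For L1, only one off-diagonal cell is non-zero (2 of the 8 samples in row and column L1). This gives $-0.25\log_4(0.25)=0.25$, the CEN value shown above.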

AUC (Area under the ROC curve)¶

The area under the curve (often referred to simply as the AUC) is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative'). A confusion matrix provides only a single operating point per class, so here AUC is approximated as the arithmetic mean of the sensitivity and specificity of each class [23].

Interpretation

$$AUC=\frac{TNR+TPR}{2}$$

cm.AUC

{'L1': 0.8, 'L2': 0.65, 'L3': 0.5857142857142856}

Notice : new in version 1.4
Notice : this is an approximate calculation of AUC

dInd (Distance index)¶

Euclidean distance of a ROC point from the top left corner of the ROC space, which can take values between 0 (perfect classification) and $ \sqrt{2} $ [23].

$$dInd=\sqrt{(1-TNR)^2+(1-TPR)^2}$$

cm.dInd

{'L1': 0.4, 'L2': 0.5385164807134504, 'L3': 0.5862367008195198}

Notice : new in version 1.4

sInd (Similarity index)¶

sInd ranges between $ 0 $ (no correct classifications) and $ 1 $ (perfect classification) [23].

$$sInd = 1 - \sqrt{\frac{(1-TNR)^2+(1-TPR)^2}{2}}$$

cm.sInd

{'L1': 0.717157287525381, 'L2': 0.6192113447068046, 'L3': 0.5854680534700882}

Notice : new in version 1.4
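AUC, dInd, and sInd all place each class at the single ROC point (1 - TNR, TPR). AUC averages the two rates. dInd is the Euclidean distance of that point from the ideal corner (0, 1). sInd rescales that distance by its maximum, $ \sqrt{2} $. The following minimal sketch uses the TPR and TNR values printed earlier. The variable names are illustrative only.

```python
import math

TPR = {"L1": 0.6, "L2": 0.5, "L3": 0.6}
TNR = {"L1": 1.0, "L2": 0.8, "L3": 4 / 7}

AUC = {c: (TNR[c] + TPR[c]) / 2 for c in TPR}                # single-point approximation
dInd = {c: math.hypot(1 - TNR[c], 1 - TPR[c]) for c in TPR}  # distance to (0, 1)
sInd = {c: 1 - dInd[c] / math.sqrt(2) for c in TPR}          # 1 at the ideal corner
print(AUC)
print(dInd)
print(sInd)
```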

DP (Discriminant power)¶

Discriminant power (DP) is a measure that summarizes sensitivity and specificity. The DP has been used mainly in feature selection over imbalanced data [33].

Interpretation

$$X=\frac{TPR}{1-TPR}$$

$$Y=\frac{TNR}{1-TNR}$$

$$DP=\frac{\sqrt{3}}{\pi}(log_{10}X+log_{10}Y)$$

cm.DP

{'L1': 'None', 'L2': 0.33193306999649924, 'L3': 0.1659665349982495}

Notice : new in version 1.5
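Since $TPR/(1-TPR)=TP/FN$ and $TNR/(1-TNR)=TN/FP$, DP can be computed from the raw counts. In this minimal sketch, discriminant_power is a hypothetical helper. It returns None whenever any count is zero, because X or Y is then 0 or infinite and a logarithm is undefined. This is the situation for L1, where FP = 0.

```python
import math

TP = {"L1": 3, "L2": 1, "L3": 3}
TN = {"L1": 7, "L2": 8, "L3": 4}
FP = {"L1": 0, "L2": 2, "L3": 3}
FN = {"L1": 2, "L2": 1, "L3": 2}


def discriminant_power(tp, tn, fp, fn):
    """DP via X = TPR/(1-TPR) = TP/FN and Y = TNR/(1-TNR) = TN/FP; None if undefined."""
    if 0 in (tp, tn, fp, fn):
        return None  # X or Y would be 0 or infinite
    x, y = tp / fn, tn / fp
    return math.sqrt(3) / math.pi * (math.log10(x) + math.log10(y))


DP = {c: discriminant_power(TP[c], TN[c], FP[c], FN[c]) for c in TP}
print(DP)
```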

Y (Youden index)¶

Youden's index evaluates an algorithm's ability to avoid failure. It is derived from sensitivity and specificity and is linearly related to balanced accuracy. Because Youden's index is a linear transformation of the mean of sensitivity and specificity, its values are difficult to interpret; we retain only that a higher value of Y indicates a better ability to avoid failure. Youden's index has conventionally been used to evaluate diagnostic tests and to improve the efficiency of telemedical prevention [33] [34].

Wikipedia page

Equal to Bookmaker Informedness.

$$Y=BM=TPR+TNR-1$$

cm.Y

{'L1': 0.6000000000000001,
 'L2': 0.30000000000000004,
 'L3': 0.17142857142857126}

Notice : new in version 1.5

PLRI (Positive likelihood ratio interpretation)¶

For more information visit [33].

PLR	Model contribution
< 1	Negligible
1 - 5	Poor
5 - 10	Fair
> 10	Good

cm.PLRI

{'L1': 'None', 'L2': 'Poor', 'L3': 'Poor'}

Notice : new in version 1.5

NLRI (Negative likelihood ratio interpretation)¶

For more information visit [48].

NLR	Model contribution
0.5 - 1	Negligible
0.2 - 0.5	Poor
0.1 - 0.2	Fair
< 0.1	Good

cm.NLRI

{'L1': 'Poor', 'L2': 'Negligible', 'L3': 'Negligible'}

Notice : new in version 2.2

DPI (Discriminant power interpretation)¶

For more information visit [33].

DP	Model contribution
< 1	Poor
1 - 2	Limited
2 - 3	Fair
> 3	Good

cm.DPI

{'L1': 'None', 'L2': 'Poor', 'L3': 'Poor'}

Notice : new in version 1.5

AUCI (AUC value interpretation)¶

For more information visit [33].

AUC	Model performance
0.5 - 0.6	Poor
0.6 - 0.7	Fair
0.7 - 0.8	Good
0.8 - 0.9	Very Good
0.9 - 1.0	Excellent

cm.AUCI

{'L1': 'Very Good', 'L2': 'Fair', 'L3': 'Poor'}

Notice : new in version 1.6

MCCI (Matthews correlation coefficient interpretation)¶

MCC is a confusion matrix method of calculating the Pearson product-moment correlation coefficient (not to be confused with Pearson's C). Therefore, it has the same interpretation [2].

For more information visit [49].

MCC	Interpretation
< 0.3	Negligible
0.3 - 0.5	Weak
0.5 - 0.7	Moderate
0.7 - 0.9	Strong
0.9 - 1.0	Very Strong

cm.MCCI

{'L1': 'Moderate', 'L2': 'Negligible', 'L3': 'Negligible'}

Notice : new in version 2.2

Notice : only positive values are considered

QI (Yule's Q interpretation)¶

For more information visit [67].

Q	Interpretation
< 0.25	Negligible
0.25 - 0.5	Weak
0.5 - 0.75	Moderate
> 0.75	Strong

cm.QI

{'L1': 'None', 'L2': 'Moderate', 'L3': 'Weak'}

Notice : new in version 2.6

GI (Gini index)¶

A chance-standardized variant of the AUC is given by the Gini coefficient, which takes values between $ 0 $ (no difference between the score distributions of the two classes) and $ 1 $ (complete separation between the two distributions). The Gini coefficient is a widely used metric in imbalanced data learning [33].

Wikipedia page

$$GI=2\times AUC-1$$

cm.GI

{'L1': 0.6000000000000001,
 'L2': 0.30000000000000004,
 'L3': 0.17142857142857126}

Notice : new in version 1.7

LS (Lift score)¶

In the context of classification, lift compares model predictions to randomly generated predictions. Lift is often used in marketing research combined with gain and lift charts as a visual aid [35] [36].

$$LS=\frac{PPV}{PRE}$$

cm.LS

{'L1': 2.4, 'L2': 2.0, 'L3': 1.2}

Notice : new in version 1.8

AM (Automatic/Manual)¶

Difference between automatic and manual classification, i.e., the number of positive outcomes minus the number of positive samples.

$$AM=TOP-P=(TP+FP)-(TP+FN)$$

cm.AM

{'L1': -2, 'L2': 1, 'L3': 1}

Notice : new in version 1.9

BCD (Bray-Curtis dissimilarity)¶

In ecology and biology, the Bray–Curtis dissimilarity, named after J. Roger Bray and John T. Curtis, is a statistic used to quantify the compositional dissimilarity between two different sites, based on counts at each site [37].

Wikipedia page

$$BCD=\frac{|AM|}{\sum_{i=1}^{|C|}\Big(TOP_i+P_i\Big)}=\frac{|AM|}{2\times POP}$$

cm.BCD

{'L1': 0.08333333333333333,
 'L2': 0.041666666666666664,
 'L3': 0.041666666666666664}

Notice : new in version 1.9
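GI, LS, AM, and BCD are short formulas over the basic counts, so a single helper can return all four. The sketch below is illustrative only (extra_stats is hypothetical). It assumes each class has at least one actual positive and one actual negative, and it reuses the single-point AUC approximation described above.

```python
TP = {"L1": 3, "L2": 1, "L3": 3}
TN = {"L1": 7, "L2": 8, "L3": 4}
FP = {"L1": 0, "L2": 2, "L3": 3}
FN = {"L1": 2, "L2": 1, "L3": 2}
POP = 12


def extra_stats(tp, tn, fp, fn, pop):
    """GI, LS, AM and BCD for one class (assumes P > 0 and N > 0)."""
    p, top = tp + fn, tp + fp
    auc = (tn / (tn + fp) + tp / p) / 2  # single-point AUC approximation
    ppv = tp / top if top else None
    return {
        "GI": 2 * auc - 1,
        "LS": ppv / (p / pop) if ppv is not None else None,  # precision / prevalence
        "AM": top - p,                                       # positive outcomes - positive samples
        "BCD": abs(top - p) / (2 * pop),
    }


stats = {c: extra_stats(TP[c], TN[c], FP[c], FN[c], POP) for c in TP}
print(stats)
```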

OP (Optimized precision)¶

Optimized precision is a hybrid threshold metric that has been proposed as a discriminator for building an optimized heuristic classifier. It combines accuracy, sensitivity, and specificity; the sensitivity and specificity terms are used to stabilize and optimize the accuracy performance on imbalanced two-class problems [40] [42].

$$OP = ACC - \frac{|TNR-TPR|}{|TNR+TPR|}$$

cm.OP

{'L1': 0.5833333333333334, 'L2': 0.5192307692307692, 'L3': 0.5589430894308943}

Notice : new in version 2.0

IBA (Index of balanced accuracy)¶

IBA combines an unbiased index of overall accuracy with a measure of how dominant the class with the highest individual accuracy rate is [41] [42]. The weighting factor $ \alpha $ controls the influence of the dominance term $ TPR-TNR $.

$$IBA_{\alpha}=(1+\alpha \times(TPR-TNR))\times TNR \times TPR$$

cm.IBA

{'L1': 0.36, 'L2': 0.27999999999999997, 'L3': 0.35265306122448975}

cm.IBA_alpha(0.5)

{'L1': 0.48, 'L2': 0.34, 'L3': 0.3477551020408163}

cm.IBA_alpha(0.1)

{'L1': 0.576, 'L2': 0.388, 'L3': 0.34383673469387754}

Parameters¶

alpha : alpha parameter (type : float)

Output¶

{class1: IBA1, class2: IBA2, ...}

Notice : new in version 2.0

GM (G-mean)¶

Geometric mean of specificity and sensitivity [3] [41] [42].

$$GM=\sqrt{TPR \times TNR}$$

cm.GM

{'L1': 0.7745966692414834, 'L2': 0.6324555320336759, 'L3': 0.5855400437691198}

Notice : new in version 2.0
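OP, IBA, and GM can be recomputed from ACC, TPR, and TNR. In this minimal sketch, the helper names are hypothetical. The default alpha=1.0 is chosen because it reproduces the cm.IBA values printed above. The sketch assumes TPR + TNR > 0 for OP.

```python
import math

TPR = {"L1": 0.6, "L2": 0.5, "L3": 0.6}
TNR = {"L1": 1.0, "L2": 0.8, "L3": 4 / 7}
ACC = {"L1": 10 / 12, "L2": 9 / 12, "L3": 7 / 12}


def optimized_precision(acc, tpr, tnr):
    """Accuracy penalized by the relative imbalance between TNR and TPR."""
    return acc - abs(tnr - tpr) / abs(tnr + tpr)


def iba(tpr, tnr, alpha=1.0):
    """Index of balanced accuracy; alpha weights the dominance term TPR - TNR."""
    return (1 + alpha * (tpr - tnr)) * tnr * tpr


def g_mean(tpr, tnr):
    return math.sqrt(tpr * tnr)


OP = {c: optimized_precision(ACC[c], TPR[c], TNR[c]) for c in TPR}
IBA = {c: iba(TPR[c], TNR[c]) for c in TPR}
IBA_05 = {c: iba(TPR[c], TNR[c], alpha=0.5) for c in TPR}
GM = {c: g_mean(TPR[c], TNR[c]) for c in TPR}
print(OP, IBA, IBA_05, GM, sep="\n")
```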

Q (Yule's Q)¶

In statistics, Yule's Q, also known as the coefficient of colligation, is a measure of association between two binary variables [45].

Interpretation

Wikipedia page

$$OR = \frac{TP\times TN}{FP\times FN}$$

$$Q = \frac{OR-1}{OR+1}$$

cm.Q

{'L1': 'None', 'L2': 0.6, 'L3': 0.3333333333333333}

Notice : new in version 2.1

AGM (Adjusted G-mean)¶

An adjusted version of the geometric mean of specificity and sensitivity [46].

$$N_n=\frac{N}{POP}$$

$$AGM=\frac{GM+TNR\times N_n}{1+N_n};TPR>0$$

$$AGM=0;TPR=0$$

cm.AGM

{'L1': 0.8576400016262, 'L2': 0.708612108382005, 'L3': 0.5803410802752335}

Notice : new in version 2.1

AGF (Adjusted F-score)¶

F-measures use only three of the four elements of the confusion matrix, so two classifiers with different TNR values may have the same F-score. The AGF metric was therefore introduced to use all elements of the confusion matrix and to give more weight to correctly classified samples of the minority class [50].

$$AGF=\sqrt{F_2 \times InvF_{0.5}}$$

$$F_{2}=5\times \frac{PPV\times TPR}{(4 \times PPV)+TPR}$$

$$InvF_{0.5}=(1+0.5^2)\times \frac{NPV\times TNR}{(0.5^2 \times NPV)+TNR}$$

cm.AGF

{'L1': 0.7285871475307653, 'L2': 0.6286946134619315, 'L3': 0.610088876086563}

Notice : new in version 2.3
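A pure-Python sketch of AGF, assuming the same reconstructed 3x3 matrix (rows = actual, columns = predicted). `agf` is a hypothetical helper. It has no guard for the zero-division cases that PyCM may handle differently.

```python
import math

# Assumed example matrix (rows = actual, columns = predicted).
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def agf(matrix):
    """AGF = sqrt(F2 * InvF0.5) per class (no zero-division guard)."""
    classes = list(matrix)
    pop = sum(sum(row.values()) for row in matrix.values())
    result = {}
    for c in classes:
        tp = matrix[c][c]
        fn = sum(matrix[c].values()) - tp
        fp = sum(matrix[r][c] for r in classes) - tp
        tn = pop - tp - fp - fn
        ppv, tpr = tp / (tp + fp), tp / (tp + fn)
        npv, tnr = tn / (tn + fn), tn / (tn + fp)
        f2 = 5 * ppv * tpr / (4 * ppv + tpr)
        # InvF0.5 is an F0.5 score for the negative side, built from NPV and TNR
        inv_f05 = (1 + 0.5 ** 2) * npv * tnr / (0.5 ** 2 * npv + tnr)
        result[c] = math.sqrt(f2 * inv_f05)
    return result


print(agf(MATRIX))
```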

OC (Overlap coefficient)¶

The overlap coefficient, or Szymkiewicz–Simpson coefficient, is a similarity measure of the overlap between two finite sets. It is defined as the size of the intersection divided by the size of the smaller of the two sets [52].

Wikipedia page

$$OC=\frac{TP}{min(TOP,P)}=max(PPV,TPR)$$

cm.OC

{'L1': 1.0, 'L2': 0.5, 'L3': 0.6}

Notice : new in version 2.3

BB (Braun-Blanquet similarity)¶

The Braun-Blanquet coefficient is a similarity measure that is mostly used in botany. It is defined as the size of the intersection divided by the size of the larger of the two sets [82] [83].

$$BB=\frac{TP}{max(TOP,P)}=min(PPV,TPR)$$

cm.BB

{'L1': 0.6, 'L2': 0.3333333333333333, 'L3': 0.5}

Notice : new in version 3.6

OOC (Otsuka-Ochiai coefficient)¶

In biology, the Otsuka-Ochiai coefficient is a similarity index named after Yanosuke Otsuka and Akira Ochiai. It is also known as the Ochiai-Barkman or Ochiai coefficient. If sets are represented as bit vectors, the Otsuka-Ochiai coefficient is the same as the cosine similarity [53].

Wikipedia page

$$OOC=\frac{TP}{\sqrt{TOP\times P}}$$

cm.OOC

{'L1': 0.7745966692414834, 'L2': 0.4082482904638631, 'L3': 0.5477225575051661}

Notice : new in version 2.3

TI (Tversky index)¶

The Tversky index, named after Amos Tversky, is an asymmetric similarity measure on sets that compares a variant to a prototype. The Tversky index can be seen as a generalization of Dice's coefficient and Tanimoto coefficient [54].

Wikipedia page

$$TI(\alpha,\beta)=\frac{TP}{TP+\alpha FN+\beta FP}$$

cm.TI(2,3)

{'L1': 0.42857142857142855, 'L2': 0.1111111111111111, 'L3': 0.1875}

Parameters¶

alpha : alpha coefficient (type : float)
beta : beta coefficient (type : float)

Output¶

{class1: TI1, class2: TI2, ...}

Notice : new in version 2.4

AUPR (Area under the PR curve)¶

A PR curve plots precision against recall. The area under the precision-recall curve (AUPR) is the area under this curve; the higher it is, the better the model [55] [56].

$$AUPR=\frac{TPR+PPV}{2}$$

cm.AUPR

{'L1': 0.8, 'L2': 0.41666666666666663, 'L3': 0.55}

Notice : new in version 2.4
Notice : this is an approximate calculation of AUPR

ICSI (Individual classification success index)¶

The Individual Classification Success Index (ICSI) is a class-specific symmetric measure defined for classification assessment. ICSI is $ 1 $ minus the sum of the type I and type II error rates (here $ 1-PPV $ and $ 1-TPR $). It ranges from $ -1 $ (both errors are maximal, i.e. $ 1 $) to $ 1 $ (both errors are minimal, i.e. $ 0 $), but the value $ 0 $ does not have any clear meaning. The measure is symmetric and linearly related to the arithmetic mean of TPR and PPV [58].

$$ICSI=PPV+TPR-1$$

cm.ICSI

{'L1': 0.6000000000000001,
 'L2': -0.16666666666666674,
 'L3': 0.10000000000000009}

Notice : new in version 2.5

CI (Confidence interval)¶

In statistics, a confidence interval (CI) is a type of interval estimate (of a population parameter) that is computed from the observed data. The confidence level is the frequency (i.e., the proportion) of possible confidence intervals that contain the true value of their corresponding parameter. In other words, if confidence intervals are constructed using a given confidence level in an infinite number of independent experiments, the proportion of those intervals that contain the true value of the parameter will match the confidence level [31].

Supported statistics : ACC,AUC,PRE,Overall ACC,Kappa,TPR,TNR,FPR,FNR,PPV,NPV,PLR,NLR

Supported alpha values (two-sided) : 0.001, 0.002, 0.01, 0.02, 0.05, 0.1, 0.2

Supported alpha values (one-sided) : 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1

Confidence intervals for TPR,TNR,FPR,FNR,PPV,NPV,ACC,PRE and Overall ACC are calculated using the normal approximation to the binomial distribution [59], the Wilson score [62] or the Agresti-Coull method [63]:

Normal approximation¶

$$SE=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$$$CI=\hat{p}\pm z\times SE$$$$n=\begin{cases}P & \hat{p} == TPR/FNR\\N & \hat{p} == TNR/FPR\\TOP & \hat{p} == PPV\\TON & \hat{p} ==NPV \\POP& \hat{p} == ACC/ACC_{Overall}/PRE\end{cases}$$

Wilson score¶

$$CI=\frac{\hat{p}+\frac{z^2}{2n}}{1+\frac{z^2}{n}}\pm\frac{z}{1+\frac{z^2}{n}}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}+\frac{z^2}{4n^2}}$$

Agresti-Coull¶

$$\hat{p}=\frac{x}{n}$$$$\tilde{p}=\frac{x+\frac{z^2}{2}}{n+z^2}$$$$CI =\tilde{p}\pm z\times\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+z^2}}$$

The confidence interval for Kappa is calculated using the Fleiss formula [24] [38] :

$$SE_{Kappa}=\sqrt{\frac{ACC_{Overall}\times (1-ACC_{Overall})}{POP\times(1-RACC_{Overall})^2}}$$$$CI_{Kappa}=Kappa\pm z\times SE_{Kappa}$$

Confidence intervals for NLR and PLR are calculated using the log method [60] :

$$SE_{LR}=\sqrt{\frac{1}{a}-\frac{1}{b}+\frac{1}{c}-\frac{1}{d}}$$$$CI_{LR}=e^{ln(LR)\pm z\times SE_{LR}}$$$$PLR:\begin{cases}a=TP\\b=P\\c=FP\\d=N\end{cases}$$$$NLR:\begin{cases}a=FN\\b=P\\c=TN\\d=N\end{cases}$$

Confidence interval for AUC is calculated using Hanley and McNeil formula [61] :

$$SE_{AUC}=\sqrt{\frac{q_0+(N-1)q_1+(P-1)q_2}{N\times P}}$$$$q_0=AUC(1-AUC)$$$$q_1=\frac{AUC}{2-AUC}-AUC^2$$$$q_2=\frac{2AUC^2}{1+AUC}-AUC^2$$$$CI_{AUC}=AUC\pm z\times SE_{AUC}$$

cm.CI("TPR")

{'L1': [0.21908902300206645, (0.17058551491594975, 1.0294144850840503)],
 'L2': [0.3535533905932738, (-0.19296464556281656, 1.1929646455628165)],
 'L3': [0.21908902300206645, (0.17058551491594975, 1.0294144850840503)]}

cm.CI("FNR", alpha=0.001, one_sided=True)

{'L1': [0.21908902300206645, (-0.2769850810763853, 1.0769850810763852)],
 'L2': [0.3535533905932738, (-0.5924799769332159, 1.5924799769332159)],
 'L3': [0.21908902300206645, (-0.2769850810763853, 1.0769850810763852)]}

cm.CI("PRE", alpha=0.05, binom_method="wilson")

{'L1': [0.14231876063832774, (0.19325746190524654, 0.6804926643446272)],
 'L2': [0.10758287072798381, (0.04696414761482223, 0.44803635738467273)],
 'L3': [0.14231876063832774, (0.19325746190524654, 0.6804926643446272)]}

cm.CI("Overall ACC", alpha=0.02, binom_method="agresti-coull")

[0.14231876063832777, (0.2805568916340536, 0.8343177950165198)]

cm.CI("Overall ACC", alpha=0.05)

[0.14231876063832777, (0.30438856248221097, 0.8622781041844558)]

Parameters¶

param : input parameter (type : str)
alpha : type I error (type : float, default : 0.05)
one_sided : one-sided mode flag (type : bool, default : False)
binom_method : binomial confidence intervals method (type : str, default : normal-approx)

Output¶

Two-sided : {class1: [SE1, (Lower CI, Upper CI)], ...}
One-sided : {class1: [SE1, (Lower one-sided CI, Upper one-sided CI)], ...}

Notice : For more information visit Example 8

Notice : new in version 2.5
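The following is a sketch of the three binomial intervals for a proportion `x / n`. It assumes only the formulas above and does not call PyCM. `binomial_ci` and `z_score` are hypothetical helpers. Here z is computed with `statistics.NormalDist` (Python 3.8+), so bounds may differ from PyCM's in the last few decimals. Judging from the outputs above, the reported SE is the normal-approximation SE whichever `binom_method` is chosen, and the sketch does the same.

```python
import math
from statistics import NormalDist


def z_score(alpha, one_sided=False):
    """Critical value of the standard normal distribution for a type I error alpha."""
    quantile = 1 - alpha if one_sided else 1 - alpha / 2
    return NormalDist().inv_cdf(quantile)


def binomial_ci(x, n, alpha=0.05, one_sided=False, binom_method="normal-approx"):
    """Return [SE, (lower, upper)] for the proportion x / n."""
    z = z_score(alpha, one_sided)
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # normal-approximation SE
    if binom_method == "normal-approx":
        center, half = p_hat, z * se
    elif binom_method == "wilson":
        denom = 1 + z ** 2 / n
        center = (p_hat + z ** 2 / (2 * n)) / denom
        half = z / denom * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2))
    elif binom_method == "agresti-coull":
        center = (x + z ** 2 / 2) / (n + z ** 2)   # p-tilde
        half = z * math.sqrt(center * (1 - center) / (n + z ** 2))
    else:
        raise ValueError("unknown binom_method: " + binom_method)
    return [se, (center - half, center + half)]


# TPR of L1 (TP = 3, P = 5), PRE of L1 (P = 5, POP = 12) and Overall ACC (7 of 12),
# taken from the reconstructed example matrix behind the outputs above.
print(binomial_ci(3, 5))
print(binomial_ci(5, 12, binom_method="wilson"))
print(binomial_ci(7, 12, alpha=0.02, binom_method="agresti-coull"))
```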

NB (Net benefit)¶

NB is a weighted sum of true positive classifications with compensation for false positive classifications by giving these a weight $ w $ [64] [65].

$$NB=\frac{TP-w\times FP}{POP}$$

Vickers and Elkin (2006) suggested considering a range of thresholds and calculating the NB across these thresholds. The results can be plotted in a decision curve [66].

$$p_t=threshold$$$$w=\frac{p_t}{1-p_t}$$

cm.NB(w=0.059)

{'L1': 0.25, 'L2': 0.0735, 'L3': 0.23525}

Parameters¶

w : weight

Output¶

{class1: NB1, class2: NB2, ...}

Notice : new in version 2.6
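A pure-Python sketch of NB and of the threshold-to-weight conversion, assuming the same reconstructed 3x3 matrix (rows = actual, columns = predicted). `net_benefit` and `threshold_to_weight` are hypothetical helpers. In a real decision curve, the classifier is re-thresholded at each $ p_t $, which changes TP and FP. This sketch only evaluates NB for one fixed matrix.

```python
# Assumed example matrix (rows = actual, columns = predicted).
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def net_benefit(matrix, w):
    """NB = (TP - w * FP) / POP for each class."""
    classes = list(matrix)
    pop = sum(sum(row.values()) for row in matrix.values())
    result = {}
    for c in classes:
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in classes) - tp
        result[c] = (tp - w * fp) / pop
    return result


def threshold_to_weight(p_t):
    """Weight implied by a threshold probability: w = p_t / (1 - p_t)."""
    return p_t / (1 - p_t)


print(net_benefit(MATRIX, w=0.059))
for p_t in (0.05, 0.1, 0.2):
    # Same matrix at every threshold; a real decision curve would recompute TP and FP.
    print(p_t, net_benefit(MATRIX, w=threshold_to_weight(p_t)))
```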

Average¶

Here "average" refers to the arithmetic mean, the sum of the numbers divided by how many numbers are being averaged.

Wikipedia page

cm.average("PPV")

0.6111111111111112

cm.average("F1")

0.5651515151515151

cm.average("DOR", none_omit=True)

3.0000000000000004

Parameters¶

param : input parameter (type : str)
none_omit : none items omitting flag (type : bool, default : False)

Output¶

Average

Notice : new in version 2.7

Weighted average¶

The weighted average is similar to an ordinary average, except that instead of each of the data points contributing equally to the final average, some data points contribute more than others.

Default weight is condition positive (number of positive samples).

Wikipedia page

cm.weighted_average("PPV")

0.6805555555555557

cm.weighted_average("F1")

0.606439393939394

cm.weighted_average("DOR", none_omit=True)

2.5714285714285716

cm.weighted_average("F1", weight={"L1": 23, "L2": 2, "L3": 1})

0.7152097902097903

Parameters¶

param : input parameter (type : str)
weight : explicitly passes weights (type : dict, default : None)
none_omit : none items omitting flag (type : bool, default : False)

Output¶

Weighted average

Notice : new in version 2.7
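A pure-Python sketch of the plain and weighted averages, assuming the same reconstructed 3x3 matrix (rows = actual, columns = predicted). All helper names are hypothetical. The outputs above do not show what PyCM returns when `none_omit=False` and a class value is missing. As an assumption, this sketch returns `None` in that case. Default weights are the condition positives P, as stated above.

```python
# Assumed example matrix (rows = actual, columns = predicted).
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def one_vs_rest(matrix):
    """Return {class: (TP, FP, FN, TN)} for each class."""
    classes = list(matrix)
    pop = sum(sum(row.values()) for row in matrix.values())
    out = {}
    for c in classes:
        tp = matrix[c][c]
        fn = sum(matrix[c].values()) - tp
        fp = sum(matrix[r][c] for r in classes) - tp
        out[c] = (tp, fp, fn, pop - tp - fp - fn)
    return out


def ppv(tp, fp, fn, tn):
    return tp / (tp + fp) if tp + fp else None


def f1(tp, fp, fn, tn):
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else None


def dor(tp, fp, fn, tn):
    # Diagnostic odds ratio (TP * TN) / (FP * FN); undefined when FP or FN is 0.
    return (tp * tn) / (fp * fn) if fp * fn else None


def weighted_average(values, weights, none_omit=False):
    """Weighted mean of {class: value}; a None value yields None unless omitted (assumption)."""
    pairs = [(v, weights[c]) for c, v in values.items() if v is not None]
    if len(pairs) < len(values) and not none_omit:
        return None
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


def average(values, none_omit=False):
    """Arithmetic mean: every class gets weight 1."""
    return weighted_average(values, {c: 1 for c in values}, none_omit)


counts = one_vs_rest(MATRIX)
positives = {c: tp + fn for c, (tp, fp, fn, tn) in counts.items()}  # default weights: P
ppv_values = {c: ppv(*k) for c, k in counts.items()}
dor_values = {c: dor(*k) for c, k in counts.items()}
print(average(ppv_values), weighted_average(ppv_values, positives))
print(average(dor_values, none_omit=True), weighted_average(dor_values, positives, none_omit=True))
```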

Sensitivity index¶

The sensitivity index or d′ is a statistic used in signal detection theory. It provides the separation between the means of the signal and the noise distributions, compared against the standard deviation of the signal or noise distribution. d′ can be estimated from the observed hit rate and false-alarm rate, as follows [76]:

$$d^{\prime}=Z(TPR) - Z(FPR)$$

Function Z(p), p ∈ [0,1], is the inverse of the cumulative distribution function of the Gaussian distribution.

Wikipedia page

cm.sensitivity_index()

{'L1': 'None', 'L2': 0.8416212335729143, 'L3': 0.4333594729285047}

Output¶

{class1: SI1, class2: SI2, ...}

Notice : new in version 3.1
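A pure-Python sketch of d′, assuming the same reconstructed 3x3 matrix (rows = actual, columns = predicted). It uses `statistics.NormalDist().inv_cdf` (Python 3.8+) for Z. `sensitivity_index` is a hypothetical helper. It returns Python `None` (rather than the string `'None'` shown above) when TPR or FPR is 0 or 1, because Z is infinite there. This matches class L1, whose FPR is 0.

```python
from statistics import NormalDist

# Assumed example matrix (rows = actual, columns = predicted).
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def sensitivity_index(matrix):
    """d' = Z(TPR) - Z(FPR) per class; None when a rate is 0 or 1 (Z is infinite)."""
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal distribution
    classes = list(matrix)
    pop = sum(sum(row.values()) for row in matrix.values())
    result = {}
    for c in classes:
        tp = matrix[c][c]
        p = sum(matrix[c].values())
        fp = sum(matrix[r][c] for r in classes) - tp
        tpr, fpr = tp / p, fp / (pop - p)
        result[c] = z(tpr) - z(fpr) if 0 < tpr < 1 and 0 < fpr < 1 else None
    return result


print(sensitivity_index(MATRIX))
```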

HD (Hamming distance)¶

In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other. In a more general context, the Hamming distance is one of several string metrics for measuring the edit distance between two sequences. It is named after the American mathematician Richard Hamming [80] [81].

A major application is in coding theory, more specifically to block codes, in which the equal-length strings are vectors over a finite field.

Wikipedia page

$$HD = FN + FP$$

cm.HD

{'L1': 2, 'L2': 3, 'L3': 5}

Notice : new in version 3.6

PR (Positive rate)¶

Proportion of actual instances belonging to the class in the dataset.

Equal to Prevalence

$$PR = \frac{P}{POP}$$

cm.PR

{'L1': 0.4166666666666667, 'L2': 0.16666666666666666, 'L3': 0.4166666666666667}

Notice : new in version 4.4

TOPR (Test outcome positive rate)¶

Proportion of predicted instances assigned to the class by the model.

$$TOPR = \frac{TOP}{POP}$$

cm.TOPR

{'L1': 0.25, 'L2': 0.25, 'L3': 0.5}

Notice : new in version 4.4

Overall statistics¶

Kappa¶

Kappa is a statistic that measures inter-rater agreement for qualitative (categorical) items. It is generally thought to be a more robust measure than simple percent agreement calculation, as kappa takes into account the possibility of the agreement occurring by chance [24].

Benchmark1 Benchmark2 Benchmark3 Benchmark4

Wikipedia page

$$Kappa=\frac{ACC_{Overall}-RACC_{Overall}}{1-RACC_{Overall}}$$

cm.Kappa

0.35483870967741943

Notice : new in version 0.3

Kappa unbiased¶

The unbiased kappa value is defined in terms of total accuracy and a slightly different computation of expected likelihood that averages the reference and response probabilities [25].

Equal to Scott's Pi

$$Kappa_{Unbiased}=\frac{ACC_{Overall}-RACCU_{Overall}}{1-RACCU_{Overall}}$$

cm.KappaUnbiased

0.34426229508196726

Notice : new in version 0.8.1

Kappa no prevalence¶

Prevalence-adjusted and bias-adjusted kappa (PABAK) [14].

$$Kappa_{NoPrevalence}=2 \times ACC_{Overall}-1$$

cm.KappaNoPrevalence

0.16666666666666674

Notice : new in version 0.8.1

Weighted kappa¶

The weighted kappa allows disagreements to be weighted differently and is especially useful when codes are ordered. Three matrices are involved: the matrix of observed scores, the matrix of expected scores based on chance agreement, and the weight matrix [70] [71].

$$v_{ij}=1-\frac{w_{ij}}{max(w)}$$

$$P_e=\sum_{i,j=1}^{|C|}\frac{TOP_i \times P_j}{POP^2}\times v_{ij}$$

$$P_a=\sum_{i,j=1}^{|C|}\frac{Matrix(i,j)}{POP}\times v_{ij}$$

$$Kappa_{Weighted}=\frac{P_a-P_e}{1-P_e}$$

cm.weighted_kappa(
    weight={
        "L1": {"L1": 0, "L2": 1, "L3": 2},
        "L2": {"L1": 1, "L2": 0, "L3": 1},
        "L3": {"L1": 2, "L2": 1, "L3": 0}})

0.39130434782608675

cm.weighted_kappa()

RuntimeWarning: Invalid weight format; the result is for unweighted kappa.

0.35483870967741943

Parameters¶

weight : weight matrix (type : dict, default : None)

Output¶

Weighted kappa

Notice : new in version 2.7
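A pure-Python sketch of weighted kappa that follows the four formulas above literally. It assumes the same reconstructed 3x3 matrix (rows = actual, columns = predicted). `weighted_kappa`, `LINEAR` and `UNIFORM` are hypothetical names. A weight matrix with 0 on the diagonal and 1 elsewhere turns $ v_{ij} $ into the identity, so the result collapses to plain Kappa. This is consistent with the unweighted fallback shown above.

```python
# Assumed example matrix (rows = actual, columns = predicted).
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def weighted_kappa(matrix, weight):
    """Weighted kappa with agreement weights v_ij = 1 - w_ij / max(w)."""
    classes = list(matrix)
    pop = sum(sum(row.values()) for row in matrix.values())
    p = {i: sum(matrix[i].values()) for i in classes}                 # actual counts P
    top = {j: sum(matrix[i][j] for i in classes) for j in classes}    # predicted counts TOP
    w_max = max(weight[i][j] for i in classes for j in classes)
    v = {i: {j: 1 - weight[i][j] / w_max for j in classes} for i in classes}
    p_a = sum(matrix[i][j] / pop * v[i][j] for i in classes for j in classes)
    p_e = sum(top[i] * p[j] / pop ** 2 * v[i][j] for i in classes for j in classes)
    return (p_a - p_e) / (1 - p_e)


LINEAR = {
    "L1": {"L1": 0, "L2": 1, "L3": 2},
    "L2": {"L1": 1, "L2": 0, "L3": 1},
    "L3": {"L1": 2, "L2": 1, "L3": 0},
}
# 0 on the diagonal, 1 elsewhere: every disagreement costs the same.
UNIFORM = {i: {j: 0 if i == j else 1 for j in MATRIX} for i in MATRIX}

print(weighted_kappa(MATRIX, LINEAR))
print(weighted_kappa(MATRIX, UNIFORM))
```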

Kappa standard error¶

The standard error of the Kappa coefficient was obtained by Fleiss (1969) [24] [38].

$$SE_{Kappa}=\sqrt{\frac{ACC_{Overall}\times (1-ACC_{Overall})}{POP\times(1-RACC_{Overall})^2}}$$

cm.Kappa_SE

0.2203645326012817

Notice : new in version 0.7

Kappa 95% CI¶

Kappa 95% Confidence Interval [24] [38].

$$CI_{Kappa}=Kappa \pm 1.96\times SE_{Kappa}$$

cm.Kappa_CI

(-0.07707577422109269, 0.7867531935759315)

Notice : new in version 0.7
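A pure-Python sketch of Kappa, its standard error and its 95% interval, assuming the same reconstructed 3x3 matrix (rows = actual, columns = predicted). `kappa_with_ci` is a hypothetical helper. The standard error uses $ ACC_{Overall}(1-ACC_{Overall}) / (POP(1-RACC_{Overall})^2) $, which reproduces `cm.Kappa_SE` above. z = 1.96 gives the 95% interval.

```python
import math

# Assumed example matrix (rows = actual, columns = predicted).
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def kappa_with_ci(matrix, z=1.96):
    """Return (Kappa, SE_Kappa, (lower, upper)) following Fleiss' standard error."""
    classes = list(matrix)
    pop = sum(sum(row.values()) for row in matrix.values())
    acc = sum(matrix[c][c] for c in classes) / pop                    # ACC_Overall
    racc = sum(
        sum(matrix[c].values()) * sum(matrix[r][c] for r in classes)   # P_i * TOP_i
        for c in classes
    ) / pop ** 2                                                       # RACC_Overall
    kappa = (acc - racc) / (1 - racc)
    se = math.sqrt(acc * (1 - acc) / (pop * (1 - racc) ** 2))
    return kappa, se, (kappa - z * se, kappa + z * se)


print(kappa_with_ci(MATRIX))
```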

Chi-squared¶

Pearson's chi-squared test is a statistical test applied to sets of categorical data to evaluate how likely it is that any observed difference between the sets arose by chance. It is suitable for unpaired data from large samples [10].

Wikipedia page

$$\chi^2=\sum_{i=1}^{|C|}\sum_{j=1}^{|C|}\frac{\Big(Matrix(i,j)-E(i,j)\Big)^2}{E(i,j)}$$

$$E(i,j)=\frac{TOP_j\times P_i}{POP}$$

cm.Chi_Squared

6.6000000000000005

Notice : new in version 0.7

Chi-squared DF¶

Number of degrees of freedom of this confusion matrix for the chi-squared statistic [10].

$$DF=(|C|-1)^2$$

cm.DF

4

Notice : new in version 0.7

Phi-squared¶

In statistics, the phi coefficient (or mean square contingency coefficient) is a measure of association for two binary variables. Introduced by Karl Pearson, this measure is similar to the Pearson correlation coefficient in its interpretation. In fact, a Pearson correlation coefficient estimated for two binary variables will return the phi coefficient [10].

Wikipedia page

$$\phi^2=\frac{\chi^2}{POP}$$

cm.Phi_Squared

0.55

Notice : new in version 0.7

Cramer's V¶

In statistics, Cramér's V (sometimes referred to as Cramér's phi) is a measure of association between two nominal variables, giving a value between $ 0 $ and $ +1 $ (inclusive). It is based on Pearson's chi-squared statistic and was published by Harald Cramér in 1946 [26].

Benchmark

Wikipedia page

$$V=\sqrt{\frac{\phi^2}{|C|-1}}$$

cm.V

0.5244044240850758

Notice : new in version 0.7
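A pure-Python sketch of the chi-squared family, assuming the same reconstructed 3x3 matrix (rows = actual, columns = predicted). `chi_square_stats` is a hypothetical helper. It assumes no class has an empty row or column, since an expected count of 0 would divide by zero.

```python
import math

# Assumed example matrix (rows = actual, columns = predicted).
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def chi_square_stats(matrix):
    """Return (chi2, DF, phi2, Cramer's V) using E(i,j) = TOP_j * P_i / POP."""
    classes = list(matrix)
    pop = sum(sum(row.values()) for row in matrix.values())
    p = {i: sum(matrix[i].values()) for i in classes}
    top = {j: sum(matrix[i][j] for i in classes) for j in classes}
    chi2 = 0.0
    for i in classes:
        for j in classes:
            expected = top[j] * p[i] / pop   # expected count under independence
            chi2 += (matrix[i][j] - expected) ** 2 / expected
    k = len(classes)
    phi2 = chi2 / pop
    return chi2, (k - 1) ** 2, phi2, math.sqrt(phi2 / (k - 1))


print(chi_square_stats(MATRIX))
```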

Standard error¶

The standard error (SE) of a statistic (usually an estimate of a parameter) is the standard deviation of its sampling distribution or an estimate of that standard deviation [31].

Wikipedia page

$$SE_{ACC}=\sqrt{\frac{ACC_{Overall}\times (1-ACC_{Overall})}{POP}}$$

cm.SE

0.14231876063832777

Notice : new in version 0.7

95% CI¶

In statistics, a confidence interval (CI) is a type of interval estimate (of a population parameter) that is computed from the observed data. The confidence level is the frequency (i.e., the proportion) of possible confidence intervals that contain the true value of their corresponding parameter. In other words, if confidence intervals are constructed using a given confidence level in an infinite number of independent experiments, the proportion of those intervals that contain the true value of the parameter will match the confidence level [31].

Wikipedia page

$$CI=ACC_{Overall} \pm 1.96\times SE_{ACC}$$

cm.CI95

(0.30438856248221097, 0.8622781041844558)

Notice : new in version 0.7

Notice : CI renamed to CI95 in version 2.5

Bennett's S¶

Bennett, Alpert & Goldstein’s S is a statistical measure of inter-rater agreement. It was created by Bennett et al. in 1954, who argued that adjusting inter-rater reliability for the percentage of agreement that might be expected by chance gives a better measure than simple agreement between raters [8].

Wikipedia Page

$$p_c=\frac{1}{|C|}$$

$$S=\frac{ACC_{Overall}-p_c}{1-p_c}$$

cm.S

0.37500000000000006

Notice : new in version 0.5

Scott's Pi¶

Scott's pi (named after William A. Scott) is a statistic for measuring inter-rater reliability for nominal data in communication studies. Textual entities are annotated with categories by different annotators, and various measures are used to assess the extent of agreement between the annotators, one of which is Scott's pi. Since automatically annotating text is a popular problem in natural language processing, and the goal is to get the computer program that is being developed to agree with the humans in the annotations it creates, assessing the extent to which humans agree with each other is important for establishing a reasonable upper limit on computer performance [7].

Wikipedia page

Equal to Kappa Unbiased

$$p_c=\sum_{i=1}^{|C|}(\frac{TOP_i + P_i}{2\times POP})^2$$

$$\pi=\frac{ACC_{Overall}-p_c}{1-p_c}$$

cm.PI

0.34426229508196726

Notice : new in version 0.5

Gwet's AC1¶

AC1 was originally introduced by Gwet in 2001 (Gwet, 2001). Its interpretation is similar to that of generalized kappa (Fleiss, 1971), which is used to assess inter-rater reliability when there are multiple raters. Gwet (2002) demonstrated that AC1 overcomes kappa's sensitivity to trait prevalence and to raters' classification probabilities (i.e., marginal probabilities), and therefore provides a more robust measure of inter-rater reliability [6].

$$\pi_i=\frac{TOP_i + P_i}{2\times POP}$$

$$p_c=\frac{1}{|C|-1}\sum_{i=1}^{|C|}\Big(\pi_i\times (1-\pi_i)\Big)$$

$$AC_1=\frac{ACC_{Overall}-p_c}{1-p_c}$$

cm.AC1

0.3893129770992367

Notice : new in version 0.5
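Bennett's S, Scott's Pi and Gwet's AC1 share the form $ (ACC_{Overall}-p_c)/(1-p_c) $ and differ only in the chance-agreement term $ p_c $. The sketch below computes all three side by side. It assumes the same reconstructed 3x3 matrix (rows = actual, columns = predicted). `chance_corrected` is a hypothetical helper.

```python
# Assumed example matrix (rows = actual, columns = predicted).
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def chance_corrected(matrix):
    """Return S, PI and AC1, all of the form (ACC - p_c) / (1 - p_c)."""
    classes = list(matrix)
    k = len(classes)
    pop = sum(sum(row.values()) for row in matrix.values())
    acc = sum(matrix[c][c] for c in classes) / pop
    # pi_i: mean of the reference and response proportions of class i
    pi = [
        (sum(matrix[c].values()) + sum(matrix[r][c] for r in classes)) / (2 * pop)
        for c in classes
    ]
    chance = {
        "S": 1 / k,
        "PI": sum(x ** 2 for x in pi),
        "AC1": sum(x * (1 - x) for x in pi) / (k - 1),
    }
    return {name: (acc - p_c) / (1 - p_c) for name, p_c in chance.items()}


print(chance_corrected(MATRIX))
```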

Reference entropy¶

The entropy of the decision problem itself as defined by the counts for the reference. The entropy of a distribution is the average negative log probability of outcomes [30].

$$Likelihood_{Reference}=\frac{P_i}{POP}$$

$$Entropy_{Reference}=-\sum_{i=1}^{|C|}Likelihood_{Reference}(i)\times\log_{2}{Likelihood_{Reference}(i)}$$

$$0\times\log_{2}{0}\equiv0$$

cm.ReferenceEntropy

1.4833557549816874

Notice : new in version 0.8.1

Response entropy¶

The entropy of the response distribution. The entropy of a distribution is the average negative log probability of outcomes [30].

$$Likelihood_{Response}=\frac{TOP_i}{POP}$$

$$Entropy_{Response}=-\sum_{i=1}^{|C|}Likelihood_{Response}(i)\times\log_{2}{Likelihood_{Response}(i)}$$

$$0\times\log_{2}{0}\equiv0$$

cm.ResponseEntropy

1.5

Notice : new in version 0.8.1

Cross entropy¶

The cross-entropy of the response distribution against the reference distribution. The cross-entropy is defined by the negative log probabilities of the response distribution weighted by the reference distribution [30].

Wikipedia page

$$Likelihood_{Reference}=\frac{P_i}{POP}$$

$$Likelihood_{Response}=\frac{TOP_i}{POP}$$

$$Entropy_{Cross}=-\sum_{i=1}^{|C|}Likelihood_{Reference}(i)\times\log_{2}{Likelihood_{Response}(i)}$$

$$0\times\log_{2}{0}\equiv0$$

cm.CrossEntropy

1.5833333333333335

Notice : new in version 0.8.1

Joint entropy¶

The entropy of the joint reference and response distribution as defined by the underlying matrix [30].

$$P^{'}(i,j)=\frac{Matrix(i,j)}{POP}$$

$$Entropy_{Joint}=-\sum_{i=1}^{|C|}\sum_{j=1}^{|C|}P^{'}(i,j)\times\log_{2}{P^{'}(i,j)}$$

$$0\times\log_{2}{0}\equiv0$$

cm.JointEntropy

2.4591479170272446

Notice : new in version 0.8.1

Conditional entropy¶

The entropy of the distribution of categories in the response given that the reference category was as specified [30].

Wikipedia page

$$P^{'}(j|i)=\frac{Matrix(i,j)}{P_i}$$

$$Entropy_{Conditional}=\sum_{i=1}^{|C|}\Bigg(Likelihood_{Reference}(i)\times\Big(-\sum_{j=1}^{|C|}P^{'}(j|i)\times\log_{2}{P^{'}(j|i)}\Big)\Bigg)$$

$$0\times\log_{2}{0}\equiv0$$

cm.ConditionalEntropy

0.9757921620455572

Notice : new in version 0.8.1

Kullback-Leibler divergence¶

In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy) is a measure of how one probability distribution diverges from a second, expected probability distribution [11] [30].

Wikipedia Page

$$Likelihood_{Response}=\frac{TOP_i}{POP}$$

$$Likelihood_{Reference}=\frac{P_i}{POP}$$

$$Divergence=\sum_{i=1}^{|C|}Likelihood_{Reference}(i)\times\log_{2}{\frac{Likelihood_{Reference}(i)}{Likelihood_{Response}(i)}}$$

cm.KL

0.09997757835164581

Notice : new in version 0.8.1

Mutual information¶

Mutual information is defined as the Kullback-Leibler divergence between the joint distribution and the product of the individual distributions. Mutual information is symmetric. Subtracting the conditional entropy of the reference given the response from the reference entropy gives the same result [11] [30].

Wikipedia Page

$$P^{'}(i,j)=\frac{Matrix(i,j)}{POP}$$

$$Likelihood_{Reference}=\frac{P_i}{POP}$$

$$Likelihood_{Response}=\frac{TOP_i}{POP}$$

$$MI=\sum_{i=1}^{|C|}\sum_{j=1}^{|C|}P^{'}(i,j)\times\log_{2}\Big(\frac{P^{'}(i,j)}{Likelihood_{Reference}(i)\times Likelihood_{Response}(j)}\Big)$$

$$MI=Entropy_{Response}-Entropy_{Conditional}$$

cm.MutualInformation

0.5242078379544428

Notice : new in version 0.8.1
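A pure-Python sketch of the information-theoretic statistics in this part of the document. It assumes the same reconstructed 3x3 matrix (rows = actual, columns = predicted) and uses base-2 logarithms with $ 0\times\log_{2}{0}\equiv0 $. `h` and `information_stats` are hypothetical helpers. MI is computed from the joint distribution, so it can be checked against $ Entropy_{Response}-Entropy_{Conditional} $.

```python
import math

# Assumed example matrix (rows = actual, columns = predicted).
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def h(probs):
    """Shannon entropy in bits, with the convention 0 * log2(0) = 0."""
    return -sum(x * math.log2(x) for x in probs if x > 0)


def information_stats(matrix):
    classes = list(matrix)
    pop = sum(sum(row.values()) for row in matrix.values())
    p = {i: sum(matrix[i].values()) for i in classes}
    top = {j: sum(matrix[i][j] for i in classes) for j in classes}
    ref = {c: p[c] / pop for c in classes}     # Likelihood_Reference
    resp = {c: top[c] / pop for c in classes}  # Likelihood_Response
    joint = {(i, j): matrix[i][j] / pop for i in classes for j in classes}
    return {
        "ReferenceEntropy": h(ref.values()),
        "ResponseEntropy": h(resp.values()),
        # Assumes resp[c] > 0 wherever ref[c] > 0; otherwise log2(0) fails.
        "CrossEntropy": -sum(ref[c] * math.log2(resp[c]) for c in classes if ref[c] > 0),
        "JointEntropy": h(joint.values()),
        "ConditionalEntropy": sum(
            ref[i] * h(matrix[i][j] / p[i] for j in classes) for i in classes if p[i] > 0
        ),
        "KL": sum(ref[c] * math.log2(ref[c] / resp[c]) for c in classes if ref[c] > 0),
        "MutualInformation": sum(
            pij * math.log2(pij / (ref[i] * resp[j])) for (i, j), pij in joint.items() if pij > 0
        ),
    }


for name, value in information_stats(MATRIX).items():
    print(name, value)
```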

Goodman & Kruskal's lambda A¶

In probability theory and statistics, Goodman & Kruskal's lambda is a measure of proportional reduction in error in cross tabulation analysis [12].

Wikipedia page

$$\lambda_A=\frac{\sum_{j=1}^{|C|}Max\Big(Matrix(-,j)\Big)-Max(P)}{POP-Max(P)}$$

cm.LambdaA

0.42857142857142855

Notice : new in version 0.8.1

Goodman & Kruskal's lambda B¶

In probability theory and statistics, Goodman & Kruskal's lambda is a measure of proportional reduction in error in cross tabulation analysis [13].

Wikipedia Page

$$\lambda_B=\frac{\sum_{i=1}^{|C|}Max\Big(Matrix(i,-)\Big)-Max(TOP)}{POP-Max(TOP)}$$

cm.LambdaB

0.16666666666666666

Notice : new in version 0.8.1
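A pure-Python sketch of both lambdas, following the two formulas above. It assumes the same reconstructed 3x3 matrix (rows = actual, columns = predicted). `goodman_kruskal_lambdas` is a hypothetical helper.

```python
# Assumed example matrix (rows = actual, columns = predicted).
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def goodman_kruskal_lambdas(matrix):
    """Return (lambda_A, lambda_B) as defined by the two formulas above."""
    classes = list(matrix)
    pop = sum(sum(row.values()) for row in matrix.values())
    max_p = max(sum(matrix[i].values()) for i in classes)                   # Max(P)
    max_top = max(sum(matrix[i][j] for i in classes) for j in classes)      # Max(TOP)
    col_max_sum = sum(max(matrix[i][j] for i in classes) for j in classes)  # sum of column maxima
    row_max_sum = sum(max(matrix[i].values()) for i in classes)             # sum of row maxima
    lambda_a = (col_max_sum - max_p) / (pop - max_p)
    lambda_b = (row_max_sum - max_top) / (pop - max_top)
    return lambda_a, lambda_b


print(goodman_kruskal_lambdas(MATRIX))
```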

SOA1 (Landis & Koch's benchmark)¶

For more information visit [1].

Kappa	Strength of Agreement
< 0	Poor
0 - 0.2	Slight
0.2 – 0.4	Fair
0.4 – 0.6	Moderate
0.6 – 0.8	Substantial
0.8 – 1.0	Almost perfect

cm.SOA1

'Fair'

Notice : new in version 0.3

SOA2 (Fleiss' benchmark)¶

For more information visit [4].

Kappa	Strength of Agreement
< 0.40	Poor
0.40 - 0.75	Intermediate to Good
> 0.75	Excellent

cm.SOA2

'Poor'

Notice : new in version 0.4

SOA3 (Altman's benchmark)¶

For more information visit [5].

Kappa	Strength of Agreement
< 0.2	Poor
0.2 – 0.4	Fair
0.4 – 0.6	Moderate
0.6 – 0.8	Good
0.8 – 1.0	Very Good

cm.SOA3

'Fair'

Notice : new in version 0.4

SOA4 (Cicchetti's benchmark)¶

For more information visit [9].

Kappa	Strength of Agreement
< 0.40	Poor
0.40 – 0.59	Fair
0.59 – 0.74	Good
0.74 – 1.00	Excellent

cm.SOA4

'Poor'

Notice : new in version 0.7

SOA5 (Cramer's benchmark)¶

For more information visit [47].

Cramer's V	Strength of Association
< 0.1	Negligible
0.1 – 0.2	Weak
0.2 – 0.4	Moderate
0.4 – 0.6	Relatively Strong
0.6 – 0.8	Strong
0.8 – 1.0	Very Strong

cm.SOA5

'Relatively Strong'

Notice : new in version 2.2

SOA6 (Matthews's benchmark)¶

MCC is a confusion matrix method of calculating the Pearson product-moment correlation coefficient (not to be confused with Pearson's C). Therefore, it has the same interpretation [2].

For more information visit [49].

Overall MCC	Strength of Association
< 0.3	Negligible
0.3 - 0.5	Weak
0.5 - 0.7	Moderate
0.7 - 0.9	Strong
0.9 - 1.0	Very Strong

cm.SOA6

'Weak'

Notice : new in version 2.2

Notice : only positive values are considered

SOA7 (Goodman & Kruskal's lambda A benchmark)¶

For more information visit [84].

Lambda A	Strength of Association
0 - 0.2	Very Weak
0.2 - 0.4	Weak
0.4 - 0.6	Moderate
0.6 - 0.8	Strong
0.8 - 1.0	Very Strong
1.0	Perfect

cm.SOA7

'Moderate'

Notice : new in version 3.8

SOA8 (Goodman & Kruskal's lambda B benchmark)¶

For more information visit [84].

Lambda B	Strength of Association
0 - 0.2	Very Weak
0.2 - 0.4	Weak
0.4 - 0.6	Moderate
0.6 - 0.8	Strong
0.8 - 1.0	Very Strong
1.0	Perfect

cm.SOA8

'Very Weak'

Notice : new in version 3.8

SOA9 (Krippendorff's alpha benchmark)¶

For more information visit [85].

Alpha	Strength of Agreement
< 0.667	Low
0.667 - 0.8	Tentative
> 0.8	High

cm.SOA9

'Low'

Notice : new in version 3.8

SOA10 (Pearson's C benchmark)¶

For more information visit [86].

C	Strength of Association
0 - 0.1	Not Appreciable
0.1 - 0.2	Weak
0.2 - 0.3	Medium
> 0.3	Strong

cm.SOA10

'Strong'

Notice : new in version 3.8

Overall_ACC¶

For more information visit [3].

$$ACC_{Overall}=\frac{\sum_{i=1}^{|C|}TP_i}{POP}$$

Equal to TPR Micro, F1 Micro and PPV Micro

cm.Overall_ACC

0.5833333333333334

Notice : new in version 0.4

Overall_RACC¶

For more information visit [24].

$$RACC_{Overall}=\sum_{i=1}^{|C|}RACC_i$$

cm.Overall_RACC

0.3541666666666667

Notice : new in version 0.4

Overall_RACCU¶

For more information visit [25].

$$RACCU_{Overall}=\sum_{i=1}^{|C|}RACCU_i$$

cm.Overall_RACCU

0.3645833333333333

Notice : new in version 0.8.1

PPV_Micro¶

For more information visit [3].

$$PPV_{Micro}=\frac{\sum_{i=1}^{|C|}TP_i}{\sum_{i=1}^{|C|}TP_i+FP_i}$$

Equal to TPR Micro, F1 Micro and Overall ACC

cm.PPV_Micro

0.5833333333333334

Notice : new in version 0.4

NPV_Micro¶

For more information visit [3].

$$NPV_{Micro}=\frac{\sum_{i=1}^{|C|}TN_i}{\sum_{i=1}^{|C|}TN_i+FN_i}$$

cm.NPV_Micro

0.7916666666666666

Notice : new in version 3.9

TPR_Micro¶

For more information visit [3].

$$TPR_{Micro}=\frac{\sum_{i=1}^{|C|}TP_i}{\sum_{i=1}^{|C|}TP_i+FN_i}$$

Equal to PPV Micro, F1 Micro and Overall ACC

cm.TPR_Micro

0.5833333333333334

Notice : new in version 0.4

TNR_Micro¶

For more information visit [3].

$$TNR_{Micro}=\frac{\sum_{i=1}^{|C|}TN_i}{\sum_{i=1}^{|C|}TN_i+FP_i}$$

cm.TNR_Micro

0.7916666666666666

Notice : new in version 2.6

FPR_Micro¶

For more information visit [3].

$$FPR_{Micro}=\frac{\sum_{i=1}^{|C|}FP_i}{\sum_{i=1}^{|C|}TN_i+FP_i}$$

cm.FPR_Micro

0.20833333333333337

Notice : new in version 2.6

FNR_Micro¶

For more information visit [3].

$$FNR_{Micro}=\frac{\sum_{i=1}^{|C|}FN_i}{\sum_{i=1}^{|C|}TP_i+FN_i}$$

cm.FNR_Micro

0.41666666666666663

Notice : new in version 2.6

F1_Micro¶

For more information visit [3].

$$F_{1_{Micro}}=2\times\frac{TPR_{Micro}\times PPV_{Micro}}{TPR_{Micro}+PPV_{Micro}}$$

Equal to PPV Micro, TPR Micro and Overall ACC

cm.F1_Micro

0.5833333333333334

Notice : new in version 2.2

PPV_Macro¶

For more information visit [3].

$$PPV_{Macro}=\frac{1}{|C|}\sum_{i=1}^{|C|}\frac{TP_i}{TP_i+FP_i}$$

cm.PPV_Macro

0.611111111111111

Notice : new in version 0.4

NPV_Macro¶

For more information visit [3].

$$NPV_{Macro}=\frac{1}{|C|}\sum_{i=1}^{|C|}\frac{TN_i}{TN_i+FN_i}$$

cm.NPV_Macro

0.7777777777777777

Notice : new in version 3.9

TPR_Macro¶

For more information visit [3].

$$TPR_{Macro}=\frac{1}{|C|}\sum_{i=1}^{|C|}\frac{TP_i}{TP_i+FN_i}$$

cm.TPR_Macro

0.5666666666666668

Notice : new in version 0.4

TNR_Macro¶

For more information visit [3].

$$TNR_{Macro}=\frac{1}{|C|}\sum_{i=1}^{|C|}\frac{TN_i}{TN_i+FP_i}$$

cm.TNR_Macro

0.7904761904761904

Notice : new in version 2.6

FPR_Macro¶

For more information visit [3].

$$FPR_{Macro}=\frac{1}{|C|}\sum_{i=1}^{|C|}\frac{FP_i}{TN_i+FP_i}$$

cm.FPR_Macro

0.20952380952380956

Notice : new in version 2.6

FNR_Macro¶

For more information visit [3].

$$FNR_{Macro}=\frac{1}{|C|}\sum_{i=1}^{|C|}\frac{FN_i}{TP_i+FN_i}$$

cm.FNR_Macro

0.43333333333333324

Notice : new in version 2.6

F1_Macro¶

For more information visit [3].

$$F_{1_{Macro}}=\frac{2}{|C|}\sum_{i=1}^{|C|}\frac{TPR_i\times PPV_i}{TPR_i+PPV_i}$$

cm.F1_Macro

0.5651515151515151

Notice : new in version 2.2

ACC_Macro¶

For more information visit [3].

$$ACC_{Macro}=\frac{1}{|C|}\sum_{i=1}^{|C|}{ACC_i}$$

cm.ACC_Macro

0.7222222222222223

Notice : new in version 2.2
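A pure-Python sketch contrasting micro averages, which pool the per-class counts first, with macro averages, which take the mean of the per-class rates. It assumes the same reconstructed 3x3 matrix (rows = actual, columns = predicted). `micro_macro` is a hypothetical helper.

```python
# Assumed example matrix (rows = actual, columns = predicted).
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def micro_macro(matrix):
    """Micro averages pool the counts; macro averages take the mean of per-class rates."""
    classes = list(matrix)
    pop = sum(sum(row.values()) for row in matrix.values())
    rows = []
    for c in classes:
        tp = matrix[c][c]
        fn = sum(matrix[c].values()) - tp
        fp = sum(matrix[r][c] for r in classes) - tp
        rows.append((tp, fp, fn, pop - tp - fp - fn))
    k = len(rows)
    tp_all, fp_all, fn_all, tn_all = (sum(col) for col in zip(*rows))
    ppv_micro, tpr_micro = tp_all / (tp_all + fp_all), tp_all / (tp_all + fn_all)
    ppv = [tp / (tp + fp) for tp, fp, fn, tn in rows]
    tpr = [tp / (tp + fn) for tp, fp, fn, tn in rows]
    return {
        "PPV_Micro": ppv_micro,
        "TPR_Micro": tpr_micro,
        "NPV_Micro": tn_all / (tn_all + fn_all),
        "TNR_Micro": tn_all / (tn_all + fp_all),
        # F1_Micro: harmonic mean of the pooled precision and recall
        "F1_Micro": 2 * ppv_micro * tpr_micro / (ppv_micro + tpr_micro),
        "PPV_Macro": sum(ppv) / k,
        "TPR_Macro": sum(tpr) / k,
        "F1_Macro": sum(2 * a * b / (a + b) for a, b in zip(ppv, tpr)) / k,
        "ACC_Macro": sum((tp + tn) / pop for tp, fp, fn, tn in rows) / k,
    }


for name, value in micro_macro(MATRIX).items():
    print(name, value)
```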

Overall_J¶

For more information visit [29].

$$J_{Mean}=\frac{1}{|C|}\sum_{i=1}^{|C|}J_i$$

$$J_{Sum}=\sum_{i=1}^{|C|}J_i$$

$$J_{Overall}=(J_{Sum},J_{Mean})$$

cm.Overall_J

(1.225, 0.4083333333333334)

Notice : new in version 0.9

Hamming loss¶

The average Hamming loss or Hamming distance between two sets of samples [31].

$$L_{Hamming}=\frac{1}{POP}\sum_{i=1}^{POP}1(y_i \neq \widehat{y}_i)$$

cm.HammingLoss

0.41666666666666663

Notice : new in version 1.0

Zero-one loss¶

Zero-one loss is a common loss function used in classification learning. It assigns a loss of $ 0 $ to a correct classification and $ 1 $ to an incorrect classification [31].

$$L_{0-1}=\sum_{i=1}^{POP}1(y_i \neq \widehat{y}_i)$$

cm.ZeroOneLoss

5

Notice : new in version 1.1

NIR (No information rate)¶

Proportion of the largest class in the data [57].

$$NIR=\frac{1}{POP}Max(P)$$

cm.NIR

0.4166666666666667

Notice : new in version 1.2

P-Value¶

In statistical hypothesis testing, the p-value or probability value is, for a given statistical model, the probability that, when the null hypothesis is true, the statistical summary (such as the absolute value of the sample mean difference between two compared groups) would be greater than or equal to the actual observed results [31].
Here, a one-sided binomial test is used to check whether the accuracy is better than the no information rate [57].

Wikipedia Page

$$x=\sum_{i=1}^{|C|}TP_{i}$$

$$p=NIR$$

$$n=POP$$

$$P-Value_{(ACC > NIR)}=1-\sum_{i=0}^{x-1}\left(\begin{array}{c}n\\ i\end{array}\right)p^{i}(1-p)^{n-i}$$

cm.PValue

0.18926430237560654

Notice : new in version 1.2
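A pure-Python sketch of the one-sided binomial test. It computes the upper tail $ P(X \geq x) $ for $ X \sim Binomial(POP, NIR) $ with `math.comb` (Python 3.8+). It assumes the same reconstructed 3x3 matrix (rows = actual, columns = predicted). `accuracy_p_value` is a hypothetical helper.

```python
import math

# Assumed example matrix (rows = actual, columns = predicted).
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def accuracy_p_value(matrix):
    """P(X >= x) for X ~ Binomial(POP, NIR), where x is the number of correct predictions."""
    classes = list(matrix)
    pop = sum(sum(row.values()) for row in matrix.values())
    x = sum(matrix[c][c] for c in classes)                               # sum of TP_i
    nir = max(sum(row.values()) for row in matrix.values()) / pop        # largest class share
    return sum(
        math.comb(pop, i) * nir ** i * (1 - nir) ** (pop - i)
        for i in range(x, pop + 1)
    )


print(accuracy_p_value(MATRIX))
```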

Overall_CEN¶

For more information visit [17].

$$P_j=\frac{\sum_{k=1}^{|C|}\Big(Matrix(j,k)+Matrix(k,j)\Big)}{2\sum_{k,l=1}^{|C|}Matrix(k,l)}$$

$$CEN_{Overall}=\sum_{j=1}^{|C|}P_jCEN_j$$

cm.Overall_CEN

0.4638112995385119

Notice : new in version 1.3

Overall_MCEN¶

For more information visit [19].

$$\alpha=\begin{cases}1 & |C| > 2\\0 & |C| = 2\end{cases}$$

$$P_j=\frac{\sum_{k=1}^{|C|}\Big(Matrix(j,k)+Matrix(k,j)\Big)-Matrix(j,j)}{2\sum_{k,l=1}^{|C|}Matrix(k,l)-\alpha \sum_{k=1}^{|C|}Matrix(k,k)}$$

$$MCEN_{Overall}=\sum_{j=1}^{|C|}P_jMCEN_j$$

cm.Overall_MCEN

0.5189369467580801

Notice : new in version 1.3

Overall_MCC¶

For more information visit [20] [27].

Benchmark

$$MCC_{Overall}=\frac{cov(X,Y)}{\sqrt{cov(X,X)\times cov(Y,Y)}}$$

$$cov(X,Y)=\sum_{i,j,k=1}^{|C|}\Big(Matrix(i,i)Matrix(k,j)-Matrix(j,i)Matrix(i,k)\Big)$$

$$cov(X,X) = \sum_{i=1}^{|C|}\Bigg[\Big(\sum_{j=1}^{|C|}Matrix(j,i)\Big)\Big(\sum_{k,l=1,k\neq i}^{|C|}Matrix(l,k)\Big)\Bigg]$$

$$cov(Y,Y) = \sum_{i=1}^{|C|}\Bigg[\Big(\sum_{j=1}^{|C|}Matrix(i,j)\Big)\Big(\sum_{k,l=1,k\neq i}^{|C|}Matrix(k,l)\Big)\Bigg]$$

cm.Overall_MCC

0.36666666666666664

Notice : new in version 1.4
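A pure-Python sketch of the overall MCC. It assumes the same reconstructed 3x3 matrix (rows = actual, columns = predicted). Instead of the covariance sums above, it uses the count-based form of the same multi-class statistic, as given in scikit-learn's `matthews_corrcoef` documentation. That form uses c = correct predictions, s = POP, $ t_k = P_k $ and $ p_k = TOP_k $. Treat the equivalence as an assumption to verify. `overall_mcc` is a hypothetical helper.

```python
import math

# Assumed example matrix (rows = actual, columns = predicted).
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def overall_mcc(matrix):
    """(c*s - sum p_k*t_k) / sqrt((s^2 - sum p_k^2) * (s^2 - sum t_k^2))."""
    classes = list(matrix)
    s = sum(sum(row.values()) for row in matrix.values())       # POP
    c = sum(matrix[k][k] for k in classes)                       # correctly classified
    t = [sum(matrix[k].values()) for k in classes]               # P_k (actual)
    p = [sum(matrix[r][k] for r in classes) for k in classes]    # TOP_k (predicted)
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = math.sqrt(
        (s ** 2 - sum(pk ** 2 for pk in p)) * (s ** 2 - sum(tk ** 2 for tk in t))
    )
    return numerator / denominator


print(overall_mcc(MATRIX))
```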

RR (Global performance index)¶

For more information visit [21].

$$RR=\frac{1}{|C|}\sum_{i,j=1}^{|C|}Matrix(i,j)$$

cm.RR

4.0

Notice : new in version 1.4

CBA (Class balance accuracy)¶

As an evaluation tool, CBA creates an overall assessment of model predictive power by scrutinizing measures simultaneously across each class in a conservative manner that guarantees that a model’s ability to recall observations from each class and its ability to do so efficiently won’t fall below the bound [22] [51].

$$CBA=\frac{\sum_{i=1}^{|C|}\frac{Matrix(i,i)}{Max(TOP_i,P_i)}}{|C|}$$

cm.CBA

0.4777777777777778

Notice : new in version 1.4

AUNU¶

When dealing with multiclass problems, a global measure of classification performances based on the ROC approach (AUNU) has been proposed as the average of single-class measures [23].

$$AUNU=\frac{\sum_{i=1}^{|C|}AUC_i}{|C|}$$

cm.AUNU

0.6785714285714285

Notice : new in version 1.4

AUNP¶

Another option (AUNP) is that of averaging the $ AUC_i $ values with weights proportional to the number of samples experimentally belonging to each class, that is, the a priori class distribution [23].

$$AUNP=\sum_{i=1}^{|C|}\frac{P_i}{POP}AUC_i$$

cm.AUNP

0.6857142857142857

Notice : new in version 1.4
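A pure-Python sketch of CBA, AUNU and AUNP, assuming the same reconstructed 3x3 matrix (rows = actual, columns = predicted). It also assumes a per-class AUC of $ (TPR+TNR)/2 $, which reproduces the AUNU and AUNP values above. `auc_based_overall` is a hypothetical helper.

```python
# Assumed example matrix (rows = actual, columns = predicted).
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def auc_based_overall(matrix):
    """Return (AUNU, AUNP, CBA) with per-class AUC = (TPR + TNR) / 2 (assumed)."""
    classes = list(matrix)
    k = len(classes)
    pop = sum(sum(row.values()) for row in matrix.values())
    aunu = aunp = cba = 0.0
    for c in classes:
        tp = matrix[c][c]
        p = sum(matrix[c].values())                # P_i
        top = sum(matrix[r][c] for r in classes)   # TOP_i
        tn = pop - p - top + tp
        auc = (tp / p + tn / (pop - p)) / 2
        aunu += auc / k                            # unweighted mean of AUC_i
        aunp += p / pop * auc                      # weighted by the class prior P_i / POP
        cba += tp / max(top, p) / k                # Matrix(i,i) / Max(TOP_i, P_i)
    return aunu, aunp, cba


print(auc_based_overall(MATRIX))
```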

RCI (Relative classifier information)¶

The performance of different classifiers on the same domain can be compared using relative classifier information, while classifier information (mutual information) can be used for comparisons across different decision problems [32] [22].

$$H_d=-\sum_{i=1}^{|C|}\Big(\frac{\sum_{l=1}^{|C|}Matrix(i,l)}{\sum_{h,k=1}^{|C|}Matrix(h,k)}log_2\frac{\sum_{l=1}^{|C|}Matrix(i,l)}{\sum_{h,k=1}^{|C|}Matrix(h,k)}\Big)=Entropy_{Reference}$$

$$H_o=\sum_{j=1}^{|C|}\Big(\frac{\sum_{k=1}^{|C|}Matrix(k,j)}{\sum_{h,l=1}^{|C|}Matrix(h,l)}H_{oj}\Big)=Entropy_{Conditional}$$

$$H_{oj}=-\sum_{i=1}^{|C|}\Big(\frac{Matrix(i,j)}{\sum_{k=1}^{|C|}Matrix(k,j)}log_2\frac{Matrix(i,j)}{\sum_{k=1}^{|C|}Matrix(k,j)}\Big)$$

$$RCI=\frac{H_d-H_o}{H_d}=\frac{MI}{Entropy_{Reference}}$$

cm.RCI

0.3533932006492363

Notice : new in version 1.5
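The entropy terms above can be traced step by step. This is a minimal sketch, assuming rows are the reference (actual) classes and columns are the predicted classes, with $0\log_2 0$ taken as $0$. It follows $H_d$, $H_{oj}$ and $H_o$ exactly as written and uses the matrix from `print(cm)`.

```python
from math import log2

# Confusion matrix from print(cm): rows = actual (reference), columns = predicted.
M = [[3, 0, 2],
     [0, 1, 1],
     [0, 2, 3]]


def rci(M):
    n = len(M)
    pop = sum(map(sum, M))
    rows = [sum(M[i]) for i in range(n)]
    cols = [sum(M[k][j] for k in range(n)) for j in range(n)]
    # H_d: entropy of the reference (row) distribution; zero cells are skipped.
    h_d = -sum(r / pop * log2(r / pop) for r in rows if r)
    # H_o: column-weighted entropy of each column's distribution over reference classes.
    h_o = 0.0
    for j in range(n):
        if cols[j]:
            h_oj = -sum(M[i][j] / cols[j] * log2(M[i][j] / cols[j])
                        for i in range(n) if M[i][j])
            h_o += cols[j] / pop * h_oj
    return (h_d - h_o) / h_d


print(rci(M))  # about 0.35339
```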

Pearson's C¶

The contingency coefficient is a coefficient of association that indicates whether two variables or data sets are independent of or dependent on each other. It is also known as Pearson’s coefficient (not to be confused with Pearson’s coefficient of skewness) [43] [44].

$$C=\sqrt{\frac{\chi^2}{\chi^2+POP}}$$

cm.C

0.5956833971812706

Notice : new in version 2.0
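To see where $\chi^2$ comes from, the sketch below computes Pearson's chi-squared with expected counts $P_i\times TOP_j/POP$ and then applies the formula for $C$. This choice of expected counts is an assumption on our part, but it reproduces the Chi-Squared value of 6.6 reported by `print(cm)`.

```python
from math import sqrt

# Confusion matrix from print(cm): rows = actual, columns = predicted.
M = [[3, 0, 2],
     [0, 1, 1],
     [0, 2, 3]]


def pearson_c(M):
    n = len(M)
    pop = sum(map(sum, M))
    rows = [sum(M[i]) for i in range(n)]
    cols = [sum(M[k][j] for k in range(n)) for j in range(n)]
    # Pearson's chi-squared; the expected count of cell (i, j) is rows[i] * cols[j] / pop.
    chi2 = sum((M[i][j] - rows[i] * cols[j] / pop) ** 2 / (rows[i] * cols[j] / pop)
               for i in range(n) for j in range(n) if rows[i] and cols[j])
    return chi2, sqrt(chi2 / (chi2 + pop))


chi2, c = pearson_c(M)
print(chi2, c)  # about 6.6 and 0.59568
```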

CSI (Classification success index)¶

The Classification Success Index (CSI) is an overall measure defined by averaging ICSI over all classes [58].

$$CSI=\frac{1}{|C|}\sum_{i=1}^{|C|}{ICSI_i}$$

cm.CSI

0.1777777777777778

Notice : new in version 2.5
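The sketch below averages ICSI over the classes. It assumes that $ICSI_i=PPV_i+TPR_i-1$; that definition is not stated in this section, but it reproduces the ICSI row of the class statistics (0.6, -0.16667, 0.1). The helper name `csi` is illustrative.

```python
# Confusion matrix from print(cm): rows = actual, columns = predicted.
M = [[3, 0, 2],
     [0, 1, 1],
     [0, 2, 3]]


def csi(M):
    n = len(M)
    icsi = []
    for i in range(n):
        tp = M[i][i]
        ppv = tp / sum(M[r][i] for r in range(n))  # precision: TP / TOP
        tpr = tp / sum(M[i])                       # recall: TP / P
        icsi.append(ppv + tpr - 1)                 # assumed ICSI definition
    return icsi, sum(icsi) / n


icsi, csi_value = csi(M)
print(icsi, csi_value)  # CSI about 0.17778
```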

ARI (Adjusted Rand index)¶

The Rand index or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings. A form of the Rand index may be defined that is adjusted for the chance grouping of elements; this is the adjusted Rand index. From a mathematical standpoint, the Rand index is related to accuracy, but it is applicable even when class labels are not used [68].

The Adjusted Rand Index (ARI) is frequently used in cluster validation since it is a measure of agreement between two partitions: one given by the clustering process and the other defined by external criteria, but it can also be used in supervised learning [69].

Wikipedia Page

$$X=\frac{\sum_{i}C_{2}^{P_i}\times \sum_{j}C_{2}^{TOP_j}}{C_2^{POP}}$$

$$ARI=\frac{\sum_{i,j}C_{2}^{Matrix(i,j)}-X}{\frac{1}{2}[\sum_{i}C_{2}^{P_i} + \sum_{j}C_{2}^{TOP_j}]-X}$$

cm.ARI

0.09206349206349207

Notice : $ C_{r}^{n} $ is the number of combinations of $ n $ objects taken $ r $ at a time

Notice : new in version 2.6
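The ARI formula uses pair counts only, so it can be checked with `math.comb` (Python 3.8+), where `comb(n, 2)` is $C_2^{n}$. Below is a minimal sketch on the matrix from `print(cm)`, assuming rows are actual classes ($P_i$) and columns are predicted classes ($TOP_j$).

```python
from math import comb

# Confusion matrix from print(cm): rows = actual, columns = predicted.
M = [[3, 0, 2],
     [0, 1, 1],
     [0, 2, 3]]


def ari(M):
    n = len(M)
    pop = sum(map(sum, M))
    p = [sum(M[i]) for i in range(n)]                          # P_i
    top = [sum(M[k][j] for k in range(n)) for j in range(n)]   # TOP_j
    sum_cells = sum(comb(M[i][j], 2) for i in range(n) for j in range(n))
    sum_p = sum(comb(x, 2) for x in p)
    sum_top = sum(comb(x, 2) for x in top)
    x = sum_p * sum_top / comb(pop, 2)                         # the X term
    return (sum_cells - x) / ((sum_p + sum_top) / 2 - x)


print(ari(M))  # about 0.09206
```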

Bangdiwala's B¶

Bangdiwala's B statistic was created by Shrikant Bangdiwala in 1985 and is a measure of inter-rater agreement. While not as commonly used as the kappa statistic, the B test has been used by various researchers [72] [73].

Wikipedia Page

$$B=\frac{\sum_{i=1}^{|C|}TP_i^2}{\sum_{i=1}^{|C|}TOP_i\times P_i}$$

cm.B

0.37254901960784315

Notice : new in version 2.7
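For the matrix from `print(cm)`, the formula reduces to $19/51$. The short sketch below shows the arithmetic, assuming rows are actual classes ($P_i$), columns are predicted classes ($TOP_i$), and $TP_i$ is the diagonal cell.

```python
# Confusion matrix from print(cm): rows = actual, columns = predicted.
M = [[3, 0, 2],
     [0, 1, 1],
     [0, 2, 3]]


def bangdiwala_b(M):
    n = len(M)
    p = [sum(M[i]) for i in range(n)]                          # P_i
    top = [sum(M[k][i] for k in range(n)) for i in range(n)]   # TOP_i
    return sum(M[i][i] ** 2 for i in range(n)) / sum(top[i] * p[i] for i in range(n))


print(bangdiwala_b(M))  # 19 / 51, about 0.37255
```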

Krippendorff's alpha¶

Krippendorff's alpha coefficient, named after academic Klaus Krippendorff, is a statistical measure of the agreement achieved when coding a set of units of analysis in terms of the values of a variable. Krippendorff's alpha generalizes several known statistics, often called measures of inter-coder agreement, inter-rater reliability, reliability of coding given sets of units (as distinct from unitizing) but it also distinguishes itself from statistics that are called reliability coefficients but are unsuitable to the particulars of coding data generated for subsequent analysis [74].

Wikipedia Page

$$\epsilon = \frac{1}{2\times POP}$$

$$P_a=(1-\epsilon)\times ACC_{Overall}+\epsilon$$

$$P_e=RACCU_{Overall}$$

$$\alpha=\frac{P_a-P_e}{1-P_e}$$

cm.Alpha

0.3715846994535519

Notice : new in version 2.8
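A worked version of the three steps is sketched below. It assumes $RACCU_{Overall}=\sum_i\big(\frac{TOP_i+P_i}{2\times POP}\big)^2$. That definition is not given in this section, but it reproduces the Overall RACCU of 0.36458 printed by `print(cm)`. With the matrix from this document, the result is exactly $68/183$.

```python
# Confusion matrix from print(cm): rows = actual, columns = predicted.
M = [[3, 0, 2],
     [0, 1, 1],
     [0, 2, 3]]


def krippendorff_alpha(M):
    n = len(M)
    pop = sum(map(sum, M))
    p = [sum(M[i]) for i in range(n)]                          # P_i
    top = [sum(M[k][i] for k in range(n)) for i in range(n)]   # TOP_i
    acc = sum(M[i][i] for i in range(n)) / pop                 # ACC_Overall
    eps = 1 / (2 * pop)
    p_a = (1 - eps) * acc + eps
    p_e = sum(((top[i] + p[i]) / (2 * pop)) ** 2 for i in range(n))  # assumed RACCU_Overall
    return (p_a - p_e) / (1 - p_e)


print(krippendorff_alpha(M))  # 68 / 183, about 0.37158
```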

Weighted alpha¶

Weighted Krippendorff's alpha coefficient [74].

$$\epsilon = \frac{1}{2\times POP}$$

$$v_{ij}=1-\frac{w_{ij}}{max(w)}$$

$$P_e=\sum_{i,j=1}^{|C|}\Big(\frac{TOP_i + P_j}{2 \times POP}\Big)^2\times v_{ij}$$

$$P_a^*=\sum_{i,j=1}^{|C|}\frac{Matrix(i,j)}{POP}\times v_{ij}$$

$$P_a=(1-\epsilon)\times P_a^*+\epsilon$$

$$\alpha_{Weighted}=\frac{P_a-P_e}{1-P_e}$$

cm.weighted_alpha(
    weight={
        "L1": {"L1": 0, "L2": 1, "L3": 2},
        "L2": {"L1": 1, "L2": 0, "L3": 1},
        "L3": {"L1": 2, "L2": 1, "L3": 0}})

0.374757281553398

cm.weighted_alpha()

RuntimeWarning: Invalid weight format; the result is for unweighted alpha.

0.3715846994535519

Parameters¶

weight : weight matrix (type : dict, default : None)

Output¶

Weighted alpha

Notice : new in version 2.8
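The sketch below applies the weighted formulas to the dict-style matrix and the example weight dict above. It is an illustration, not PyCM's implementation. It assumes that the expected-agreement term is $\big(\frac{TOP_i+P_j}{2\times POP}\big)^2 v_{ij}$. With an identity $v$, this reduces to $RACCU_{Overall}$, and with the weights above it gives $193/515\approx0.374757$, matching the value shown for `cm.weighted_alpha(weight=...)`.

```python
# Dict form of the matrix from cm.matrix: actual -> predicted -> count.
M = {"L1": {"L1": 3, "L2": 0, "L3": 2},
     "L2": {"L1": 0, "L2": 1, "L3": 1},
     "L3": {"L1": 0, "L2": 2, "L3": 3}}
# Same weight dict as the cm.weighted_alpha example above.
W = {"L1": {"L1": 0, "L2": 1, "L3": 2},
     "L2": {"L1": 1, "L2": 0, "L3": 1},
     "L3": {"L1": 2, "L2": 1, "L3": 0}}


def weighted_alpha(M, W):
    classes = list(M)
    pop = sum(sum(row.values()) for row in M.values())
    p = {i: sum(M[i].values()) for i in classes}                  # P_i: row totals
    top = {j: sum(M[i][j] for i in classes) for j in classes}     # TOP_j: column totals
    w_max = max(max(row.values()) for row in W.values())
    eps = 1 / (2 * pop)
    p_e = p_a_star = 0.0
    for i in classes:
        for j in classes:
            v = 1 - W[i][j] / w_max                               # agreement weight v_ij
            p_e += ((top[i] + p[j]) / (2 * pop)) ** 2 * v         # assumed expected term
            p_a_star += M[i][j] / pop * v
    p_a = (1 - eps) * p_a_star + eps
    return (p_a - p_e) / (1 - p_e)


print(weighted_alpha(M, W))  # about 0.374757
```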

Aickin's alpha¶

Aickin's alpha coefficient [75].

$$\alpha^{(t+1)}=\frac{p_a-p_e^{(t)}}{1-p_e^{(t)}}$$

$$p_e^{(t)}=\sum_{k=1}^{|C|}p_{k|H}^{A(t)}\times p_{k|H}^{B(t)}$$

$$p_a=ACC_{Overall}$$

$$p_{k|H}^{A(t+1)}=\frac{TOP_k}{\Big((1-\alpha^{(t)})+\alpha^{(t)}\times \frac{p_{k|H}^{B(t)}}{p_e^{(t)}}\Big)\times POP}$$

$$p_{k|H}^{B(t+1)}=\frac{P_k}{\Big((1-\alpha^{(t)})+\alpha^{(t)}\times \frac{p_{k|H}^{A(t)}}{p_e^{(t)}}\Big)\times POP}$$

$$Stop:|\alpha^{(t+1)}-\alpha^{(t)}|<\epsilon$$

cm.aickin_alpha()

0.38540577344968524

cm.aickin_alpha(max_iter=2000, epsilon=0.00003)

0.38545857383594895

Parameters¶

epsilon : difference threshold (type : float, default : 0.0001)
max_iter : maximum number of iterations (type : int, default : 200)

Output¶

Aickin's alpha

Notice : new in version 2.8
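The iteration can be sketched as follows. Several details are our own assumptions and are not taken from PyCM. The starting values are the observed marginal shares with a kappa-style $\alpha^{(0)}$. Both updates use the previous iteration's values, as in the $t\to t+1$ notation. Denominators are read as $\big((1-\alpha)+\alpha\,p^{B}/p_e\big)\times POP$. Because of these choices, the last digits may differ from `cm.aickin_alpha()`, although with a tight `epsilon` the result settles near 0.3855 for the matrix in this document.

```python
# Confusion matrix from print(cm): rows = actual, columns = predicted.
M = [[3, 0, 2],
     [0, 1, 1],
     [0, 2, 3]]


def aickin_alpha(M, max_iter=200, epsilon=0.0001):
    n = len(M)
    pop = sum(map(sum, M))
    p = [sum(M[k]) for k in range(n)]                          # P_k
    top = [sum(M[r][k] for r in range(n)) for k in range(n)]   # TOP_k
    p_a = sum(M[k][k] for k in range(n)) / pop                 # ACC_Overall
    # Assumed starting point: marginal shares and a kappa-style alpha.
    p_A = [t / pop for t in top]
    p_B = [x / pop for x in p]
    p_e = sum(a * b for a, b in zip(p_A, p_B))
    alpha = (p_a - p_e) / (1 - p_e)
    for _ in range(max_iter):
        # Both updates use iteration-t values.
        new_A = [top[k] / (((1 - alpha) + alpha * p_B[k] / p_e) * pop) for k in range(n)]
        new_B = [p[k] / (((1 - alpha) + alpha * p_A[k] / p_e) * pop) for k in range(n)]
        p_A, p_B = new_A, new_B
        p_e = sum(a * b for a, b in zip(p_A, p_B))
        new_alpha = (p_a - p_e) / (1 - p_e)
        converged = abs(new_alpha - alpha) < epsilon
        alpha = new_alpha
        if converged:
            break
    return alpha


print(aickin_alpha(M))
print(aickin_alpha(M, max_iter=2000, epsilon=0.00003))
```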

Brier score¶

The Brier Score is a strictly proper score function or strictly proper scoring rule that measures the accuracy of probabilistic predictions. For unidimensional predictions, it is strictly equivalent to the mean squared error as applied to predicted probabilities [78] [79].

Wikipedia Page

$$BS=\frac{1}{N} \sum_{t=1}^{N}(f_t-o_t)^2$$

in which $f_t$ is the forecast probability, $o_t$ is the actual outcome of the event at instance $t$ ($0$ if it does not happen and $1$ if it does), and $N$ is the number of forecasting instances.

cm_test = ConfusionMatrix([0, 1, 1, 0], [0.1, 0.9, 0.8, 0.3], threshold=lambda x: 1)
cm_test.brier_score()

0.03749999999999999

cm_test.brier_score(pos_class=0)

0.6875

Parameters¶

pos_class : positive class name (type : int/str, default : None)

Output¶

Brier score

Notice : new in version 3.4

Notice : This option only works in binary probability mode

Notice : pos_class always defaults to the greater class name (i.e. max(classes)) unless the actual_vector contains strings. In that case, pos_class has no default value and must be specified explicitly; otherwise, an error is raised.
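The two values above can be reproduced with a few lines. This is a minimal sketch, not PyCM's code. It assumes that the probability vector is read as the probability of whichever `pos_class` is chosen, and that $o_t$ is 1 when the actual label equals `pos_class`. That reading gives both 0.0375 (for `pos_class=1`) and 0.6875 (for `pos_class=0`).

```python
def brier_score(actual, probs, pos_class):
    """Mean squared gap between forecast probability and the 0/1 outcome for pos_class."""
    outcomes = [1 if a == pos_class else 0 for a in actual]
    return sum((f - o) ** 2 for f, o in zip(probs, outcomes)) / len(probs)


actual = [0, 1, 1, 0]
probs = [0.1, 0.9, 0.8, 0.3]
print(brier_score(actual, probs, pos_class=1))  # about 0.0375
print(brier_score(actual, probs, pos_class=0))  # about 0.6875
```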

Log loss¶

In information theory, the cross-entropy between two probability distributions $p$ and $q$ over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if the coding scheme used for the set is optimized for an estimated probability distribution $q$ rather than the true distribution $p$. This is also known as the log loss (logarithmic loss or logistic loss); the terms "log loss" and "cross-entropy loss" are used interchangeably [30].

Wikipedia Page

$$L_{\log}(y, p) = -(y \log (p) + (1 - y) \log (1 - p))$$

cm_test.log_loss()

0.19763488164214868

cm_test.log_loss(pos_class=0)

1.854645225687032

Parameters¶

pos_class : positive class name (type : int/str, default : None)
normalize : normalization flag (type : bool, default : True)

Output¶

Log loss

Notice : new in version 3.9

Notice : This option only works in binary probability mode

Notice : pos_class always defaults to the greater class name (i.e. max(classes)) unless the actual_vector contains strings. In that case, pos_class has no default value and must be specified explicitly; otherwise, an error is raised.
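The formula above gives the loss for one sample. The sketch below averages it over the samples, using the natural logarithm. It assumes that `normalize=True` means "mean" and `normalize=False` means "sum", and it reads the probabilities as the probability of `pos_class`. Under these assumptions it reproduces both values shown for `cm_test`. Probabilities of exactly 0 or 1 would make `log` fail; clipping is left out of this sketch.

```python
from math import log


def log_loss(actual, probs, pos_class, normalize=True):
    """Binary cross-entropy (natural log); probs are read as P(pos_class)."""
    total = 0.0
    for a, p in zip(actual, probs):
        y = 1 if a == pos_class else 0
        total += -(y * log(p) + (1 - y) * log(1 - p))
    return total / len(probs) if normalize else total  # assumed meaning of normalize


actual = [0, 1, 1, 0]
probs = [0.1, 0.9, 0.8, 0.3]
print(log_loss(actual, probs, pos_class=1))  # about 0.19763
print(log_loss(actual, probs, pos_class=0))  # about 1.85465
```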

Print¶

Full¶

print(cm)

Predict  L1       L2       L3       
Actual
L1       3        0        2        

L2       0        1        1        

L3       0        2        3        


Overall Statistics : 

95% CI                                                            (0.30439,0.86228)
ACC Macro                                                         0.72222
ARI                                                               0.09206
AUNP                                                              0.68571
AUNU                                                              0.67857
Bangdiwala B                                                      0.37255
Bennett S                                                         0.375
CBA                                                               0.47778
CSI                                                               0.17778
Chi-Squared                                                       6.6
Chi-Squared DF                                                    4
Conditional Entropy                                               0.97579
Cramer V                                                          0.5244
Cross Entropy                                                     1.58333
F1 Macro                                                          0.56515
F1 Micro                                                          0.58333
FNR Macro                                                         0.43333
FNR Micro                                                         0.41667
FPR Macro                                                         0.20952
FPR Micro                                                         0.20833
Gwet AC1                                                          0.38931
Hamming Loss                                                      0.41667
Joint Entropy                                                     2.45915
KL Divergence                                                     0.09998
Kappa                                                             0.35484
Kappa 95% CI                                                      (-0.07708,0.78675)
Kappa No Prevalence                                               0.16667
Kappa Standard Error                                              0.22036
Kappa Unbiased                                                    0.34426
Krippendorff Alpha                                                0.37158
Lambda A                                                          0.42857
Lambda B                                                          0.16667
Mutual Information                                                0.52421
NIR                                                               0.41667
NPV Macro                                                         0.77778
NPV Micro                                                         0.79167
Overall ACC                                                       0.58333
Overall CEN                                                       0.46381
Overall J                                                         (1.225,0.40833)
Overall MCC                                                       0.36667
Overall MCEN                                                      0.51894
Overall RACC                                                      0.35417
Overall RACCU                                                     0.36458
P-Value                                                           0.18926
PPV Macro                                                         0.61111
PPV Micro                                                         0.58333
Pearson C                                                         0.59568
Phi-Squared                                                       0.55
RCI                                                               0.35339
RR                                                                4.0
Reference Entropy                                                 1.48336
Response Entropy                                                  1.5
SOA1(Landis & Koch)                                               Fair
SOA2(Fleiss)                                                      Poor
SOA3(Altman)                                                      Fair
SOA4(Cicchetti)                                                   Poor
SOA5(Cramer)                                                      Relatively Strong
SOA6(Matthews)                                                    Weak
SOA7(Lambda A)                                                    Moderate
SOA8(Lambda B)                                                    Very Weak
SOA9(Krippendorff Alpha)                                          Low
SOA10(Pearson C)                                                  Strong
Scott PI                                                          0.34426
Standard Error                                                    0.14232
TNR Macro                                                         0.79048
TNR Micro                                                         0.79167
TPR Macro                                                         0.56667
TPR Micro                                                         0.58333
Zero-one Loss                                                     5

Class Statistics :

Classes                                                           L1            L2            L3            
ACC(Accuracy)                                                     0.83333       0.75          0.58333       
AGF(Adjusted F-score)                                             0.72859       0.62869       0.61009       
AGM(Adjusted geometric mean)                                      0.85764       0.70861       0.58034       
AM(Difference between automatic and manual classification)        -2            1             1             
AUC(Area under the ROC curve)                                     0.8           0.65          0.58571       
AUCI(AUC value interpretation)                                    Very Good     Fair          Poor          
AUPR(Area under the PR curve)                                     0.8           0.41667       0.55          
BB(Braun-Blanquet similarity)                                     0.6           0.33333       0.5           
BCD(Bray-Curtis dissimilarity)                                    0.08333       0.04167       0.04167       
BM(Informedness or bookmaker informedness)                        0.6           0.3           0.17143       
CEN(Confusion entropy)                                            0.25          0.49658       0.60442       
DOR(Diagnostic odds ratio)                                        None          4.0           2.0           
DP(Discriminant power)                                            None          0.33193       0.16597       
DPI(Discriminant power interpretation)                            None          Poor          Poor          
ERR(Error rate)                                                   0.16667       0.25          0.41667       
F0.5(F0.5 score)                                                  0.88235       0.35714       0.51724       
F1(F1 score - harmonic mean of precision and sensitivity)         0.75          0.4           0.54545       
F2(F2 score)                                                      0.65217       0.45455       0.57692       
FDR(False discovery rate)                                         0.0           0.66667       0.5           
FN(False negative/miss/type 2 error)                              2             1             2             
FNR(Miss rate or false negative rate)                             0.4           0.5           0.4           
FOR(False omission rate)                                          0.22222       0.11111       0.33333       
FP(False positive/type 1 error/false alarm)                       0             2             3             
FPR(Fall-out or false positive rate)                              0.0           0.2           0.42857       
G(G-measure geometric mean of precision and sensitivity)          0.7746        0.40825       0.54772       
GI(Gini index)                                                    0.6           0.3           0.17143       
GM(G-mean geometric mean of specificity and sensitivity)          0.7746        0.63246       0.58554       
HD(Hamming distance)                                              2             3             5             
IBA(Index of balanced accuracy)                                   0.36          0.28          0.35265       
ICSI(Individual classification success index)                     0.6           -0.16667      0.1           
IS(Information score)                                             1.26303       1.0           0.26303       
J(Jaccard index)                                                  0.6           0.25          0.375         
LS(Lift score)                                                    2.4           2.0           1.2           
MCC(Matthews correlation coefficient)                             0.68313       0.2582        0.16903       
MCCI(Matthews correlation coefficient interpretation)             Moderate      Negligible    Negligible    
MCEN(Modified confusion entropy)                                  0.26439       0.5           0.6875        
MK(Markedness)                                                    0.77778       0.22222       0.16667       
N(Condition negative)                                             7             10            7             
NLR(Negative likelihood ratio)                                    0.4           0.625         0.7           
NLRI(Negative likelihood ratio interpretation)                    Poor          Negligible    Negligible    
NPV(Negative predictive value)                                    0.77778       0.88889       0.66667       
OC(Overlap coefficient)                                           1.0           0.5           0.6           
OOC(Otsuka-Ochiai coefficient)                                    0.7746        0.40825       0.54772       
OP(Optimized precision)                                           0.58333       0.51923       0.55894       
P(Condition positive or support)                                  5             2             5             
PLR(Positive likelihood ratio)                                    None          2.5           1.4           
PLRI(Positive likelihood ratio interpretation)                    None          Poor          Poor          
POP(Population)                                                   12            12            12            
PPV(Precision or positive predictive value)                       1.0           0.33333       0.5           
PR(Positive rate)                                                 0.41667       0.16667       0.41667       
PRE(Prevalence)                                                   0.41667       0.16667       0.41667       
Q(Yule Q - coefficient of colligation)                            None          0.6           0.33333       
QI(Yule Q interpretation)                                         None          Moderate      Weak          
RACC(Random accuracy)                                             0.10417       0.04167       0.20833       
RACCU(Random accuracy unbiased)                                   0.11111       0.0434        0.21007       
TN(True negative/correct rejection)                               7             8             4             
TNR(Specificity or true negative rate)                            1.0           0.8           0.57143       
TON(Test outcome negative)                                        9             9             6             
TOP(Test outcome positive)                                        3             3             6             
TOPR(Test outcome positive rate)                                  0.25          0.25          0.5           
TP(True positive/hit)                                             3             1             3             
TPR(Sensitivity, recall, hit rate, or true positive rate)         0.6           0.5           0.6           
Y(Youden index)                                                   0.6           0.3           0.17143       
dInd(Distance index)                                              0.4           0.53852       0.58624       
sInd(Similarity index)                                            0.71716       0.61921       0.58547

Matrix¶

cm.print_matrix()

Predict  L1       L2       L3       
Actual
L1       3        0        2        

L2       0        1        1        

L3       0        2        3

cm.matrix

{'L1': {'L1': 3, 'L2': 0, 'L3': 2},
 'L2': {'L1': 0, 'L2': 1, 'L3': 1},
 'L3': {'L1': 0, 'L2': 2, 'L3': 3}}

cm.print_matrix(one_vs_all=True, class_name="L1")

Predict  L1       ~        
Actual
L1       3        2        

~        0        7

sparse_cm = ConfusionMatrix(matrix={1: {1: 0, 2: 2}, 2: {1: 0, 2: 18}})

sparse_cm.print_matrix(sparse=True)

Predict  2        
Actual
1        2        

2        18

Parameters¶

one_vs_all : one-vs-all mode flag (type : bool, default : False)
class_name : target class name for one-vs-all mode (type : any valid type, default : None)
sparse : sparse mode printing flag (type : bool, default : False)

Notice : one_vs_all option, new in version 1.4

Notice : in version 1.5, matrix() was renamed to print_matrix(), and matrix now returns the confusion matrix as a dict

Notice : sparse option, new in version 2.6
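The one-vs-all view is a 2×2 collapse of the dict matrix: the target class against everything else (`~`). The helper below (`one_vs_all` is a hypothetical name, not a PyCM function) sketches that collapse. It assumes the `cm.matrix` layout of actual -> predicted -> count and reproduces the `class_name="L1"` output above.

```python
def one_vs_all(matrix, class_name):
    """Collapse an actual -> predicted -> count dict to a 2x2 matrix for class_name."""
    tp = matrix[class_name][class_name]
    fn = sum(v for k, v in matrix[class_name].items() if k != class_name)
    fp = sum(row[class_name] for a, row in matrix.items() if a != class_name)
    tn = sum(sum(row.values()) for row in matrix.values()) - tp - fn - fp
    return {class_name: {class_name: tp, "~": fn}, "~": {class_name: fp, "~": tn}}


matrix = {"L1": {"L1": 3, "L2": 0, "L3": 2},
          "L2": {"L1": 0, "L2": 1, "L3": 1},
          "L3": {"L1": 0, "L2": 2, "L3": 3}}
print(one_vs_all(matrix, "L1"))
```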

Normalized matrix¶

cm.print_normalized_matrix()

Predict   L1        L2        L3        
Actual
L1        0.6       0.0       0.4       

L2        0.0       0.5       0.5       

L3        0.0       0.4       0.6

cm.normalized_matrix

{'L1': {'L1': 0.6, 'L2': 0.0, 'L3': 0.4},
 'L2': {'L1': 0.0, 'L2': 0.5, 'L3': 0.5},
 'L3': {'L1': 0.0, 'L2': 0.4, 'L3': 0.6}}

cm.print_normalized_matrix(one_vs_all=True, class_name="L1")

Predict   L1        ~         
Actual
L1        0.6       0.4       

~         0.0       1.0

sparse_cm.print_normalized_matrix(sparse=True)

Predict   2         
Actual
1         1.0       

2         1.0

Parameters¶

one_vs_all : one-vs-all mode flag (type : bool, default : False)
class_name : target class name for one-vs-all mode (type : any valid type, default : None)
sparse : sparse mode printing flag (type : bool, default : False)

Notice : one_vs_all option, new in version 1.4

Notice : in version 1.5, normalized_matrix() was renamed to print_normalized_matrix(), and normalized_matrix now returns the normalized confusion matrix as a dict

Notice : sparse option, new in version 2.6
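Each row of the normalized matrix is the corresponding actual-class row divided by its total. The sketch below shows that row normalization on the dict from `cm.matrix`. The helper name is hypothetical, and leaving an all-zero row as zeros is an assumption. PyCM may also round the displayed values.

```python
def normalize_rows(matrix):
    """Divide each actual-class row by its total; all-zero rows stay zero (assumption)."""
    result = {}
    for actual, row in matrix.items():
        total = sum(row.values())
        result[actual] = {pred: (count / total if total else 0.0)
                          for pred, count in row.items()}
    return result


matrix = {"L1": {"L1": 3, "L2": 0, "L3": 2},
          "L2": {"L1": 0, "L2": 1, "L3": 1},
          "L3": {"L1": 0, "L2": 2, "L3": 3}}
print(normalize_rows(matrix))
```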

Stat¶

cm.stat()

Overall Statistics : 

95% CI                                                            (0.30439,0.86228)
ACC Macro                                                         0.72222
ARI                                                               0.09206
AUNP                                                              0.68571
AUNU                                                              0.67857
Bangdiwala B                                                      0.37255
Bennett S                                                         0.375
CBA                                                               0.47778
CSI                                                               0.17778
Chi-Squared                                                       6.6
Chi-Squared DF                                                    4
Conditional Entropy                                               0.97579
Cramer V                                                          0.5244
Cross Entropy                                                     1.58333
F1 Macro                                                          0.56515
F1 Micro                                                          0.58333
FNR Macro                                                         0.43333
FNR Micro                                                         0.41667
FPR Macro                                                         0.20952
FPR Micro                                                         0.20833
Gwet AC1                                                          0.38931
Hamming Loss                                                      0.41667
Joint Entropy                                                     2.45915
KL Divergence                                                     0.09998
Kappa                                                             0.35484
Kappa 95% CI                                                      (-0.07708,0.78675)
Kappa No Prevalence                                               0.16667
Kappa Standard Error                                              0.22036
Kappa Unbiased                                                    0.34426
Krippendorff Alpha                                                0.37158
Lambda A                                                          0.42857
Lambda B                                                          0.16667
Mutual Information                                                0.52421
NIR                                                               0.41667
NPV Macro                                                         0.77778
NPV Micro                                                         0.79167
Overall ACC                                                       0.58333
Overall CEN                                                       0.46381
Overall J                                                         (1.225,0.40833)
Overall MCC                                                       0.36667
Overall MCEN                                                      0.51894
Overall RACC                                                      0.35417
Overall RACCU                                                     0.36458
P-Value                                                           0.18926
PPV Macro                                                         0.61111
PPV Micro                                                         0.58333
Pearson C                                                         0.59568
Phi-Squared                                                       0.55
RCI                                                               0.35339
RR                                                                4.0
Reference Entropy                                                 1.48336
Response Entropy                                                  1.5
SOA1(Landis & Koch)                                               Fair
SOA2(Fleiss)                                                      Poor
SOA3(Altman)                                                      Fair
SOA4(Cicchetti)                                                   Poor
SOA5(Cramer)                                                      Relatively Strong
SOA6(Matthews)                                                    Weak
SOA7(Lambda A)                                                    Moderate
SOA8(Lambda B)                                                    Very Weak
SOA9(Krippendorff Alpha)                                          Low
SOA10(Pearson C)                                                  Strong
Scott PI                                                          0.34426
Standard Error                                                    0.14232
TNR Macro                                                         0.79048
TNR Micro                                                         0.79167
TPR Macro                                                         0.56667
TPR Micro                                                         0.58333
Zero-one Loss                                                     5

Class Statistics :

Classes                                                           L1            L2            L3            
ACC(Accuracy)                                                     0.83333       0.75          0.58333       
AGF(Adjusted F-score)                                             0.72859       0.62869       0.61009       
AGM(Adjusted geometric mean)                                      0.85764       0.70861       0.58034       
AM(Difference between automatic and manual classification)        -2            1             1             
AUC(Area under the ROC curve)                                     0.8           0.65          0.58571       
AUCI(AUC value interpretation)                                    Very Good     Fair          Poor          
AUPR(Area under the PR curve)                                     0.8           0.41667       0.55          
BB(Braun-Blanquet similarity)                                     0.6           0.33333       0.5           
BCD(Bray-Curtis dissimilarity)                                    0.08333       0.04167       0.04167       
BM(Informedness or bookmaker informedness)                        0.6           0.3           0.17143       
CEN(Confusion entropy)                                            0.25          0.49658       0.60442       
DOR(Diagnostic odds ratio)                                        None          4.0           2.0           
DP(Discriminant power)                                            None          0.33193       0.16597       
DPI(Discriminant power interpretation)                            None          Poor          Poor          
ERR(Error rate)                                                   0.16667       0.25          0.41667       
F0.5(F0.5 score)                                                  0.88235       0.35714       0.51724       
F1(F1 score - harmonic mean of precision and sensitivity)         0.75          0.4           0.54545       
F2(F2 score)                                                      0.65217       0.45455       0.57692       
FDR(False discovery rate)                                         0.0           0.66667       0.5           
FN(False negative/miss/type 2 error)                              2             1             2             
FNR(Miss rate or false negative rate)                             0.4           0.5           0.4           
FOR(False omission rate)                                          0.22222       0.11111       0.33333       
FP(False positive/type 1 error/false alarm)                       0             2             3             
FPR(Fall-out or false positive rate)                              0.0           0.2           0.42857       
G(G-measure geometric mean of precision and sensitivity)          0.7746        0.40825       0.54772       
GI(Gini index)                                                    0.6           0.3           0.17143       
GM(G-mean geometric mean of specificity and sensitivity)          0.7746        0.63246       0.58554       
HD(Hamming distance)                                              2             3             5             
IBA(Index of balanced accuracy)                                   0.36          0.28          0.35265       
ICSI(Individual classification success index)                     0.6           -0.16667      0.1           
IS(Information score)                                             1.26303       1.0           0.26303       
J(Jaccard index)                                                  0.6           0.25          0.375         
LS(Lift score)                                                    2.4           2.0           1.2           
MCC(Matthews correlation coefficient)                             0.68313       0.2582        0.16903       
MCCI(Matthews correlation coefficient interpretation)             Moderate      Negligible    Negligible    
MCEN(Modified confusion entropy)                                  0.26439       0.5           0.6875        
MK(Markedness)                                                    0.77778       0.22222       0.16667       
N(Condition negative)                                             7             10            7             
NLR(Negative likelihood ratio)                                    0.4           0.625         0.7           
NLRI(Negative likelihood ratio interpretation)                    Poor          Negligible    Negligible    
NPV(Negative predictive value)                                    0.77778       0.88889       0.66667       
OC(Overlap coefficient)                                           1.0           0.5           0.6           
OOC(Otsuka-Ochiai coefficient)                                    0.7746        0.40825       0.54772       
OP(Optimized precision)                                           0.58333       0.51923       0.55894       
P(Condition positive or support)                                  5             2             5             
PLR(Positive likelihood ratio)                                    None          2.5           1.4           
PLRI(Positive likelihood ratio interpretation)                    None          Poor          Poor          
POP(Population)                                                   12            12            12            
PPV(Precision or positive predictive value)                       1.0           0.33333       0.5           
PR(Positive rate)                                                 0.41667       0.16667       0.41667       
PRE(Prevalence)                                                   0.41667       0.16667       0.41667       
Q(Yule Q - coefficient of colligation)                            None          0.6           0.33333       
QI(Yule Q interpretation)                                         None          Moderate      Weak          
RACC(Random accuracy)                                             0.10417       0.04167       0.20833       
RACCU(Random accuracy unbiased)                                   0.11111       0.0434        0.21007       
TN(True negative/correct rejection)                               7             8             4             
TNR(Specificity or true negative rate)                            1.0           0.8           0.57143       
TON(Test outcome negative)                                        9             9             6             
TOP(Test outcome positive)                                        3             3             6             
TOPR(Test outcome positive rate)                                  0.25          0.25          0.5           
TP(True positive/hit)                                             3             1             3             
TPR(Sensitivity, recall, hit rate, or true positive rate)         0.6           0.5           0.6           
Y(Youden index)                                                   0.6           0.3           0.17143       
dInd(Distance index)                                              0.4           0.53852       0.58624       
sInd(Similarity index)                                            0.71716       0.61921       0.58547

cm.stat(overall_param=["Kappa"], class_param=["ACC", "AUC", "TPR"])

Overall Statistics : 

Kappa                                                             0.35484

Class Statistics :

Classes                                                           L1            L2            L3            
ACC(Accuracy)                                                     0.83333       0.75          0.58333       
AUC(Area under the ROC curve)                                     0.8           0.65          0.58571       
TPR(Sensitivity, recall, hit rate, or true positive rate)         0.6           0.5           0.6
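Each per-class value in these tables comes from the four one-vs-rest counts (TP, FN, FP, TN). The minimal pure-Python sketch below recomputes ACC, TPR, TNR and AUC without PyCM. MATRIX is reconstructed from the TP, FN and FP rows printed above, with rows as actual classes and columns as predicted classes. The names class_counts and class_stats are illustrative helpers, not part of PyCM's API. For these crisp predictions, the printed AUC values match (TPR + TNR) / 2.

```python
# Confusion matrix reconstructed from the TP, FN and FP rows printed above:
# outer keys are actual classes, inner keys are predicted classes.
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def class_counts(matrix):
    """Return {class: (TP, FN, FP, TN)} using a one-vs-rest view of each class."""
    classes = list(matrix)
    population = sum(sum(row.values()) for row in matrix.values())
    counts = {}
    for c in classes:
        tp = matrix[c][c]
        fn = sum(matrix[c].values()) - tp              # actually c, predicted as another class
        fp = sum(matrix[a][c] for a in classes) - tp   # predicted c, actually another class
        tn = population - tp - fn - fp                 # everything else
        counts[c] = (tp, fn, fp, tn)
    return counts


def class_stats(matrix, digits=5):
    """Compute ACC, TPR, TNR and AUC = (TPR + TNR) / 2 for every class."""
    stats = {}
    for c, (tp, fn, fp, tn) in class_counts(matrix).items():
        tpr = tp / (tp + fn)
        tnr = tn / (tn + fp)
        stats[c] = {
            "ACC": round((tp + tn) / (tp + fn + fp + tn), digits),
            "TPR": round(tpr, digits),
            "TNR": round(tnr, digits),
            "AUC": round((tpr + tnr) / 2, digits),
        }
    return stats


for name, values in class_stats(MATRIX).items():
    print(name, values)
```

The values are rounded to five digits, which matches the digit=5 setting used for printing.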

cm.stat(overall_param=["Kappa"], class_param=["ACC", "AUC", "TPR"], class_name=["L1", "L3"])

Overall Statistics : 

Kappa                                                             0.35484

Class Statistics :

Classes                                                           L1            L3            
ACC(Accuracy)                                                     0.83333       0.58333       
AUC(Area under the ROC curve)                                     0.8           0.58571       
TPR(Sensitivity, recall, hit rate, or true positive rate)         0.6           0.6

cm.stat(summary=True)

Overall Statistics : 

ACC Macro                                                         0.72222
F1 Macro                                                          0.56515
FPR Macro                                                         0.20952
Kappa                                                             0.35484
NPV Macro                                                         0.77778
Overall ACC                                                       0.58333
PPV Macro                                                         0.61111
SOA1(Landis & Koch)                                               Fair
TPR Macro                                                         0.56667
Zero-one Loss                                                     5

Class Statistics :

Classes                                                           L1            L2            L3            
ACC(Accuracy)                                                     0.83333       0.75          0.58333       
AUC(Area under the ROC curve)                                     0.8           0.65          0.58571       
AUCI(AUC value interpretation)                                    Very Good     Fair          Poor          
F1(F1 score - harmonic mean of precision and sensitivity)         0.75          0.4           0.54545       
FN(False negative/miss/type 2 error)                              2             1             2             
FP(False positive/type 1 error/false alarm)                       0             2             3             
FPR(Fall-out or false positive rate)                              0.0           0.2           0.42857       
N(Condition negative)                                             7             10            7             
P(Condition positive or support)                                  5             2             5             
POP(Population)                                                   12            12            12            
PPV(Precision or positive predictive value)                       1.0           0.33333       0.5           
TN(True negative/correct rejection)                               7             8             4             
TON(Test outcome negative)                                        9             9             6             
TOP(Test outcome positive)                                        3             3             6             
TP(True positive/hit)                                             3             1             3             
TPR(Sensitivity, recall, hit rate, or true positive rate)         0.6           0.5           0.6
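You can check several of the overall figures in the summary by hand from the same matrix. The sketch below uses pure Python with an illustrative overall_stats helper and the MATRIX reconstructed from the TP, FN and FP counts above. It computes Overall ACC, Cohen's kappa against chance agreement, the macro averages of TPR, PPV and F1, and the zero-one loss (the number of misclassified samples). It assumes every class has at least one actual and one predicted sample. When a statistic is undefined, PyCM itself prints None, as in the PLR and Q rows earlier.

```python
# Rows are actual classes, columns are predicted classes (reconstructed from the tables above).
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def overall_stats(matrix, digits=5):
    """Overall ACC, Cohen's kappa, macro TPR/PPV/F1 and zero-one loss (illustrative helper)."""
    classes = list(matrix)
    n = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in classes)
    actual = {c: sum(matrix[c].values()) for c in classes}                # P per class
    predicted = {c: sum(matrix[a][c] for a in classes) for c in classes}  # TOP per class

    overall_acc = correct / n
    # Chance agreement: sum over classes of P(actual = c) * P(predicted = c)
    expected = sum(actual[c] * predicted[c] for c in classes) / n ** 2
    kappa = (overall_acc - expected) / (1 - expected)

    # Assumes no class has zero support or zero predictions (otherwise these divide by zero)
    tpr = [matrix[c][c] / actual[c] for c in classes]
    ppv = [matrix[c][c] / predicted[c] for c in classes]
    f1 = [2 * p * r / (p + r) for p, r in zip(ppv, tpr)]

    return {
        "Overall ACC": round(overall_acc, digits),
        "Kappa": round(kappa, digits),
        "TPR Macro": round(sum(tpr) / len(tpr), digits),
        "PPV Macro": round(sum(ppv) / len(ppv), digits),
        "F1 Macro": round(sum(f1) / len(f1), digits),
        "Zero-one Loss": n - correct,
    }


print(overall_stats(MATRIX))
```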

Parameters¶

overall_param : overall parameters list for print (type : list, default : None)
class_param : class parameters list for print (type : list, default : None)
class_name : class names to print (a subset of the class names) (type : list, default : None)
summary : summary mode flag (type : bool, default : False)

Notice : cm.params() in previous versions (< 0.2)

Notice : overall_param & class_param, new in version 1.6

Notice : class_name, new in version 1.7

Notice : summary, new in version 2.4

Compare report¶

cp.print_report()

Best : cm2

Rank  Name   Class-Score       Overall-Score
1     cm2    0.50278           0.58095
2     cm3    0.33611           0.52857

print(cp)

Best : cm2

Rank  Name   Class-Score       Overall-Score
1     cm2    0.50278           0.58095
2     cm3    0.33611           0.52857

Save¶

import os
# Create the output directory for the saved files (no error if it already exists)
os.makedirs("Document_files", exist_ok=True)

.pycm file¶

cm.save_stat(os.path.join("Document_files", "cm1"))

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1.pycm',
 'Status': True}

Open File

cm.save_stat(
    os.path.join("Document_files", "cm1_filtered"),
    overall_param=["Kappa"],
    class_param=["ACC", "AUC", "TPR"])

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_filtered.pycm',
 'Status': True}

Open File

cm.save_stat(
    os.path.join("Document_files", "cm1_filtered2"),
    overall_param=["Kappa"],
    class_param=["ACC", "AUC", "TPR"],
    class_name=["L1"])

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_filtered2.pycm',
 'Status': True}

Open File

cm.save_stat(
    os.path.join("Document_files", "cm1_summary"),
    summary=True)

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_summary.pycm',
 'Status': True}

Open File

sparse_cm.save_stat(
    os.path.join("Document_files", "sparse_cm"),
    summary=True,
    sparse=True)

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\sparse_cm.pycm',
 'Status': True}

Open File

cm.save_stat("cm1asdasd/")

{'Message': "[Errno 2] No such file or directory: 'cm1asdasd/.pycm'",
 'Status': False}

Parameters¶

name : filename (type : str)
address : flag for address return (type : bool, default : True)
overall_param : overall parameters list for save (type : list, default : None)
class_param : class parameters list for save (type : list, default : None)
class_name : class names to save (a subset of the class names) (type : list, default : None)
summary : summary mode flag (type : bool, default : False)
sparse : sparse mode printing flag (type : bool, default : False)

Notice : new in version 0.4

Notice : overall_param & class_param, new in version 1.6

Notice : class_name, new in version 1.7

Notice : summary, new in version 2.4

Notice : sparse, new in version 2.6
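Every save method above returns a dictionary with a Status flag and a Message. The Message holds either the saved file's path or the error text. The failing call on "cm1asdasd/" shows that the extension is appended to the given name. That produces a path like "cm1asdasd/.pycm" inside a directory that does not exist. The hypothetical save_text helper below mimics this reporting pattern with plain file I/O so you can see why the call fails. It is an illustration, not PyCM's implementation, and it assumes that address=False simply leaves Message as None.

```python
import os


def save_text(name, text, extension=".pycm", address=True):
    """Hypothetical helper mimicking the {'Status', 'Message'} results shown above.

    The extension is appended to `name` as-is, so a name ending in "/" becomes a
    path such as "cm1asdasd/.pycm" inside a directory that may not exist.
    """
    path = name + extension
    try:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
        # Assumption: address=False suppresses the path in the returned message
        message = os.path.abspath(path) if address else None
        return {"Status": True, "Message": message}
    except Exception as error:  # report the failure instead of raising
        return {"Status": False, "Message": str(error)}


result = save_text("cm1asdasd/", "example content")
if not result["Status"]:
    print("save failed:", result["Message"])
```

Check result["Status"] before you use result["Message"] as a path. Otherwise you may treat an error string as a file name.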

HTML¶

cm.save_html(os.path.join("Document_files", "cm1"))

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1.html',
 'Status': True}

Open File

cm.save_html(
    os.path.join("Document_files", "cm1_filtered"),
    overall_param=["Kappa"],
    class_param=["ACC", "AUC", "TPR"])

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_filtered.html',
 'Status': True}

Open File

cm.save_html(
    os.path.join("Document_files", "cm1_filtered2"),
    overall_param=["Kappa"],
    class_param=["ACC", "AUC", "TPR"],
    class_name=["L1"])

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_filtered2.html',
 'Status': True}

Open File

cm.save_html(
    os.path.join("Document_files", "cm1_colored"),
    color=(255, 204, 255))

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_colored.html',
 'Status': True}

Open File

cm.save_html(
    os.path.join("Document_files", "cm1_colored2"),
    color="Crimson")

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_colored2.html',
 'Status': True}

Open File

cm.save_html(
    os.path.join("Document_files", "cm1_normalized"),
    color="Crimson",
    normalize=True)

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_normalized.html',
 'Status': True}

Open File

cm.save_html(
    os.path.join("Document_files", "cm1_summary"),
    summary=True,
    normalize=True)

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_summary.html',
 'Status': True}

Open File

cm.save_html("cm1asdasd/")

{'Message': "[Errno 2] No such file or directory: 'cm1asdasd/.html'",
 'Status': False}

Parameters¶

name : filename (type : str)
address : flag for address return (type : bool, default : True)
overall_param : overall parameters list for save (type : list, default : None)
class_param : class parameters list for save (type : list, default : None)
class_name : class names to save (a subset of the class names) (type : list, default : None)
color : matrix color as an RGB tuple (R, G, B) or an X11 color name (type : tuple/str, default : (0,0,0))
normalize : save normalized matrix flag (type : bool, default : False)
summary : summary mode flag (type : bool, default : False)
alt_link : alternative link for document flag (type : bool, default : False)
shortener : class name shortener flag (type : bool, default : True)

Notice : new in version 0.5

Notice : overall_param & class_param, new in version 1.6

Notice : class_name, new in version 1.7

Notice : color, new in version 1.8

Notice : normalize, new in version 2.0

Notice : summary and alt_link, new in version 2.4

Notice : shortener, new in version 3.2

Notice : If the PyCM website is not available, set alt_link=True
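The normalize flag saves a normalized version of the matrix. A common convention, assumed here rather than taken from PyCM's source, divides each row by its actual-class total so that every row sums to 1. The sketch below applies that convention to the matrix reconstructed from the TP, FN and FP counts above. normalize_rows is an illustrative name.

```python
# Rows are actual classes, columns are predicted classes.
MATRIX = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def normalize_rows(matrix, digits=5):
    """Divide every row by its total (assumed convention); empty rows become all zeros."""
    normalized = {}
    for actual, row in matrix.items():
        total = sum(row.values())
        normalized[actual] = {
            predicted: round(count / total, digits) if total else 0.0
            for predicted, count in row.items()
        }
    return normalized


for actual, row in normalize_rows(MATRIX).items():
    print(actual, row)
```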

CSV¶

cm.save_csv(os.path.join("Document_files", "cm1"))

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1.csv',
 'Status': True}

Open Stat File
Open Matrix File

cm.save_csv(
    os.path.join("Document_files", "cm1_filtered"),
    class_param=["ACC", "AUC", "TPR"])

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_filtered.csv',
 'Status': True}

Open Stat File
Open Matrix File

cm.save_csv(
    os.path.join("Document_files", "cm1_filtered2"),
    class_param=["ACC", "AUC", "TPR"],
    normalize=True)

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_filtered2.csv',
 'Status': True}

Open Stat File
Open Matrix File

cm.save_csv(
    os.path.join("Document_files", "cm1_filtered3"),
    class_param=["ACC", "AUC", "TPR"],
    class_name=["L1"])

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_filtered3.csv',
 'Status': True}

Open Stat File
Open Matrix File

cm.save_csv(
    os.path.join("Document_files", "cm1_header"),
    header=True)

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_header.csv',
 'Status': True}

Open Stat File
Open Matrix File

cm.save_csv(
    os.path.join("Document_files", "cm1_summary"),
    summary=True,
    matrix_save=False)

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_summary.csv',
 'Status': True}

Open Stat File

cm.save_csv("cm1asdasd/")

{'Message': "[Errno 2] No such file or directory: 'cm1asdasd/.csv'",
 'Status': False}

Parameters¶

name : filename (type : str)
address : flag for address return (type : bool, default : True)
class_param : class parameters list for save (type : list, default : None)
class_name : class names to save (a subset of the class names) (type : list, default : None)
matrix_save : save matrix flag (type : bool, default : True)
normalize : save normalized matrix flag (type : bool, default : False)
summary : summary mode flag (type : bool, default : False)
header : add headers to the .csv file (type : bool, default : False)

Notice : new in version 0.6

Notice : class_param, new in version 1.6

Notice : class_name, new in version 1.7

Notice : matrix_save and normalize, new in version 1.9

Notice : summary, new in version 2.4

Notice : header, new in version 2.6
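The sketch below shows how class_param and class_name narrow what gets written. It writes per-class statistics with Python's csv module, one row per statistic and one column per class. The column layout and the write_stats_csv helper are assumptions for illustration and may not match save_csv's output byte for byte. The values come from the tables printed earlier.

```python
import csv
import io

# Per-class statistics in {statistic: {class: value}} form, copied from the tables above.
STATS = {
    "ACC": {"L1": 0.83333, "L2": 0.75, "L3": 0.58333},
    "AUC": {"L1": 0.8, "L2": 0.65, "L3": 0.58571},
    "TPR": {"L1": 0.6, "L2": 0.5, "L3": 0.6},
}


def write_stats_csv(stats, handle, class_param=None, class_name=None):
    """Write one row per statistic and one column per class, optionally filtered."""
    # None means "everything", mirroring the default of class_param and class_name
    params = class_param if class_param is not None else sorted(stats)
    classes = class_name if class_name is not None else list(next(iter(stats.values())))
    writer = csv.writer(handle)
    writer.writerow(["Class"] + classes)
    for param in params:
        writer.writerow([param] + [stats[param][c] for c in classes])


buffer = io.StringIO()
write_stats_csv(STATS, buffer, class_param=["ACC", "TPR"], class_name=["L1"])
print(buffer.getvalue())
```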

OBJ¶

cm.save_obj(os.path.join("Document_files", "cm1"))

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1.obj',
 'Status': True}

Open File

cm.save_obj(
    os.path.join("Document_files", "cm1_stat"),
    save_stat=True)

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_stat.obj',
 'Status': True}

Open File

cm.save_obj(
    os.path.join("Document_files", "cm1_no_vectors"),
    save_vector=False)

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cm1_no_vectors.obj',
 'Status': True}

Open File

cm.save_obj("cm1asdasd/")

{'Message': "[Errno 2] No such file or directory: 'cm1asdasd/.obj'",
 'Status': False}

Parameters¶

name : filename (type : str)
address : flag for address return (type : bool, default : True)
save_stat : save statistics flag (type : bool, default : False)
save_vector : save vectors flag (type : bool, default : True)

Notice : new in version 0.9.5

Notice : save_vector and save_stat, new in version 2.3

Notice : For more information visit Example 4
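Conceptually, save_obj serializes the matrix so the object can be rebuilt later. Unless save_vector=False, it also stores the input vectors. The sketch below shows that idea with the standard json module. The key names, the to_json / from_json helpers and the JSON layout are assumptions for illustration and do not describe the actual .obj format.

```python
import json


def to_json(matrix, actual=None, predict=None, save_vector=True):
    """Serialize a matrix, and optionally its input vectors, to a JSON string."""
    data = {"Matrix": matrix}
    # Mirrors the idea of save_vector: drop the vectors when the flag is False
    if save_vector and actual is not None and predict is not None:
        data["Actual-Vector"] = list(actual)
        data["Predict-Vector"] = list(predict)
    return json.dumps(data)


def from_json(text):
    """Return (matrix, actual_vector, predict_vector); missing vectors come back as None."""
    data = json.loads(text)
    return data["Matrix"], data.get("Actual-Vector"), data.get("Predict-Vector")


matrix = {"L1": {"L1": 3, "L2": 0}, "L2": {"L1": 1, "L2": 2}}
print(from_json(to_json(matrix, ["L1", "L2"], ["L1", "L2"], save_vector=False)))
```

JSON object keys are always strings. A matrix with integer class labels would therefore come back with string keys and would need converting after loading.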

comp¶

cp.save_report(os.path.join("Document_files", "cp"))

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\doc_html\\Document_files\\cp.comp',
 'Status': True}

Open File

cp.save_report("cm1asdasd/")

{'Message': "[Errno 2] No such file or directory: 'cm1asdasd/.comp'",
 'Status': False}

Parameters¶

name : filename (type : str)
address : flag for address return (type : bool, default : True)

Notice : new in version 2.0

Benchmarking¶

PyCM provides a benchmarking utility, run_report_benchmark, for analyzing its runtime performance on different hardware. It runs a set of predefined test cases and measures the time spent on internal operations, such as creating the confusion matrix and computing class and overall statistics, so you can profile PyCM on your own machine.

from pycm import run_report_benchmark
run_report_benchmark(seed=0)

Number of classes: 2
Population size: 10
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.0003780738s
    + Class statistics: 2.05432e-05s
    + Overall statistics: 7.34814e-05s
    + Total: 0.0004720984s
======================================================

Number of classes: 2
Population size: 10
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.0003804441s
    + Class statistics: 1.81728e-05s
    + Overall statistics: 2.09383e-05s
    + Total: 0.0004195552s
======================================================

Number of classes: 2
Population size: 10
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.0002729874s
    + Class statistics: 1.69876e-05s
    + Overall statistics: 1.97531e-05s
    + Total: 0.0003097282s
======================================================

Number of classes: 5
Population size: 10
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.0005428144s
    + Class statistics: 1.65926e-05s
    + Overall statistics: 1.9358e-05s
    + Total: 0.000578765s
======================================================

Number of classes: 5
Population size: 10
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.0005412341s
    + Class statistics: 1.65926e-05s
    + Overall statistics: 1.9358e-05s
    + Total: 0.0005771847s
======================================================

Number of classes: 5
Population size: 10
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.0006012835s
    + Class statistics: 1.69876e-05s
    + Overall statistics: 1.97531e-05s
    + Total: 0.0006380242s
======================================================

Number of classes: 10
Population size: 10
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.001350715s
    + Class statistics: 1.77778e-05s
    + Overall statistics: 2.05432e-05s
    + Total: 0.0013890359s
======================================================

Number of classes: 10
Population size: 10
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.0010544189s
    + Class statistics: 1.77778e-05s
    + Overall statistics: 2.01481e-05s
    + Total: 0.0010923448s
======================================================

Number of classes: 10
Population size: 10
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.001035851s
    + Class statistics: 1.69876e-05s
    + Overall statistics: 1.97531e-05s
    + Total: 0.0010725917s
======================================================

Number of classes: 100
Population size: 10
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.0297216555s
    + Class statistics: 2.21234e-05s
    + Overall statistics: 2.33086e-05s
    + Total: 0.0297670876s
======================================================

Number of classes: 100
Population size: 10
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.0284491627s
    + Class statistics: 2.21234e-05s
    + Overall statistics: 2.37037e-05s
    + Total: 0.0284949898s
======================================================

Number of classes: 100
Population size: 10
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.0269088182s
    + Class statistics: 2.84444e-05s
    + Overall statistics: 2.48889e-05s
    + Total: 0.0269621515s
======================================================

Number of classes: 1000
Population size: 10
Class distribution scenario: uniform
Timing:
    + Matrix creation: 2.542049152s
    + Class statistics: 3.99012e-05s
    + Overall statistics: 3.99012e-05s
    + Total: 2.5421289544s
======================================================

Number of classes: 1000
Population size: 10
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 2.502238813s
    + Class statistics: 2.44938e-05s
    + Overall statistics: 2.48889e-05s
    + Total: 2.5022881957s
======================================================

Number of classes: 1000
Population size: 10
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 2.3775740227s
    + Class statistics: 5.49135e-05s
    + Overall statistics: 8.05925e-05s
    + Total: 2.3777095287s
======================================================

Number of classes: 2
Population size: 100
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.0014494803s
    + Class statistics: 2.44938e-05s
    + Overall statistics: 2.44938e-05s
    + Total: 0.001498468s
======================================================

Number of classes: 2
Population size: 100
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.0013293817s
    + Class statistics: 2.40987e-05s
    + Overall statistics: 2.37037e-05s
    + Total: 0.0013771841s
======================================================

Number of classes: 2
Population size: 100
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.0011508139s
    + Class statistics: 2.01481e-05s
    + Overall statistics: 2.13333e-05s
    + Total: 0.0011922954s
======================================================

Number of classes: 5
Population size: 100
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.0015423198s
    + Class statistics: 3.39753e-05s
    + Overall statistics: 2.25185e-05s
    + Total: 0.0015988136s
======================================================

Number of classes: 5
Population size: 100
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.0016118506s
    + Class statistics: 5.72839e-05s
    + Overall statistics: 5.68888e-05s
    + Total: 0.0017260233s
======================================================

Number of classes: 5
Population size: 100
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.0013601965s
    + Class statistics: 1.97531e-05s
    + Overall statistics: 2.21234e-05s
    + Total: 0.001402073s
======================================================

Number of classes: 10
Population size: 100
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.0020594552s
    + Class statistics: 2.05432e-05s
    + Overall statistics: 2.21234e-05s
    + Total: 0.0021021218s
======================================================

Number of classes: 10
Population size: 100
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.0022475044s
    + Class statistics: 2.33086e-05s
    + Overall statistics: 2.33086e-05s
    + Total: 0.0022941216s
======================================================

Number of classes: 10
Population size: 100
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.0022684427s
    + Class statistics: 2.01481e-05s
    + Overall statistics: 2.21234e-05s
    + Total: 0.0023107142s
======================================================

Number of classes: 100
Population size: 100
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.0503944293s
    + Class statistics: 4.26666e-05s
    + Overall statistics: 0.0001165431s
    + Total: 0.0505536391s
======================================================

Number of classes: 100
Population size: 100
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.0418970533s
    + Class statistics: 6.4395e-05s
    + Overall statistics: 5.88642e-05s
    + Total: 0.0420203125s
======================================================

Number of classes: 100
Population size: 100
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.0546765s
    + Class statistics: 2.64691e-05s
    + Overall statistics: 2.52839e-05s
    + Total: 0.0547282531s
======================================================

Number of classes: 1000
Population size: 100
Class distribution scenario: uniform
Timing:
    + Matrix creation: 2.7053529242s
    + Class statistics: 2.60741e-05s
    + Overall statistics: 4.26666e-05s
    + Total: 2.7054216649s
======================================================

Number of classes: 1000
Population size: 100
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 2.4304265241s
    + Class statistics: 2.44938e-05s
    + Overall statistics: 2.44938e-05s
    + Total: 2.4304755117s
======================================================

Number of classes: 1000
Population size: 100
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 2.242280895s
    + Class statistics: 4.85926e-05s
    + Overall statistics: 5.60987e-05s
    + Total: 2.2423855863s
======================================================

Number of classes: 2
Population size: 1000
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.1407256172s
    + Class statistics: 2.40987e-05s
    + Overall statistics: 2.33086e-05s
    + Total: 0.1407730246s
======================================================

Number of classes: 2
Population size: 1000
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.1389237421s
    + Class statistics: 4.34568e-05s
    + Overall statistics: 2.44938e-05s
    + Total: 0.1389916926s
======================================================

Number of classes: 2
Population size: 1000
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.1404913458s
    + Class statistics: 2.88395e-05s
    + Overall statistics: 3.16049e-05s
    + Total: 0.1405517902s
======================================================

Number of classes: 5
Population size: 1000
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.1292459226s
    + Class statistics: 2.44938e-05s
    + Overall statistics: 2.37037e-05s
    + Total: 0.1292941201s
======================================================

Number of classes: 5
Population size: 1000
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.1288733797s
    + Class statistics: 2.52839e-05s
    + Overall statistics: 2.29136e-05s
    + Total: 0.1289215771s
======================================================

Number of classes: 5
Population size: 1000
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.1225721501s
    + Class statistics: 2.33086e-05s
    + Overall statistics: 2.25185e-05s
    + Total: 0.1226179772s
======================================================

Number of classes: 10
Population size: 1000
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.092487038s
    + Class statistics: 2.13333e-05s
    + Overall statistics: 2.25185e-05s
    + Total: 0.0925308899s
======================================================

Number of classes: 10
Population size: 1000
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.0938476296s
    + Class statistics: 2.40987e-05s
    + Overall statistics: 2.29136e-05s
    + Total: 0.0938946419s
======================================================

Number of classes: 10
Population size: 1000
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.0924155319s
    + Class statistics: 2.37037e-05s
    + Overall statistics: 2.25185e-05s
    + Total: 0.0924617541s
======================================================

Number of classes: 100
Population size: 1000
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.1720753208s
    + Class statistics: 2.33086e-05s
    + Overall statistics: 2.37037e-05s
    + Total: 0.1721223331s
======================================================

Number of classes: 100
Population size: 1000
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.1710189266s
    + Class statistics: 2.44938e-05s
    + Overall statistics: 2.37037e-05s
    + Total: 0.1710671241s
======================================================

Number of classes: 100
Population size: 1000
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.1701185816s
    + Class statistics: 2.33086e-05s
    + Overall statistics: 6.36049e-05s
    + Total: 0.1702054951s
======================================================

Number of classes: 1000
Population size: 1000
Class distribution scenario: uniform
Timing:
    + Matrix creation: 4.0695746611s
    + Class statistics: 2.29136e-05s
    + Overall statistics: 2.33086e-05s
    + Total: 4.0696208833s
======================================================

Number of classes: 1000
Population size: 1000
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 2.6762546756s
    + Class statistics: 4.74074e-05s
    + Overall statistics: 6.00493e-05s
    + Total: 2.6763621323s
======================================================

Number of classes: 1000
Population size: 1000
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 4.4038653599s
    + Class statistics: 2.37037e-05s
    + Overall statistics: 2.40987e-05s
    + Total: 4.4039131623s
======================================================

Number of classes: 2
Population size: 10000
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.0040434536s
    + Class statistics: 1.9358e-05s
    + Overall statistics: 2.5679e-05s
    + Total: 0.0040884906s
======================================================

Number of classes: 2
Population size: 10000
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.0036361453s
    + Class statistics: 1.65926e-05s
    + Overall statistics: 1.85679e-05s
    + Total: 0.0036713057s
======================================================

Number of classes: 2
Population size: 10000
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.003846713s
    + Class statistics: 1.73827e-05s
    + Overall statistics: 1.85679e-05s
    + Total: 0.0038826636s
======================================================

Number of classes: 5
Population size: 10000
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.003936392s
    + Class statistics: 1.61975e-05s
    + Overall statistics: 1.85679e-05s
    + Total: 0.0039711574s
======================================================

Number of classes: 5
Population size: 10000
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.003968787s
    + Class statistics: 1.69876e-05s
    + Overall statistics: 1.9358e-05s
    + Total: 0.0040051326s
======================================================

Number of classes: 5
Population size: 10000
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.0044140212s
    + Class statistics: 1.85679e-05s
    + Overall statistics: 1.97531e-05s
    + Total: 0.0044523422s
======================================================

Number of classes: 10
Population size: 10000
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.0048292308s
    + Class statistics: 1.73827e-05s
    + Overall statistics: 1.89629e-05s
    + Total: 0.0048655764s
======================================================

Number of classes: 10
Population size: 10000
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.0048537246s
    + Class statistics: 1.61975e-05s
    + Overall statistics: 1.81728e-05s
    + Total: 0.0048880949s
======================================================

Number of classes: 10
Population size: 10000
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.0047190086s
    + Class statistics: 1.61975e-05s
    + Overall statistics: 1.81728e-05s
    + Total: 0.004753379s
======================================================

Number of classes: 100
Population size: 10000
Class distribution scenario: uniform
Timing:
    + Matrix creation: 0.0626508147s
    + Class statistics: 4.74074e-05s
    + Overall statistics: 5.57037e-05s
    + Total: 0.0627539257s
======================================================

Number of classes: 100
Population size: 10000
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 0.0637052336s
    + Class statistics: 2.13333e-05s
    + Overall statistics: 2.33086e-05s
    + Total: 0.0637498756s
======================================================

Number of classes: 100
Population size: 10000
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 0.0628301726s
    + Class statistics: 2.09383e-05s
    + Overall statistics: 2.25185e-05s
    + Total: 0.0628736293s
======================================================

Number of classes: 1000
Population size: 10000
Class distribution scenario: uniform
Timing:
    + Matrix creation: 4.3522727093s
    + Class statistics: 2.52839e-05s
    + Overall statistics: 5.45185e-05s
    + Total: 4.3523525117s
======================================================

Number of classes: 1000
Population size: 10000
Class distribution scenario: majority_class
Timing:
    + Matrix creation: 4.4666824955s
    + Class statistics: 2.40987e-05s
    + Overall statistics: 2.60741e-05s
    + Total: 4.4667326683s
======================================================

Number of classes: 1000
Population size: 10000
Class distribution scenario: minority_class
Timing:
    + Matrix creation: 3.8734242975s
    + Class statistics: 2.44938e-05s
    + Overall statistics: 2.44938e-05s
    + Total: 3.8734732852s
======================================================

The predefined test cases are built on a set of class distribution scenarios provided by the ClassDistributionScenario class, which lets you simulate class imbalance in several ways. The available options are:

ClassDistributionScenario.UNIFORM: All classes have equal frequency (balanced distribution).
ClassDistributionScenario.MAJORITY_CLASS: One class appears 5 times more frequently than the others, which have equal frequency.
ClassDistributionScenario.MINORITY_CLASS: One class appears 5 times less frequently than the others, which have equal frequency.

You can generate random confusion matrices based on these scenarios using the following code:

from pycm import ConfusionMatrix, generate_confusion_matrix_with_scenario, ClassDistributionScenario

for scenario in ClassDistributionScenario:
    rand_cm_dict = generate_confusion_matrix_with_scenario(
        num_classes=3,
        total_population=1000,
        scenario=scenario,
        seed=0)
    rand_cm = ConfusionMatrix(matrix=rand_cm_dict)
    print("Confusion matrix generated with scenario {scenario_name}".format(scenario_name=scenario.name))
    rand_cm.print_matrix()

Confusion matrix generated with scenario UNIFORM
Predict   0         1         2         
Actual
0         310       14        10        

1         24        257       52        

2         79        11        243       


Confusion matrix generated with scenario MAJORITY_CLASS
Predict   0         1         2         
Actual
0         662       32        22        

1         10        110       22        

2         33        4         105       


Confusion matrix generated with scenario MINORITY_CLASS
Predict   0         1         2         
Actual
0         84        4         2         

1         33        352       71        

2         108       15        331

Notice : new in version 4.4
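
To make the "5 times" ratios concrete, the sketch below computes the class sizes each scenario would target if the special class had an exact weight of 5 against a weight of 1 for every other class. The helper `expected_class_sizes` is hypothetical and is not PyCM's generator; it only illustrates the proportions described above, and it assumes the special class is listed first.

```python
# Hypothetical helper, NOT part of PyCM: it assumes the "special" class is
# listed first and that "5 times" is an exact weight ratio.
def expected_class_sizes(scenario_name, num_classes, total_population, factor=5):
    """Return the ideal (fractional) number of samples per actual class."""
    if scenario_name == "UNIFORM":
        weights = [1] * num_classes
    elif scenario_name == "MAJORITY_CLASS":
        # one class weighted `factor` times more than each of the others
        weights = [factor] + [1] * (num_classes - 1)
    elif scenario_name == "MINORITY_CLASS":
        # one class weighted `factor` times less than each of the others
        weights = [1] + [factor] * (num_classes - 1)
    else:
        raise ValueError("unknown scenario: {}".format(scenario_name))
    total_weight = sum(weights)
    return [total_population * w / total_weight for w in weights]


for name in ("UNIFORM", "MAJORITY_CLASS", "MINORITY_CLASS"):
    sizes = expected_class_sizes(name, num_classes=3, total_population=1000)
    print(name, [round(s, 1) for s in sizes])
```

For 3 classes and 1000 samples this gives roughly 333/333/333, 714/143/143 and 91/455/455. The row totals of the matrices printed above (334/333/333, 716/142/142 and 90/456/454) are close to these targets; the small differences presumably come from the random generation.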

Input errors¶

try:
    cm2 = ConfusionMatrix(y_actu, 2)
except pycmVectorError as e:
    print(str(e))

Input vectors must be provided as a list or a NumPy array.

try:
    cm3 = ConfusionMatrix(y_actu, [1, 2, 3])
except pycmVectorError as e:
    print(str(e))

Input vectors must have the same length.

try:
    cm_4 = ConfusionMatrix([], [])
except pycmVectorError as e:
    print(str(e))

Input vectors must not be empty.

try:
    cm_5 = ConfusionMatrix([1, 1, 1], [1, 1, 1, 1])
except pycmVectorError as e:
    print(str(e))

Input vectors must have the same length.
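
The vector errors above correspond to simple checks on the two inputs. The following is a minimal sketch of those checks in plain Python, assuming they run in the order type, length, emptiness. `check_vectors`, `error_message` and `VectorError` are hypothetical names standing in for PyCM's internal validation and `pycmVectorError`.

```python
import numpy


class VectorError(Exception):
    """Hypothetical stand-in for pycm's pycmVectorError."""


def check_vectors(actual_vector, predict_vector):
    """Sketch of the vector checks shown above, in one plausible order."""
    for vector in (actual_vector, predict_vector):
        if not isinstance(vector, (list, numpy.ndarray)):
            raise VectorError("Input vectors must be provided as a list or a NumPy array.")
    if len(actual_vector) != len(predict_vector):
        raise VectorError("Input vectors must have the same length.")
    if len(actual_vector) == 0:
        raise VectorError("Input vectors must not be empty.")


def error_message(*args):
    """Return the validation message for the given inputs, or None if they pass."""
    try:
        check_vectors(*args)
    except VectorError as e:
        return str(e)
    return None


y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
print(error_message(y_actu, 2))          # wrong type
print(error_message(y_actu, [1, 2, 3]))  # length mismatch
print(error_message([], []))             # empty input
```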

try:
    cm3 = ConfusionMatrix(matrix={})
except pycmMatrixError as e:
    print(str(e))

Invalid input confusion matrix format.

try:
    cm_4 = ConfusionMatrix(matrix={1: {1: 2, "1": 2}, "1": {1: 2, "1": 3}})
except pycmMatrixError as e:
    print(str(e))

All input matrix classes must be of the same type.

try:
    cm_5 = ConfusionMatrix(matrix={1: {1: 2}})
except pycmMatrixError as e:
    print(str(e))

The number of classes must be at least 2.
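
The three matrix errors above can likewise be read as checks on a dictionary matrix: it must be a non-empty dict of dicts, all class labels (rows and columns) must share one type, and there must be at least two distinct labels. A minimal sketch under those assumptions; `check_matrix` and `MatrixError` are hypothetical stand-ins for PyCM's internals and `pycmMatrixError`.

```python
class MatrixError(Exception):
    """Hypothetical stand-in for pycm's pycmMatrixError."""


def check_matrix(matrix):
    """Sketch of the dictionary-matrix checks illustrated above."""
    if not isinstance(matrix, dict) or not matrix:
        raise MatrixError("Invalid input confusion matrix format.")
    classes = set(matrix)
    for row in matrix.values():
        if not isinstance(row, dict):
            raise MatrixError("Invalid input confusion matrix format.")
        classes.update(row)  # column labels count as classes too
    # 1 and "1" are distinct labels of different types, so they are rejected
    if len({type(label) for label in classes}) > 1:
        raise MatrixError("All input matrix classes must be of the same type.")
    if len(classes) < 2:
        raise MatrixError("The number of classes must be at least 2.")
    return sorted(classes)


def matrix_error(matrix):
    """Return the validation message, or None if the matrix passes."""
    try:
        check_matrix(matrix)
    except MatrixError as e:
        return str(e)
    return None


print(matrix_error({1: {1: 2, "1": 2}, "1": {1: 2, "1": 3}}))
print(check_matrix({0: {0: 3, 1: 1}, 1: {0: 2, 1: 4}}))
```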

try:
    cp = Compare([cm2, cm3])
except pycmCompareError as e:
    print(str(e))

Input must be provided as a dictionary.

try:
    cp = Compare({"cm1": cm, "cm2": cm2})
except pycmCompareError as e:
    print(str(e))

All ConfusionMatrix objects must have the same domain (same sample size and number of classes).

try:
    cp = Compare({"cm1": [], "cm2": cm2})
except pycmCompareError as e:
    print(str(e))

Input must be a dictionary containing pycm.ConfusionMatrix objects.

try:
    cp = Compare({"cm2": cm2})
except pycmCompareError as e:
    print(str(e))

At least 2 confusion matrices are required for comparison.

try:
    cp = Compare(
        {"cm1": cm2, "cm2": cm3},
        by_class=True,
        class_weight={1: 2, 2: 0})
except pycmCompareError as e:
    print(str(e))

`class_weight` must be a dictionary and specified for all classes.

try:
    cp = Compare(
        {"cm1": cm2, "cm2": cm3},
        class_benchmark_weight={1: 2, 2: 0})
except pycmCompareError as e:
    print(str(e))

`class_benchmark_weight` must be a dictionary and specified for all class benchmarks.

try:
    cp = Compare(
        {"cm1": cm2, "cm2": cm3},
        overall_benchmark_weight={1: 2, 2: 0})
except pycmCompareError as e:
    print(str(e))

`overall_benchmark_weight` must be a dictionary and specified for all overall benchmarks.
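
The three weight errors above share one rule: the weight must be a dictionary with an entry for every class (or benchmark). A minimal sketch of that rule follows. `weight_is_complete` is a hypothetical helper, the labels `[0, 1, 2]` are illustrative rather than the actual classes of `cm2`/`cm3`, and whether PyCM also rejects extra keys or non-numeric values is not covered here.

```python
def weight_is_complete(weight, required_keys):
    """Hypothetical check: `weight` is a dict with an entry for every required key."""
    # set.issubset accepts any iterable; iterating a dict yields its keys
    return isinstance(weight, dict) and set(required_keys).issubset(weight)


classes = [0, 1, 2]  # illustrative class labels
print(weight_is_complete({1: 2, 2: 0}, classes))        # False: class 0 is missing
print(weight_is_complete({0: 1, 1: 2, 2: 0}, classes))  # True
print(weight_is_complete([1, 2, 0], classes))           # False: not a dict
```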

try:
    cm.CI("MCC")
except pycmCIError as e:
    print(str(e))

Confidence interval calculation for this parameter is not supported in this version of pycm.
 Supported parameters are: TPR, TNR, PPV, NPV, ACC, PLR, NLR, FPR, FNR, AUC, PRE, Kappa, Overall ACC
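
Overall ACC is one of the supported parameters. To illustrate what a confidence interval for such a proportion looks like, the sketch below applies the normal-approximation (Wald) interval to the overall accuracy of the `y_actu`/`y_pred` example from the start of this document (7 correct predictions out of 12). `normal_approx_ci` is a hypothetical helper with a fixed z = 1.96 (95%, two-sided); PyCM's own method choice and defaults may differ.

```python
import math


def normal_approx_ci(successes, n, z=1.96):
    """Wald (normal-approximation) interval for a binomial proportion."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return se, (p - z * se, p + z * se)


y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]
correct = sum(a == p for a, p in zip(y_actu, y_pred))  # 7 of 12
se, (lower, upper) = normal_approx_ci(correct, len(y_actu))
print(round(se, 5), round(lower, 5), round(upper, 5))
```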

try:
    cm.CI(2)
except pycmCIError as e:
    print(str(e))

Input must be provided as a string.

try:
    cm.average("AXY")
except pycmAverageError as e:
    print(str(e))

Invalid parameter!

try:
    cm.weighted_average("AXY")
except pycmAverageError as e:
    print(str(e))

Invalid parameter!

try:
    cm.weighted_average("AUC", weight={1: 22})
except pycmAverageError as e:
    print(str(e))

`weight` must be a dictionary and specified for all classes.
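
A weighted average combines the per-class values of a parameter as sum(w_c * v_c) / sum(w_c), which is why a weight is needed for every class. The sketch below assumes that formula. The per-class scores are made up for illustration, the weights are the class supports of `y_actu`, and `weighted_average` here is a hypothetical stand-alone function, not the `cm.weighted_average` method.

```python
def weighted_average(per_class, weight):
    """Weighted mean of per-class values; every class must have a weight."""
    if not isinstance(weight, dict) or not set(per_class).issubset(weight):
        raise ValueError("`weight` must be a dictionary and specified for all classes.")
    total = sum(weight[c] for c in per_class)
    return sum(per_class[c] * weight[c] for c in per_class) / total


y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
support = {c: y_actu.count(c) for c in set(y_actu)}  # {0: 3, 1: 3, 2: 6}
scores = {0: 0.9, 1: 0.6, 2: 0.8}                    # illustrative per-class values
print(weighted_average(scores, support))
```

Passing `weight={1: 22}` to this sketch raises, because classes 0 and 2 have no weight.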

try:
    cm.position()
except pycmVectorError as e:
    print(str(e))

This option is only available in vector mode.

try:
    cm.combine(2)
except pycmMatrixError as e:
    print(str(e))

Input must be an instance of pycm.ConfusionMatrix.
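
`combine` only accepts another `pycm.ConfusionMatrix`. Conceptually, combining two confusion matrices means adding their counts cell by cell over the union of their classes. The sketch below assumes exactly that behaviour on plain dictionary matrices; `combine_matrices` is hypothetical and does not call PyCM.

```python
def combine_matrices(m1, m2):
    """Cell-by-cell sum of two dict confusion matrices over the union of their classes."""
    classes = sorted(set(m1) | set(m2))  # assumes labels are mutually comparable
    return {
        actual: {
            predict: m1.get(actual, {}).get(predict, 0) + m2.get(actual, {}).get(predict, 0)
            for predict in classes
        }
        for actual in classes
    }


a = {0: {0: 3, 1: 1}, 1: {0: 0, 1: 2}}
b = {0: {0: 1, 1: 0}, 1: {0: 2, 1: 4}}
print(combine_matrices(a, b))  # {0: {0: 4, 1: 1}, 1: {0: 2, 1: 6}}
```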

try:
    cm6 = ConfusionMatrix([1, 1, 1, 1], [1, 1, 1, 1], classes=[])
except pycmMatrixError as e:
    print(str(e))

The number of classes must be at least 2.

try:
    cm7 = ConfusionMatrix([1, 1, 1, 1], [1, 1, 1, 1], classes=[1, 1, 2])
except pycmVectorError as e:
    print(str(e))

`classes` must contain unique labels with no duplicates.

try:
    crv = ROCCurve([1, 2, 2, 1], {1, 2, 2, 1}, classes=[1, 2])
except pycmCurveError as e:
    print(str(e))

Input vectors must be provided as a list or a NumPy array.

try:
    crv = ROCCurve([1, 2, 2, 1], [[0.1, 0.9]], classes=[1, 2])
except pycmCurveError as e:
    print(str(e))

Input vectors must have the same length.

try:
    crv = ROCCurve(
        [1, 2, 2, 1],
        [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.2, 0.9]],
        classes=[1, 2])
except pycmCurveError as e:
    print(str(e))

The sum of the probability values must equal 1.

try:
    crv = ROCCurve(
        [1, 2, 2, 1],
        [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.1, 0.9]],
        classes={1, 2})
except pycmCurveError as e:
    print(str(e))

`classes` must be provided as a list.

try:
    crv = ROCCurve(
        [1, 2, 2, 1],
        [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.1, 0.9]],
        classes=[1, 2, 3])
except pycmCurveError as e:
    print(str(e))

`classes` does not match the actual vector.

try:
    crv = ROCCurve(
        [1, 1, 1, 1],
        [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.1, 0.9]],
        classes=[1])
except pycmCurveError as e:
    print(str(e))

The number of classes must be at least 2.

try:
    crv = ROCCurve(
        [1, 2, 2, 1],
        [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.2, "s"]],
        classes=[1, 2])
except pycmCurveError as e:
    print(str(e))

Probability vector elements must be numeric.

try:
    crv = ROCCurve(
        [1, 2, 2, 1],
        [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.2, 0.8]],
        classes=[1, 2],
        thresholds={1, 2})
except pycmCurveError as e:
    print(str(e))

`thresholds` must be provided as a list or a NumPy array.

try:
    crv = ROCCurve(
        [1, 2, 2, 1],
        [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.2, 0.8]],
        classes=[1, 2],
        thresholds=[0.1])
except pycmCurveError as e:
    print(str(e))

The number of thresholds must be at least 2.

try:
    crv = ROCCurve(
        [1, 2, 2, 1],
        [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.2, 0.8]],
        classes=[1, 2],
        thresholds=[0.1, "q"])
except pycmCurveError as e:
    print(str(e))

`thresholds` must contain only numeric values.

try:
    crv = ROCCurve(
        [1, 2, 2, 1],
        [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.2, 0.8]],
        classes=[1, 1, 2])
except pycmCurveError as e:
    print(str(e))

`classes` must contain unique labels with no duplicates.

try:
    crv = ROCCurve(
        [1, 2, 2, 1],
        [[0.1, 0.9], [0.1, 0.9], [0.1, 0.8, 0.1], [0.2, 0.8]],
        classes=[1, 2])
except pycmCurveError as e:
    print(str(e))

All elements of the probability vector must have the same length and match the number of classes.
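
The probability errors above (a row whose values do not sum to 1, a non-numeric entry, a row with the wrong number of entries) can be expressed as per-row checks. A minimal sketch, assuming the checks run row by row in the order length, type, sum, and that the sum is compared with a small tolerance to absorb floating-point rounding. `check_probs`, `probs_error` and `CurveError` are hypothetical stand-ins for PyCM's internals and `pycmCurveError`.

```python
import math
from numbers import Number


class CurveError(Exception):
    """Hypothetical stand-in for pycm's pycmCurveError."""


def check_probs(probs, classes):
    """Sketch of the per-row probability checks illustrated above."""
    for row in probs:
        if len(row) != len(classes):
            raise CurveError("All elements of the probability vector must have "
                             "the same length and match the number of classes.")
        if not all(isinstance(x, Number) for x in row):
            raise CurveError("Probability vector elements must be numeric.")
        # assumption: a small absolute tolerance instead of exact equality
        if not math.isclose(sum(row), 1.0, abs_tol=1e-9):
            raise CurveError("The sum of the probability values must equal 1.")


def probs_error(probs, classes):
    """Return the validation message, or None if all rows pass."""
    try:
        check_probs(probs, classes)
    except CurveError as e:
        return str(e)
    return None


print(probs_error([[0.1, 0.9], [0.2, 0.9]], [1, 2]))
```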

try:
    crv = ROCCurve(
        actual_vector=numpy.array([1, 1, 2, 2]),
        probs=numpy.array([[0.1, 0.9], [0.4, 0.6], [0.35, 0.65], [0.8, 0.2]]),
        classes=[2, 1])
    crv.area(method="trpz")
except pycmCurveError as e:
    print(str(e))

The integral method must be either 'trapezoidal' or 'midpoint'.
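
For intuition about what `area` integrates, the sketch below builds ROC points for class 1 from the inputs of the previous example and applies the trapezoidal rule. Because `classes=[2, 1]`, column 1 of `probs` is read as the class-1 score. It assumes a sample is predicted positive when its score is at least the threshold and that every distinct score is used as a threshold; `roc_points` and `trapezoidal_area` are hypothetical helpers, PyCM's own threshold handling may differ, and the midpoint rule is not shown.

```python
def roc_points(actual, scores, positive):
    """(FPR, TPR) pairs, sweeping a threshold over every distinct score."""
    n_pos = sum(1 for a in actual if a == positive)
    n_neg = len(actual) - n_pos
    # +inf yields the (0, 0) corner; then each distinct score, highest first
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        # assumption: predicted positive when score >= threshold
        tp = sum(1 for a, s in zip(actual, scores) if s >= t and a == positive)
        fp = sum(1 for a, s in zip(actual, scores) if s >= t and a != positive)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoidal_area(points):
    """Area under the piecewise-linear curve through the points, sorted by x."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


actual = [1, 1, 2, 2]
probs = [[0.1, 0.9], [0.4, 0.6], [0.35, 0.65], [0.8, 0.2]]
classes = [2, 1]                      # column order of `probs`
col = classes.index(1)                # column holding the class-1 score
scores = [row[col] for row in probs]  # [0.9, 0.6, 0.65, 0.2]
points = roc_points(actual, scores, positive=1)
print(points)
print(trapezoidal_area(points))
```

As a cross-check, 3 of the 4 (positive, negative) pairs are ranked correctly (only 0.6 < 0.65 is not), which matches the trapezoidal area of 0.75.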

try:
    mlcm = MultiLabelCM([[0, 1], [1, 1]], [[1, 0], [1, 0]])
except pycmVectorError as e:
    print(str(e))

Failed to extract classes from input. Input vectors should be a list of sets with unified types.

try:
    mlcm = MultiLabelCM([{'dog'}, {'cat', 'dog'}], [{'cat'}, {'cat'}])
    mlcm.get_cm_by_class(1)
except pycmMultiLabelError as e:
    print(str(e))

The specified class name is not among the confusion matrix's classes.

try:
    mlcm.get_cm_by_sample(2)
except pycmMultiLabelError as e:
    print(str(e))

Index is out of range for the given vector.
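
In the two examples above, the classes of `mlcm` are `'cat'` and `'dog'` (so `1` is not a class) and there are only two samples (indices 0 and 1). Conceptually, a per-class view treats each class one-vs-rest across samples, and a per-sample view records which classes appear in that sample's actual and predicted sets. The sketch below shows that reduction with plain counts; it does not call PyCM, and `binary_counts_for_class` and `binary_vectors_for_sample` are hypothetical helpers.

```python
def binary_counts_for_class(actual_sets, predict_sets, label):
    """One-vs-rest TP/FP/FN/TN for `label` across all samples."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, predict in zip(actual_sets, predict_sets):
        a, p = label in actual, label in predict
        if a and p:
            counts["TP"] += 1
        elif p:
            counts["FP"] += 1
        elif a:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def binary_vectors_for_sample(actual_sets, predict_sets, index, classes):
    """Per-class membership (1/0) of one sample's actual and predicted sets."""
    if not 0 <= index < len(actual_sets):
        raise IndexError("Index is out of range for the given vector.")
    actual, predict = actual_sets[index], predict_sets[index]
    return [int(c in actual) for c in classes], [int(c in predict) for c in classes]


actual_sets = [{"dog"}, {"cat", "dog"}]
predict_sets = [{"cat"}, {"cat"}]
print(binary_counts_for_class(actual_sets, predict_sets, "cat"))
print(binary_counts_for_class(actual_sets, predict_sets, "dog"))
print(binary_vectors_for_sample(actual_sets, predict_sets, 1, ["cat", "dog"]))
```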

try:
    mlcm = MultiLabelCM(2, [{1, 0}, {1, 0}])
except pycmVectorError as e:
    print(str(e))

Input vectors must be provided as a list or a NumPy array.

try:
    mlcm = MultiLabelCM([{1, 0}, {1, 0}, {1, 1}], [{1, 0}, {1, 0}])
except pycmVectorError as e:
    print(str(e))

Input vectors must have the same length.

try:
    mlcm = MultiLabelCM([], [])
except pycmVectorError as e:
    print(str(e))

Input vectors must not be empty.

try:
    mlcm = MultiLabelCM([{1, 0}, {1, 0}], [{1, 0}, {1, 0}], classes=[1, 0, 1])
except pycmVectorError as e:
    print(str(e))

`classes` must contain unique labels with no duplicates.

Notice : updated in version 4.0

Examples¶

Example-1 (Comparison of three different classifiers)¶

Example-2 (How to plot via matplotlib)¶

Example-3 (Activation threshold)¶

Example-4 (File)¶

Example-5 (Sample weights)¶

Example-6 (Unbalanced data)¶

Example-7 (How to plot via seaborn+pandas)¶

Example-8 (Confidence interval)¶

Cite¶

If you use PyCM in your research, we would appreciate citations to the following paper :

Haghighi, S., Jasemi, M., Hessabi, S. and Zolanvari, A. (2018). PyCM: Multiclass confusion matrix library in Python.
Journal of Open Source Software, 3(25), p.729.

@article{Haghighi2018,
  doi = {10.21105/joss.00729},
  url = {https://doi.org/10.21105/joss.00729},
  year  = {2018},
  month = {may},
  publisher = {The Open Journal},
  volume = {3},
  number = {25},
  pages = {729},
  author = {Sepand Haghighi and Masoomeh Jasemi and Shaahin Hessabi and Alireza Zolanvari},
  title = {{PyCM}: Multiclass confusion matrix library in Python},
  journal = {Journal of Open Source Software}
}

Download PyCM.bib

References¶

1- J. R. Landis and G. G. Koch, "The measurement of observer agreement for categorical data," Biometrics, pp. 159-174, 1977.

2- D. M. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," arXiv preprint arXiv:2010.16061, 2020.

3- C. Sammut and G. I. Webb, Encyclopedia of machine learning. Springer Science & Business Media, 2011.

4- J. L. Fleiss, "Measuring nominal scale agreement among many raters," Psychological bulletin, vol. 76, no. 5, p. 378, 1971.

5- D. G. Altman, Practical statistics for medical research. CRC press, 1990.

6- K. L. Gwet, "Computing inter-rater reliability and its variance in the presence of high agreement," British Journal of Mathematical and Statistical Psychology, vol. 61, no. 1, pp. 29-48, 2008.

7- W. A. Scott, "Reliability of content analysis: The case of nominal scale coding," Public Opinion Quarterly, pp. 321-325, 1955.

8- E. M. Bennett, R. Alpert, and A. Goldstein, "Communications through limited-response questioning," Public Opinion Quarterly, vol. 18, no. 3, pp. 303-308, 1954.

9- D. V. Cicchetti, "Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology," Psychological assessment, vol. 6, no. 4, p. 284, 1994.

10- R. B. Davies, "Algorithm AS 155: The distribution of a linear combination of χ² random variables," Applied Statistics, pp. 323-333, 1980.

11- S. Kullback and R. A. Leibler, "On information and sufficiency," The annals of mathematical statistics, vol. 22, no. 1, pp. 79-86, 1951.

12- L. A. Goodman and W. H. Kruskal, "Measures of association for cross classifications, IV: Simplification of asymptotic variances," Journal of the American Statistical Association, vol. 67, no. 338, pp. 415-421, 1972.

13- L. A. Goodman and W. H. Kruskal, "Measures of association for cross classifications III: Approximate sampling theory," Journal of the American Statistical Association, vol. 58, no. 302, pp. 310-364, 1963.

14- T. Byrt, J. Bishop, and J. B. Carlin, "Bias, prevalence and kappa," Journal of clinical epidemiology, vol. 46, no. 5, pp. 423-429, 1993.

15- M. Shepperd, D. Bowes, and T. Hall, "Researcher bias: The use of machine learning in software defect prediction," IEEE Transactions on Software Engineering, vol. 40, no. 6, pp. 603-616, 2014.

16- X. Deng, Q. Liu, Y. Deng, and S. Mahadevan, "An improved method to construct basic probability assignment based on the confusion matrix for classification problem," Information Sciences, vol. 340, pp. 250-261, 2016.

17- J.-M. Wei, X.-J. Yuan, Q.-H. Hu, and S.-Q. Wang, "A novel measure for evaluating classifiers," Expert Systems with Applications, vol. 37, no. 5, pp. 3799-3809, 2010.

18- I. Kononenko and I. Bratko, "Information-based evaluation criterion for classifier's performance," Machine learning, vol. 6, no. 1, pp. 67-80, 1991.

19- R. Delgado and J. D. Núnez-González, "Enhancing confusion entropy as measure for evaluating classifiers," in The 13th International Conference on Soft Computing Models in Industrial and Environmental Applications, 2018: Springer, pp. 79-89.

20- J. Gorodkin, "Comparing two K-category assignments by a K-category correlation coefficient," Computational biology and chemistry, vol. 28, no. 5-6, pp. 367-374, 2004.

21- C. O. Freitas, J. M. De Carvalho, J. Oliveira, S. B. Aires, and R. Sabourin, "Confusion matrix disagreement for multiple classifiers," in Iberoamerican Congress on Pattern Recognition, 2007: Springer, pp. 387-396.

22- P. Branco, L. Torgo, and R. P. Ribeiro, "Relevance-based evaluation metrics for multi-class imbalanced domains," in Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2017: Springer, pp. 698-710.

23- D. Ballabio, F. Grisoni, and R. Todeschini, "Multivariate comparison of classification performance measures," Chemometrics and Intelligent Laboratory Systems, vol. 174, pp. 33-44, 2018.

24- J. Cohen, "A coefficient of agreement for nominal scales," Educational and psychological measurement, vol. 20, no. 1, pp. 37-46, 1960.

25- S. Siegel, "Nonparametric statistics for the behavioral sciences," 1956.

26- H. Cramér, Mathematical methods of statistics. Princeton university press, 1999.

27- B. W. Matthews, "Comparison of the predicted and observed secondary structure of T4 phage lysozyme," Biochimica et Biophysica Acta (BBA)-Protein Structure, vol. 405, no. 2, pp. 442-451, 1975.

28- J. A. Swets, "The relative operating characteristic in psychology: a technique for isolating effects of response bias finds wide use in the study of perception and cognition," Science, vol. 182, no. 4116, pp. 990-1000, 1973.

29- P. Jaccard, "Étude comparative de la distribution florale dans une portion des Alpes et des Jura," Bull Soc Vaudoise Sci Nat, vol. 37, pp. 547-579, 1901.

30- T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.

31- E. S. Keeping, Introduction to statistical inference. Courier Corporation, 1995.

32- V. Sindhwani, P. Bhattacharya, and S. Rakshit, "Information theoretic feature crediting in multiclass support vector machines," in Proceedings of the 2001 SIAM International Conference on Data Mining, 2001: SIAM, pp. 1-18.

33- M. Bekkar, H. K. Djemaa, and T. A. Alitouche, "Evaluation measures for models assessment over imbalanced data sets," J Inf Eng Appl, vol. 3, no. 10, 2013.

34- W. J. Youden, "Index for rating diagnostic tests," Cancer, vol. 3, no. 1, pp. 32-35, 1950.

35- S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, "Dynamic itemset counting and implication rules for market basket data," in Proceedings of the 1997 ACM SIGMOD international conference on Management of data, 1997, pp. 255-264.

36- S. Raschka, "MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack," Journal of open source software, vol. 3, no. 24, p. 638, 2018.

37- J. R. Bray and J. T. Curtis, "An ordination of the upland forest communities of southern Wisconsin," Ecological monographs, vol. 27, no. 4, pp. 325-349, 1957.

38- J. L. Fleiss, J. Cohen, and B. S. Everitt, "Large sample standard errors of kappa and weighted kappa," Psychological bulletin, vol. 72, no. 5, p. 323, 1969.

39- M. Felkin, "Comparing classification results between n-ary and binary problems," in Quality Measures in Data Mining: Springer, 2007, pp. 277-301.

40- R. Ranawana and V. Palade, "Optimized precision-a new measure for classifier performance evaluation," in 2006 IEEE International Conference on Evolutionary Computation, 2006: IEEE, pp. 2254-2261.

41- V. García, R. A. Mollineda, and J. S. Sánchez, "Index of balanced accuracy: A performance measure for skewed class distributions," in Iberian conference on pattern recognition and image analysis, 2009: Springer, pp. 441-448.

42- P. Branco, L. Torgo, and R. P. Ribeiro, "A survey of predictive modeling on imbalanced domains," ACM Computing Surveys (CSUR), vol. 49, no. 2, pp. 1-50, 2016.

43- K. Pearson, "Notes on Regression and Inheritance in the Case of Two Parents," in Proceedings of the Royal Society of London, pp. 240-242, 1895.

44- W. J. Conover, Practical nonparametric statistics. John Wiley & Sons, 1998.

45- G. U. Yule, "On the methods of measuring association between two attributes," Journal of the Royal Statistical Society, vol. 75, no. 6, pp. 579-652, 1912.

46- R. Batuwita and V. Palade, "A new performance measure for class imbalance learning. application to bioinformatics problems," in 2009 International Conference on Machine Learning and Applications, 2009: IEEE, pp. 545-550.

47- D. K. Lee, "Alternatives to P value: confidence interval and effect size," Korean journal of anesthesiology, vol. 69, no. 6, p. 555, 2016.

48- M. A. Raslich, R. J. Markert, and S. A. Stutes, "Selecting and interpreting diagnostic tests," Biochemia Medica, vol. 17, no. 2, pp. 151-161, 2007.

49- D. E. Hinkle, W. Wiersma, and S. G. Jurs, Applied statistics for the behavioral sciences. Houghton Mifflin College Division, 2003.

50- A. Maratea, A. Petrosino, and M. Manzo, "Adjusted F-measure and kernel scaling for imbalanced data learning," Information Sciences, vol. 257, pp. 331-341, 2014.

51- L. Mosley, "A balanced approach to the multi-class imbalance problem," 2013.

52- M. Vijaymeena and K. Kavitha, "A survey on similarity measures in text mining," Machine Learning and Applications: An International Journal, vol. 3, no. 2, pp. 19-28, 2016.

53- Y. Otsuka, "The faunal character of the Japanese Pleistocene marine Mollusca, as evidence of climate having become colder during the Pleistocene in Japan," Biogeograph Soc Japan, vol. 6, no. 16, pp. 165-170, 1936.

54- A. Tversky, "Features of similarity," Psychological review, vol. 84, no. 4, p. 327, 1977.

55- K. Boyd, K. H. Eng, and C. D. Page, "Area under the precision-recall curve: point estimates and confidence intervals," in Joint European conference on machine learning and knowledge discovery in databases, 2013: Springer, pp. 451-466.

56- J. Davis and M. Goadrich, "The relationship between Precision-Recall and ROC curves," in Proceedings of the 23rd international conference on Machine learning, 2006, pp. 233-240.

57- M. Kuhn, "Building predictive models in R using the caret package," J Stat Softw, vol. 28, no. 5, pp. 1-26, 2008.

58- V. Labatut and H. Cherifi, "Accuracy measures for the comparison of classifiers," arXiv preprint arXiv:1207.3790, 2012.

59- S. Wallis, "Binomial confidence intervals and contingency tests: mathematical fundamentals and the evaluation of alternative methods," Journal of Quantitative Linguistics, vol. 20, no. 3, pp. 178-208, 2013.

60- D. Altman, D. Machin, T. Bryant, and M. Gardner, Statistics with confidence: confidence intervals and statistical guidelines. John Wiley & Sons, 2013.

61- J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, vol. 143, no. 1, pp. 29-36, 1982.

62- E. B. Wilson, "Probable inference, the law of succession, and statistical inference," Journal of the American Statistical Association, vol. 22, no. 158, pp. 209-212, 1927.

63- A. Agresti and B. A. Coull, "Approximate is better than “exact” for interval estimation of binomial proportions," The American Statistician, vol. 52, no. 2, pp. 119-126, 1998.

64- C. S. Peirce, "The numerical measure of the success of predictions," Science, no. 93, pp. 453-454, 1884.

65- E. W. Steyerberg, B. Van Calster, and M. J. Pencina, "Performance measures for prediction models and markers: evaluation of predictions and classifications," Revista Española de Cardiología (English Edition), vol. 64, no. 9, pp. 788-794, 2011.

66- A. J. Vickers and E. B. Elkin, "Decision curve analysis: a novel method for evaluating prediction models," Medical Decision Making, vol. 26, no. 6, pp. 565-574, 2006.

67- G. W. Bohrnstedt and D. Knoke, "Statistics for social data analysis," 1982.

68- W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical association, vol. 66, no. 336, pp. 846-850, 1971.

69- J. M. Santos and M. Embrechts, "On the use of the adjusted rand index as a metric for evaluating supervised classification," in International conference on artificial neural networks, 2009: Springer, pp. 175-184.

70- J. Cohen, "Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit," Psychological bulletin, vol. 70, no. 4, p. 213, 1968.

71- R. Bakeman and J. M. Gottman, Observing interaction: An introduction to sequential analysis. Cambridge university press, 1997.

72- S. Bangdiwala, "A graphical test for observer agreement," in 45th International Statistical Institute Meeting, 1985, vol. 1985, p. 307.

73- K. Bangdiwala and H. Bryan, "Using SAS software graphical procedures for the observer agreement chart," in Proceedings of the SAS Users Group International Conference, 1987, vol. 12, pp. 1083-1088.

74- A. F. Hayes and K. Krippendorff, "Answering the call for a standard reliability measure for coding data," Communication methods and measures, vol. 1, no. 1, pp. 77-89, 2007.

75- M. Aickin, "Maximum likelihood estimation of agreement in the constant predictive probability model, and its relation to Cohen's kappa," Biometrics, pp. 293-302, 1990.

76- N. A. Macmillan and C. D. Creelman, Detection theory: A user's guide. Psychology press, 2004.

77- D. J. Hand, P. Christen, and N. Kirielle, "F*: an interpretable transformation of the F-measure," Machine Learning, vol. 110, no. 3, pp. 451-456, 2021.

78- G. W. Brier, "Verification of forecasts expressed in terms of probability," Monthly weather review, vol. 78, no. 1, pp. 1-3, 1950.

79- L. Buitinck et al., "API design for machine learning software: experiences from the scikit-learn project," arXiv preprint arXiv:1309.0238, 2013.

80- R. W. Hamming, "Error detecting and error correcting codes," The Bell system technical journal, vol. 29, no. 2, pp. 147-160, 1950.

81- S. S. Choi, S. H. Cha, and C. C. Tappert, "A survey of binary similarity and distance measures," Journal of systemics, cybernetics and informatics, vol. 8, no. 1, pp. 43-48, 2010.

82- J. Braun-Blanquet, Plant sociology. The study of plant communities, First ed., 1932.

83- C. C. Little, "Abydos Documentation," 2020.

84- K. Villela, A. Silva, T. Vale, and E. S. de Almeida, "A survey on software variability management approaches," in Proceedings of the 18th International Software Product Line Conference-Volume 1, 2014, pp. 147-156.

85- J. R. Saura, A. Reyes-Menendez, and P. Palos-Sanchez, "Are black Friday deals worth it? Mining Twitter users’ sentiment and behavior response," Journal of Open Innovation: Technology, Market, and Complexity, vol. 5, no. 3, p. 58, 2019.

86- P. Schubert and U. Leimstoll, "Importance and use of information technology in small and medium‐sized companies," Electronic Markets, vol. 17, no. 1, pp. 38-55, 2007.

87- G. Weyenberg and R. Yoshida, "Reconstructing the phylogeny: Computational methods," in Algebraic and Discrete Mathematical Methods for Modern Biology. Academic Press, pp. 293-319, 2015.