slickml.metrics#

Package Contents#

Classes#

BinaryClassificationMetrics

BinaryClassificationMetrics calculates binary classification metrics in one place.

RegressionMetrics

Regression Metrics is a wrapper to calculate all the regression metrics in one place.

class slickml.metrics.BinaryClassificationMetrics[source]#

BinaryClassificationMetrics calculates binary classification metrics in one place.

Binary metrics are computed based on three methods for calculating the thresholds used to binarize the prediction probabilities. The threshold computations include (see the sketch after this list):

  1. Youden Index [youden-j-index].

  2. Maximizing Precision-Recall.

  3. Maximizing Sensitivity-Specificity.
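For orientation, the snippet below is a minimal, hypothetical sketch of how such thresholds can be derived with scikit-learn's roc_curve and precision_recall_curve. The Youden rule (argmax of tpr - fpr) is standard; the specific maximization criteria shown for (2) and (3) are common choices and may differ from the ones implemented internally, so treat this as illustrative only.

>>> import numpy as np
>>> from sklearn.metrics import precision_recall_curve, roc_curve
>>> y_true = np.array([1, 1, 0, 0])
>>> y_pred_proba = np.array([0.95, 0.3, 0.1, 0.9])
>>> fpr, tpr, roc_thresholds = roc_curve(y_true, y_pred_proba)
>>> # 1) Youden's J statistic: J = sensitivity + specificity - 1 = tpr - fpr
>>> youden_threshold = roc_thresholds[np.argmax(tpr - fpr)]
>>> # 2) One common precision-recall criterion: threshold where the two balance
>>> precision, recall, pr_thresholds = precision_recall_curve(y_true, y_pred_proba)
>>> prec_rec_threshold = pr_thresholds[np.argmin(np.abs(precision[:-1] - recall[:-1]))]
>>> # 3) One common sensitivity-specificity criterion: maximize their product
>>> sens_spec_threshold = roc_thresholds[np.argmax(tpr * (1 - fpr))]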

Parameters:
  • y_true (Union[List[int], np.ndarray, pd.Series]) – List of ground truth values such as [0, 1] for binary problems

  • y_pred_proba (Union[List[float], np.ndarray, pd.Series]) – List of predicted probabilities for the positive class (class=1) in binary problems or y_pred_proba[:, 1] in scikit-learn API

  • threshold (float, optional) – Inclusive threshold value used to binarize y_pred_proba into y_pred, where any value that satisfies y_pred_proba >= threshold will be set to class=1 (positive class). Note that ">=" is used instead of ">" (see the sketch after this list), by default 0.5

  • average_method (str, optional) – Method to calculate the average of any metric. Possible values are "micro", "macro", "weighted", "binary", by default "binary"

  • precision_digits (int, optional) – The number of precision digits to format the scores dataframe, by default 3

  • display_df (bool, optional) – Whether to display the formatted scores’ dataframe, by default True
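As a quick illustration of the inclusive binarization rule described under threshold (plain NumPy, independent of this class):

>>> import numpy as np
>>> y_pred_proba = np.array([0.2, 0.5, 0.7])
>>> (y_pred_proba >= 0.5).astype(int)  # 0.5 itself maps to the positive class
array([0, 1, 1])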

plot(figsize=(12, 12), save_path=None, display_plot=False, return_fig=False)[source]#

Plots classification metrics.

get_metrics(dtype='dataframe')[source]#

Returns calculated classification metrics.

y_pred_#

Predicted class based on the threshold. The threshold value inclusively binarizes y_pred_proba into y_pred, where any value that satisfies y_pred_proba >= threshold will be set to class=1 (positive class). Note that ">=" is used instead of ">"

Type:

np.ndarray

accuracy_#

Accuracy based on the initial threshold value with a possible value between 0.0 and 1.0

Type:

float

balanced_accuracy_#

Balanced accuracy based on the initial threshold value considering the prevalence of the classes with a possible value between 0.0 and 1.0

Type:

float

fpr_list_#

List of calculated false-positive-rates based on roc_thresholds_

Type:

np.ndarray

tpr_list_#

List of calculated true-positive-rates based on roc_thresholds_

Type:

np.ndarray

roc_thresholds_#

List of threshold values used to calculate fpr_list_ and tpr_list_

Type:

np.ndarray

auc_roc_#

Area under ROC curve with a possible value between 0.0 and 1.0

Type:

float

precision_list_#

List of calculated precision based on pr_thresholds_

Type:

np.ndarray

recall_list_#

List of calculated recall based on pr_thresholds_

Type:

np.ndarray

pr_thresholds_#

List of precision-recall threshold values used to calculate precision_list_ and recall_list_

Type:

np.ndarray

auc_pr_#

Area under Precision-Recall curve with a possible value between 0.0 and 1.0

Type:

float

precision_#

Precision based on the threshold value with a possible value between 0.0 and 1.0

Type:

float

recall_#

Recall based on the threshold value with a possible value between 0.0 and 1.0

Type:

float

f1_#

F1-score based on the threshold value (beta=1.0) with a possible value between 0.0 and 1.0

Type:

float

f2_#

F2-score based on the threshold value (beta=2.0) with a possible value between 0.0 and 1.0

Type:

float

f05_#

F(1/2)-score based on the threshold value (beta=0.5) with a possible value between 0.0 and 1.0

Type:

float

average_precision_#

Average precision based on the threshold value and class prevalence with a possible value between 0.0 and 1.0

Type:

float

tn_#

True negative counts based on the threshold value

Type:

np.int64

fp_#

False positive counts based on the threshold value

Type:

np.int64

fn_#

False negative counts based on the threshold value

Type:

np.int64

tp_#

True positive counts based on the threshold value

Type:

np.int64

threat_score_#

Threat score based on the threshold value with a possible value between 0.0 and 1.0

Type:

float

youden_index_#

Index of the calculated Youden threshold in roc_thresholds_

Type:

np.int64

youden_threshold_#

Threshold calculated based on Youden Index with a possible value between 0.0 and 1.0

Type:

float

sens_spec_threshold_#

Threshold calculated based on maximized sensitivity-specificity with a possible value between 0.0 and 1.0

Type:

float

prec_rec_threshold_#

Threshold calculated based on maximized precision-recall with a possible value between 0.0 and 1.0

Type:

float

thresholds_dict_#

Calculated thresholds based on different algorithms: the Youden Index (youden_threshold_), maximizing the area under the sensitivity-specificity curve (sens_spec_threshold_), and maximizing the area under the precision-recall curve (prec_rec_threshold_)

Type:

Dict[str, float]

metrics_dict_#

Rounded metrics based on the number of precision digits

Type:

Dict[str, float]

metrics_df_#

Pandas DataFrame of all calculated metrics with threshold set as index

Type:

pd.DataFrame

average_methods_#

List of all possible average methods

Type:

List[str]

plotting_dict_#

Plotting properties

Type:

Dict[str, Any]

References

[youden-j-index]

Youden, W. J. (1950). Index for rating diagnostic tests. Cancer, 3(1), 32-35.

Examples

>>> from slickml.metrics import BinaryClassificationMetrics
>>> cm = BinaryClassificationMetrics(
...     y_true=[1, 1, 0, 0],
...     y_pred_proba=[0.95, 0.3, 0.1, 0.9]
... )
>>> f = cm.plot()
>>> m = cm.get_metrics()
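All fitted attributes (trailing underscore) documented above are then available on the instance; for example (values depend on the input data):

>>> cm.accuracy_  # doctest: +SKIP
>>> cm.thresholds_dict_  # doctest: +SKIP
>>> (cm.tn_, cm.fp_, cm.fn_, cm.tp_)  # doctest: +SKIP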
average_method: Optional[str] = 'binary'#
display_df: Optional[bool] = True#
precision_digits: Optional[int] = 3#
threshold: Optional[float] = 0.5#
y_pred_proba: Union[List[float], numpy.ndarray, pandas.Series]#
y_true: Union[List[int], numpy.ndarray, pandas.Series]#
__post_init__() → None[source]#

Post instantiation validations and assignments.

get_metrics(dtype: Optional[str] = 'dataframe') → Union[pandas.DataFrame, Dict[str, Optional[float]]][source]#

Returns calculated metrics with desired dtypes.

Currently, available output types are “dataframe” and “dict”.

Parameters:

dtype (str, optional) – Results dtype, by default “dataframe”

Returns:

Union[pd.DataFrame, Dict[str, Optional[float]]]
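For example, continuing the class example above, the dictionary form can be requested explicitly instead of the default dataframe:

>>> metrics_df = cm.get_metrics(dtype="dataframe")
>>> metrics_dict = cm.get_metrics(dtype="dict")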

plot(figsize: Optional[Tuple[float, float]] = (12, 12), save_path: Optional[str] = None, display_plot: Optional[bool] = False, return_fig: Optional[bool] = False) → Optional[matplotlib.figure.Figure][source]#

Plots classification metrics.

Parameters:
  • figsize (Tuple[float, float], optional) – Figure size, by default (12, 12)

  • save_path (str, optional) – The full or relative path to save the plot including the image format such as “myplot.png” or “../../myplot.pdf”, by default None

  • display_plot (bool, optional) – Whether to show the plot, by default False

  • return_fig (bool, optional) – Whether to return figure object, by default False

Returns:

Figure, optional
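For example, to display the figure, save it to disk, and keep a handle to it (the file name here is just a placeholder):

>>> fig = cm.plot(
...     figsize=(12, 12),
...     save_path="metrics.png",
...     display_plot=True,
...     return_fig=True,
... )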

class slickml.metrics.RegressionMetrics[source]#

Regression Metrics is a wrapper to calculate all the regression metrics in one place.

Notes

In the case of multioutput regression, the calculation method can be chosen from "raw_values", "uniform_average", and "variance_weighted", as sketched below.
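These three options mirror scikit-learn's multioutput semantics. Assuming they are passed through to scorers such as sklearn.metrics.r2_score (an assumption about the internals), their effect can be sketched directly:

>>> from sklearn.metrics import r2_score
>>> y_true = [[0.5, 1], [-1, 1], [7, -6]]
>>> y_pred = [[0, 2], [-1, 2], [8, -5]]
>>> r2_score(y_true, y_pred, multioutput="raw_values")  # one score per output
array([0.96543779, 0.90816327])
>>> r2_score(y_true, y_pred, multioutput="uniform_average")  # plain mean of the above
0.9368005266622779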

Parameters:
  • y_true (Union[List[float], np.ndarray, pd.Series]) – Ground truth target (response) values

  • y_pred (Union[List[float], np.ndarray, pd.Series]) – Predicted target (response) values

  • multioutput (str, optional) – Method to calculate the metric for multioutput targets, where possible values are "raw_values", "uniform_average", and "variance_weighted". "raw_values" returns a full set of scores in case of multioutput input, "uniform_average" averages the scores of all outputs with uniform weight, and "variance_weighted" averages the scores of all outputs weighted by the variance of each individual output, by default "uniform_average"

  • precision_digits (int, optional) – The number of precision digits to format the scores dataframe, by default 3

  • display_df (bool, optional) – Whether to display the formatted scores’ dataframe, by default True

plot(figsize=(12, 16), save_path=None, display_plot=False, return_fig=False)[source]#

Plots regression metrics.

get_metrics(dtype='dataframe')[source]#

Returns calculated regression metrics.

y_residual_#

Residual values (errors) calculated as (y_true - y_pred)

Type:

np.ndarray

y_residual_normsq_#

Square root of the absolute value of y_residual_

Type:

np.ndarray

r2_#

\(R^2\) score (coefficient of determination) with a best possible value of 1.0 (the score can be negative for poorly performing models)

Type:

float

ev_#

Explained variance score with a best possible value of 1.0

Type:

float

mae_#

Mean absolute error

Type:

float

mse_#

Mean squared error

Type:

float

msle_#

Mean squared log error

Type:

float

mape_#

Mean absolute percentage error

Type:

float

auc_rec_#

Area under REC curve with a possible value between 0.0 and 1.0

Type:

float

deviation_#

Deviation (error tolerance) values arranged to plot the REC curve

Type:

np.ndarray

accuracy_#

Accuracy calculated at each deviation value, used to plot the REC curve (see the sketch below)

Type:

np.ndarray
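A hypothetical sketch of how an REC curve's deviation/accuracy pairs can be formed, following Bi & Bennett's definition cited in the References (the library's internal tolerance grid and normalization may differ):

>>> import numpy as np
>>> y_true = np.array([3.0, -0.5, 2.0, 7.0])
>>> y_pred = np.array([2.5, 0.0, 2.0, 8.0])
>>> errors = np.abs(y_true - y_pred)
>>> deviation = np.linspace(0.0, errors.max(), num=50)
>>> # accuracy at tolerance d = fraction of samples with absolute error <= d
>>> accuracy = np.array([(errors <= d).mean() for d in deviation])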

y_ratio_#

Ratio of y_pred/y_true

Type:

np.ndarray

mean_y_ratio_#

Mean value of y_pred/y_true ratio

Type:

float

std_y_ratio_#

Standard deviation value of y_pred/y_true ratio

Type:

float

cv_y_ratio_#

Coefficient of variation calculated as std_y_ratio/mean_y_ratio

Type:

float
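Illustratively, the three ratio statistics above follow from elementwise division; this sketch assumes the population standard deviation (ddof=0), since the exact ddof used internally is not specified here:

>>> import numpy as np
>>> y_true = np.array([3.0, 2.0, 7.0])
>>> y_pred = np.array([2.5, 2.0, 8.0])
>>> y_ratio = y_pred / y_true  # y_ratio_
>>> cv_y_ratio = y_ratio.std() / y_ratio.mean()  # cv_y_ratio_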

metrics_dict_#

Rounded metrics based on the number of precision digits

Type:

Dict[str, Optional[float]]

metrics_df_#

Pandas DataFrame of all calculated metrics

Type:

pd.DataFrame

plotting_dict_#

Plotting properties

Type:

Dict[str, Any]

References

[Tahmassebi-et-al]

Tahmassebi, A., Gandomi, A. H., & Meyer-Baese, A. (2018, July). A Pareto front based evolutionary model for airfoil self-noise prediction. In 2018 IEEE Congress on Evolutionary Computation (CEC) (pp. 1-8). IEEE. https://www.amirhessam.com/assets/pdf/projects/cec-airfoil2018.pdf

[rec-curve]

Bi, J., & Bennett, K. P. (2003). Regression error characteristic curves. In Proceedings of the 20th international conference on machine learning (ICML-03) (pp. 43-50). https://www.aaai.org/Papers/ICML/2003/ICML03-009.pdf

Examples

>>> from slickml.metrics import RegressionMetrics
>>> rm = RegressionMetrics(
...     y_true=[3, -0.5, 2, 7],
...     y_pred=[2.5, 0.0, 2, 8]
... )
>>> m = rm.get_metrics()
>>> rm.plot()
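As with the classification metrics, the fitted attributes are then readable directly from the instance (values depend on the input data):

>>> rm.r2_  # doctest: +SKIP
>>> rm.mae_  # doctest: +SKIP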
display_df: Optional[bool] = True#
multioutput: Optional[str] = 'uniform_average'#
precision_digits: Optional[int] = 3#
y_pred: Union[List[float], numpy.ndarray, pandas.Series]#
y_true: Union[List[float], numpy.ndarray, pandas.Series]#
__post_init__() → None[source]#

Post instantiation validations and assignments.

get_metrics(dtype: Optional[str] = 'dataframe') → Union[pandas.DataFrame, Dict[str, Optional[float]]][source]#

Returns calculated metrics with desired dtypes.

Currently, available output types are "dataframe" and "dict".

Parameters:

dtype (str, optional) – Results dtype, by default “dataframe”

Returns:

Union[pd.DataFrame, Dict[str, Optional[float]]]
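For example, to get the metrics back as a dictionary instead of the default dataframe:

>>> metrics_dict = rm.get_metrics(dtype="dict")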

plot(figsize: Optional[Tuple[float, float]] = (12, 16), save_path: Optional[str] = None, display_plot: Optional[bool] = False, return_fig: Optional[bool] = False) → Optional[matplotlib.figure.Figure][source]#

Plots regression metrics.

Parameters:
  • figsize (Tuple[float, float], optional) – Figure size, by default (12, 16)

  • save_path (str, optional) – The full or relative path to save the plot including the image format such as “myplot.png” or “../../myplot.pdf”, by default None

  • display_plot (bool, optional) – Whether to show the plot, by default False

  • return_fig (bool, optional) – Whether to return figure object, by default False

Returns:

Figure, optional
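For example, to display the figure, save it to disk, and keep a handle to it (the file name here is just a placeholder):

>>> fig = rm.plot(
...     figsize=(12, 16),
...     save_path="reg_metrics.pdf",
...     display_plot=True,
...     return_fig=True,
... )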