
ML with sklearn: a detailed guide to the confusion_matrix and make_scorer functions in sklearn.metrics, with example applications


Table of Contents

Commonly used functions and parameters in sklearn.metrics

The sklearn.metrics.confusion_matrix function

Function explanation

The sklearn.metrics.make_scorer() function

Explanation of the function

Example application of the function

Combined use with a log_transfer function



Commonly used functions and parameters in sklearn.metrics

The sklearn.metrics.confusion_matrix function

Function explanation

Return value: the confusion matrix, whose entry in the i-th row and j-th column is the number of samples whose true label is the i-th class and whose predicted label is the j-th class.

In the binary case this convention gives:

                     Predicted
                     0               1
Actual      0        TN (C[0,0])     FP (C[0,1])
            1        FN (C[1,0])     TP (C[1,1])
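As a quick illustration (a sketch, not part of the official docs), wrapping the returned matrix in a pandas DataFrame makes the row/column convention explicit; the label names below are purely illustrative:

import pandas as pd
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
# Rows are the true classes, columns are the predicted classes
cm_df = pd.DataFrame(cm, index=["actual 0", "actual 1"], columns=["pred 0", "pred 1"])
print(cm_df)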

Definition of confusion_matrix, found in sklearn.metrics._classification:

@_deprecate_positional_args
def confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None,  normalize=None):
    """Compute confusion matrix to evaluate the accuracy of a classification.
    
    By definition a confusion matrix :math:`C` is such that :math:`C_{i, j}` is equal to the number of observations known to be in group :math:`i` and predicted to be in group :math:`j`.
    
    Thus in binary classification, the count of true negatives is
    :math:`C_{0,0}`, false negatives is :math:`C_{1,0}`, true positives is
    :math:`C_{1,1}` and false positives is :math:`C_{0,1}`.
    
    Read more in the :ref:`User Guide <confusion_matrix>`.
    
    Parameters
    ----------
    y_true : array-like of shape (n_samples,) Ground truth (correct) target values.
    y_pred : array-like of shape (n_samples,) Estimated targets as returned by a classifier.
    labels : array-like of shape (n_classes), default=None.  List of labels to index the matrix. This may be used to reorder
    or select a subset of labels.  If ``None`` is given, those that appear at least once in ``y_true`` or ``y_pred`` are used in sorted order.
    
    sample_weight : array-like of shape (n_samples,), default=None. Sample weights.
 .. versionadded:: 0.18
    
    normalize : {'true', 'pred', 'all'}, default=None. Normalizes confusion matrix over the true (rows), predicted (columns)
    conditions or all the population. If None, confusion matrix will not be normalized.
    
    Returns
    -------
    C : ndarray of shape (n_classes, n_classes)
    Confusion matrix whose i-th row and j-th column entry indicates the number of samples with true label being i-th class and predicted label being j-th class.
    
    References
    ----------
    .. [1] `Wikipedia entry for the Confusion matrix <https://en.wikipedia.org/wiki/Confusion_matrix>`_  (Wikipedia and other references may use a different convention for axes)


    Examples
    --------
    >>> from sklearn.metrics import confusion_matrix
    >>> y_true = [2, 0, 2, 2, 0, 1]
    >>> y_pred = [0, 0, 2, 2, 0, 2]
    >>> confusion_matrix(y_true, y_pred)
    array([[2, 0, 0],
           [0, 0, 1],
           [1, 0, 2]])
    
    >>> y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
    >>> y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
    >>> confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
    array([[2, 0, 0],
           [0, 0, 1],
           [1, 0, 2]])
    
    In the binary case, we can extract true positives, etc as follows:
    
    >>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
    >>> (tn, fp, fn, tp)
    (0, 2, 1, 1)
    """
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    if y_type not in ("binary", "multiclass"):
        raise ValueError("%s is not supported" % y_type)
    if labels is None:
        labels = unique_labels(y_true, y_pred)
    else:
        labels = np.asarray(labels)
        n_labels = labels.size
        if n_labels == 0:
            raise ValueError("'labels' should contains at least one label.")
        elif y_true.size == 0:
            return np.zeros((n_labels, n_labels), dtype=np.int)
        elif np.all([l not in y_true for l in labels]):
            raise ValueError("At least one label specified must be in y_true")
    if sample_weight is None:
        sample_weight = np.ones(y_true.shape[0], dtype=np.int64)
    else:
        sample_weight = np.asarray(sample_weight)
    check_consistent_length(y_true, y_pred, sample_weight)
    if normalize not in ['true', 'pred', 'all', None]:
        raise ValueError("normalize must be one of {'true', 'pred', "
            "'all', None}")
    n_labels = labels.size
    label_to_ind = {y:x for x, y in enumerate(labels)}
    # convert yt, yp into index
    y_pred = np.array([label_to_ind.get(x, n_labels + 1) for x in y_pred])
    y_true = np.array([label_to_ind.get(x, n_labels + 1) for x in y_true])
    # intersect y_pred, y_true with labels, eliminate items not in labels
    ind = np.logical_and(y_pred < n_labels, y_true < n_labels)
    y_pred = y_pred[ind]
    y_true = y_true[ind] # also eliminate weights of eliminated items
    sample_weight = sample_weight[ind]
    # Choose the accumulator dtype to always have high precision
    if sample_weight.dtype.kind in {'i', 'u', 'b'}:
        dtype = np.int64
    else:
        dtype = np.float64
    cm = coo_matrix((sample_weight, (y_true, y_pred)),
                    shape=(n_labels, n_labels), dtype=dtype).toarray()
    with np.errstate(all='ignore'):
        if normalize == 'true':
            cm = cm / cm.sum(axis=1, keepdims=True)
        elif normalize == 'pred':
            cm = cm / cm.sum(axis=0, keepdims=True)
        elif normalize == 'all':
            cm = cm / cm.sum()
        cm = np.nan_to_num(cm)
    return cm
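
As a small supplement to the docstring above (a sketch, not from the original article): the normalize parameter, available in newer scikit-learn versions, rescales the counts over rows, columns, or the whole matrix.

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
print(confusion_matrix(y_true, y_pred))                    # raw counts
print(confusion_matrix(y_true, y_pred, normalize='true'))  # each row (true class) sums to 1
print(confusion_matrix(y_true, y_pred, normalize='pred'))  # each column (predicted class) sums to 1
print(confusion_matrix(y_true, y_pred, normalize='all'))   # all entries sum to 1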

The sklearn.metrics.make_scorer() function

Explanation of the function

def make_scorer(score_func, *, greater_is_better=True, needs_proba=False, needs_threshold=False, **kwargs):
    """Make a scorer from a performance metric or loss function.  This factory function wraps scoring functions for use in
    :class:`~sklearn.model_selection.GridSearchCV` and
    :func:`~sklearn.model_selection.cross_val_score`.  It takes a score function, such as :func:`~sklearn.metrics.accuracy_score`,
    :func:`~sklearn.metrics.mean_squared_error`,
    :func:`~sklearn.metrics.adjusted_rand_index` or
    :func:`~sklearn.metrics.average_precision` and returns a callable that scores an estimator's output.
    The signature of the call is `(estimator, X, y)` where `estimator`  is the model to be evaluated, `X` is the data and `y` is the  ground truth labeling (or `None` in the case of unsupervised models).

    Read more in the :ref:`User Guide <scoring>`.


    Parameters
    ----------
    score_func : callable. Score function (or loss function) with signature
        ``score_func(y, y_pred, **kwargs)``.

    greater_is_better : bool, default=True
        Whether score_func is a score function (default), meaning high is good, or a loss function, meaning low is good. In the latter case, the scorer object will sign-flip the outcome of the score_func.

    needs_proba : bool, default=False. Whether score_func requires predict_proba to get probability estimates  out of a classifier. If True, for binary `y_true`, the score function is supposed to accept  a 1D `y_pred` (i.e., probability of the positive class, shape  `(n_samples,)`).

    needs_threshold : bool, default=False. Whether score_func takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. If True, for binary `y_true`, the score function is supposed to accept a 1D `y_pred` (i.e., probability of the positive class or the decision  function, shape `(n_samples,)`).

        For example ``average_precision`` or the area under the roc curve can not be computed using discrete predictions alone.

    **kwargs : additional arguments.  Additional parameters to be passed to score_func.

    Returns
    -------
    scorer : callable. Callable object that returns a scalar score; greater is better.


    Examples
    --------
    >>> from sklearn.metrics import fbeta_score, make_scorer
    >>> ftwo_scorer = make_scorer(fbeta_score, beta=2)
    >>> ftwo_scorer
    make_scorer(fbeta_score, beta=2)
    >>> from sklearn.model_selection import GridSearchCV
    >>> from sklearn.svm import LinearSVC
    >>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]},
    ...                     scoring=ftwo_scorer)

    Notes
    -----
    If `needs_proba=False` and `needs_threshold=False`, the score  function is supposed to accept the output of :term:`predict`. If `needs_proba=True`, the score function is supposed to accept the output of :term:`predict_proba` (For binary `y_true`, the score function is supposed to accept probability of the positive class). If `needs_threshold=True`, the score function is supposed to accept the  output of :term:`decision_function`.
    """
    sign = 1 if greater_is_better else -1
    if needs_proba and needs_threshold:
        raise ValueError("Set either needs_proba or needs_threshold to True,"
                         " but not both.")
    if needs_proba:
        cls = _ProbaScorer
    elif needs_threshold:
        cls = _ThresholdScorer
    else:
        cls = _PredictScorer
    return cls(score_func, sign, kwargs)
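
To make the sign-flip behaviour of greater_is_better concrete, here is a minimal sketch (the data set and model are illustrative, not from the original article):

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# mean_squared_error is a loss (lower is better), so greater_is_better=False;
# the scorer returned by make_scorer negates the loss so that larger values
# still mean "better" during model selection.
neg_mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)
scores = cross_val_score(Ridge(), X, y, cv=5, scoring=neg_mse_scorer)
print(scores)  # negative values; closer to 0 is better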

Example application of the function

Combined use with a log_transfer function

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, mean_absolute_error, r2_score

# Wrap a metric so that it is computed on log-transformed targets;
# np.nan_to_num guards against -inf when a prediction is <= 0.
def log_transfer(func):
    def wrapper(y, y_hat):
        result = func(np.log(y), np.nan_to_num(np.log(y_hat)))
        return result
    return wrapper

# LiR_Model, X_train and y_train are assumed to be defined earlier.
cv_scores = cross_val_score(LiR_Model, X=X_train, y=y_train, verbose=1, cv=5, scoring=make_scorer(log_transfer(r2_score)))
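
The scorer built this way evaluates R² between log(y) and log(ŷ), which is a common choice when the target spans several orders of magnitude; np.nan_to_num replaces the -inf produced by the log of a non-positive prediction with a large finite negative number so that cross-validation does not crash.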