
ML with sklearn: a detailed guide to the confusion_matrix and make_scorer functions in sklearn.metrics, with example applications


Table of Contents

Commonly used functions and parameters in sklearn.metrics

The sklearn.metrics.confusion_matrix function

Function explanation

The sklearn.metrics.make_scorer() function

Explanation of the function

Example application of the function

Combined use with a log_transfer function



Commonly used functions and parameters in sklearn.metrics

The sklearn.metrics.confusion_matrix function

Function explanation

Return value: the confusion matrix, whose entry in the i-th row and j-th column is the number of samples whose true label is the i-th class and whose predicted label is the j-th class.

In the binary case this convention gives:

                     Predicted
                     0               1
Actual      0        TN (C[0,0])     FP (C[0,1])
            1        FN (C[1,0])     TP (C[1,1])
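As a quick illustration (a sketch, not part of the official docs), wrapping the returned matrix in a pandas DataFrame makes the row/column convention explicit; the label names below are purely illustrative:

import pandas as pd
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
# Rows are the true classes, columns are the predicted classes
cm_df = pd.DataFrame(cm, index=["actual 0", "actual 1"], columns=["pred 0", "pred 1"])
print(cm_df)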

Definition of confusion_matrix, found in sklearn.metrics._classification:

@_deprecate_positional_args
def confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None,  normalize=None):
    """Compute confusion matrix to evaluate the accuracy of a classification.
    
    By definition a confusion matrix :math:`C` is such that :math:`C_{i, j}` is equal to the number of observations known to be in group :math:`i` and predicted to be in group :math:`j`.
    
    Thus in binary classification, the count of true negatives is
    :math:`C_{0,0}`, false negatives is :math:`C_{1,0}`, true positives is
    :math:`C_{1,1}` and false positives is :math:`C_{0,1}`.
    
    Read more in the :ref:`User Guide <confusion_matrix>`.
    
    Parameters
    ----------
    y_true : array-like of shape (n_samples,) Ground truth (correct) target values.
    y_pred : array-like of shape (n_samples,) Estimated targets as returned by a classifier.
    labels : array-like of shape (n_classes), default=None.  List of labels to index the matrix. This may be used to reorder
    or select a subset of labels.  If ``None`` is given, those that appear at least once in ``y_true`` or ``y_pred`` are used in sorted order.
    
    sample_weight : array-like of shape (n_samples,), default=None. Sample weights.
 .. versionadded:: 0.18
    
    normalize : {'true', 'pred', 'all'}, default=None. Normalizes confusion matrix over the true (rows), predicted (columns)
    conditions or all the population. If None, confusion matrix will not be normalized.
    
    Returns
    -------
    C : ndarray of shape (n_classes, n_classes)
    Confusion matrix whose i-th row and j-th column entry indicates the number of samples with true label being i-th class and predicted label being j-th class.
    
    References
    ----------
    .. [1] `Wikipedia entry for the Confusion matrix <https://en.wikipedia.org/wiki/Confusion_matrix>`_  (Wikipedia and other references may use a different convention for axes)


    Examples
    --------
    >>> from sklearn.metrics import confusion_matrix
    >>> y_true = [2, 0, 2, 2, 0, 1]
    >>> y_pred = [0, 0, 2, 2, 0, 2]
    >>> confusion_matrix(y_true, y_pred)
    array([[2, 0, 0],
           [0, 0, 1],
           [1, 0, 2]])
    
    >>> y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
    >>> y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
    >>> confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
    array([[2, 0, 0],
           [0, 0, 1],
           [1, 0, 2]])
    
    In the binary case, we can extract true positives, etc as follows:
    
    >>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
    >>> (tn, fp, fn, tp)
    (0, 2, 1, 1)
    """
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    if y_type not in ("binary", "multiclass"):
        raise ValueError("%s is not supported" % y_type)
    if labels is None:
        labels = unique_labels(y_true, y_pred)
    else:
        labels = np.asarray(labels)
        n_labels = labels.size
        if n_labels == 0:
            raise ValueError("'labels' should contains at least one label.")
        elif y_true.size == 0:
            return np.zeros((n_labels, n_labels), dtype=np.int)
        elif np.all([l not in y_true for l in labels]):
            raise ValueError("At least one label specified must be in y_true")
    if sample_weight is None:
        sample_weight = np.ones(y_true.shape[0], dtype=np.int64)
    else:
        sample_weight = np.asarray(sample_weight)
    check_consistent_length(y_true, y_pred, sample_weight)
    if normalize not in ['true', 'pred', 'all', None]:
        raise ValueError("normalize must be one of {'true', 'pred', "
            "'all', None}")
    n_labels = labels.size
    label_to_ind = {y:x for x, y in enumerate(labels)}
    # convert yt, yp into index
    y_pred = np.array([label_to_ind.get(x, n_labels + 1) for x in y_pred])
    y_true = np.array([label_to_ind.get(x, n_labels + 1) for x in y_true])
    # intersect y_pred, y_true with labels, eliminate items not in labels
    ind = np.logical_and(y_pred < n_labels, y_true < n_labels)
    y_pred = y_pred[ind]
    y_true = y_true[ind] # also eliminate weights of eliminated items
    sample_weight = sample_weight[ind]
    # Choose the accumulator dtype to always have high precision
    if sample_weight.dtype.kind in {'i', 'u', 'b'}:
        dtype = np.int64
    else:
        dtype = np.float64
    cm = coo_matrix((sample_weight, (y_true, y_pred)),
                    shape=(n_labels, n_labels), dtype=dtype).toarray()
    with np.errstate(all='ignore'):
        if normalize == 'true':
            cm = cm / cm.sum(axis=1, keepdims=True)
        elif normalize == 'pred':
            cm = cm / cm.sum(axis=0, keepdims=True)
        elif normalize == 'all':
            cm = cm / cm.sum()
        cm = np.nan_to_num(cm)
    return cm
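
As a small supplement to the docstring above (a sketch, not from the original article): the normalize parameter, available in newer scikit-learn versions, rescales the counts over rows, columns, or the whole matrix.

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
print(confusion_matrix(y_true, y_pred))                    # raw counts
print(confusion_matrix(y_true, y_pred, normalize='true'))  # each row (true class) sums to 1
print(confusion_matrix(y_true, y_pred, normalize='pred'))  # each column (predicted class) sums to 1
print(confusion_matrix(y_true, y_pred, normalize='all'))   # all entries sum to 1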

The sklearn.metrics.make_scorer() function

Explanation of the function

def make_scorer(score_func, *, greater_is_better=True, needs_proba=False, needs_threshold=False, **kwargs):
    """Make a scorer from a performance metric or loss function.  This factory function wraps scoring functions for use in
    :class:`~sklearn.model_selection.GridSearchCV` and
    :func:`~sklearn.model_selection.cross_val_score`.  It takes a score function, such as :func:`~sklearn.metrics.accuracy_score`,
    :func:`~sklearn.metrics.mean_squared_error`,
    :func:`~sklearn.metrics.adjusted_rand_index` or
    :func:`~sklearn.metrics.average_precision` and returns a callable that scores an estimator's output.
    The signature of the call is `(estimator, X, y)` where `estimator`  is the model to be evaluated, `X` is the data and `y` is the  ground truth labeling (or `None` in the case of unsupervised models).

    Read more in the :ref:`User Guide <scoring>`.


    Parameters
    ----------
    score_func : callable. Score function (or loss function) with signature
        ``score_func(y, y_pred, **kwargs)``.

    greater_is_better : bool, default=True
        Whether score_func is a score function (default), meaning high is good, or a loss function, meaning low is good. In the latter case, the scorer object will sign-flip the outcome of the score_func.

    needs_proba : bool, default=False. Whether score_func requires predict_proba to get probability estimates  out of a classifier. If True, for binary `y_true`, the score function is supposed to accept  a 1D `y_pred` (i.e., probability of the positive class, shape  `(n_samples,)`).

    needs_threshold : bool, default=False. Whether score_func takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method. If True, for binary `y_true`, the score function is supposed to accept a 1D `y_pred` (i.e., probability of the positive class or the decision  function, shape `(n_samples,)`).

        For example ``average_precision`` or the area under the roc curve can not be computed using discrete predictions alone.

    **kwargs : additional arguments.  Additional parameters to be passed to score_func.

    Returns
    -------
    scorer : callable. Callable object that returns a scalar score; greater is better.


    Examples
    --------
    >>> from sklearn.metrics import fbeta_score, make_scorer
    >>> ftwo_scorer = make_scorer(fbeta_score, beta=2)
    >>> ftwo_scorer
    make_scorer(fbeta_score, beta=2)
    >>> from sklearn.model_selection import GridSearchCV
    >>> from sklearn.svm import LinearSVC
    >>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]},
    ...                     scoring=ftwo_scorer)

    Notes
    -----
    If `needs_proba=False` and `needs_threshold=False`, the score  function is supposed to accept the output of :term:`predict`. If `needs_proba=True`, the score function is supposed to accept the output of :term:`predict_proba` (For binary `y_true`, the score function is supposed to accept probability of the positive class). If `needs_threshold=True`, the score function is supposed to accept the  output of :term:`decision_function`.
    """
    sign = 1 if greater_is_better else -1
    if needs_proba and needs_threshold:
        raise ValueError("Set either needs_proba or needs_threshold to True,"
                         " but not both.")
    if needs_proba:
        cls = _ProbaScorer
    elif needs_threshold:
        cls = _ThresholdScorer
    else:
        cls = _PredictScorer
    return cls(score_func, sign, kwargs)
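
To make the sign-flip behaviour of greater_is_better concrete, here is a minimal sketch (the data set and model are illustrative, not from the original article):

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# mean_squared_error is a loss (lower is better), so greater_is_better=False;
# the scorer returned by make_scorer negates the loss so that larger values
# still mean "better" during model selection.
neg_mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)
scores = cross_val_score(Ridge(), X, y, cv=5, scoring=neg_mse_scorer)
print(scores)  # negative values; closer to 0 is better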

Example application of the function

Combined use with a log_transfer function

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, mean_absolute_error, r2_score

# Wrap a metric so that it is computed on log-transformed targets;
# np.nan_to_num guards against -inf when a prediction is <= 0.
def log_transfer(func):
    def wrapper(y, y_hat):
        result = func(np.log(y), np.nan_to_num(np.log(y_hat)))
        return result
    return wrapper

# LiR_Model, X_train and y_train are assumed to be defined earlier.
cv_scores = cross_val_score(LiR_Model, X=X_train, y=y_train, verbose=1, cv=5, scoring=make_scorer(log_transfer(r2_score)))
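
The scorer built this way evaluates R² between log(y) and log(ŷ), which is a common choice when the target spans several orders of magnitude; np.nan_to_num replaces the -inf produced by the log of a non-positive prediction with a large finite negative number so that cross-validation does not crash.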