sklearn: SelectFromModel in sklearn.feature_selection, an introduction and detailed usage guide
Contents
1、Feature selection using SelectFromModel and LassoCV
2、L1-based feature selection
3、Tree-based feature selection
Introduction to SelectFromModel
SelectFromModel is a meta-transformer that can be used with any estimator that exposes a coef_ or feature_importances_ attribute after fitting. A feature is considered unimportant and is removed if its corresponding coef_ or feature_importances_ value is below the provided threshold parameter. Besides specifying the threshold numerically, you can pass a string to use a built-in heuristic for finding a threshold. The available heuristics are "mean", "median", and float multiples of these such as "0.1*mean".
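As a rough illustration of the string heuristics described above, the following sketch fits the same tree ensemble three times with different threshold strings and records the resolved numeric threshold and the resulting feature mask. The choice of ExtraTreesClassifier, the iris data, and the `results` dictionary are assumptions made for this example only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

results = {}  # hypothetical helper: threshold string -> (numeric threshold, mask)
for threshold in ("mean", "median", "0.1*mean"):
    selector = SelectFromModel(
        # A fixed random_state keeps the importances identical across runs.
        ExtraTreesClassifier(n_estimators=50, random_state=0),
        threshold=threshold,
    ).fit(X, y)
    importances = selector.estimator_.feature_importances_
    # threshold_ holds the numeric value the string was resolved to.
    results[threshold] = (selector.threshold_, selector.get_support())
    print(threshold, selector.threshold_, selector.get_support())
```

A smaller multiplier such as "0.1*mean" lowers the bar, so it keeps at least as many features as "mean" does on the same fitted importances.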
The class docstring summarizes it as a meta-transformer for selecting features based on importance weights (added in version 0.17).

Parameters
----------
estimator : object
    The base estimator from which the transformer is built. This can be either a fitted estimator (if ``prefit`` is set to True) or a non-fitted one. The estimator must have either a ``feature_importances_`` or ``coef_`` attribute after fitting.
threshold : string, float, optional, default None
    The threshold value to use for feature selection. Features whose importance is greater than or equal to the threshold are kept, while the others are discarded. If "median" (or "mean"), the ``threshold`` value is the median (or the mean) of the feature importances. A scaling factor (e.g., "1.25*mean") may also be used. If None and the estimator has a parameter penalty set to l1, either explicitly or implicitly (e.g., Lasso), the threshold used is 1e-5. Otherwise, "mean" is used by default.
prefit : bool, default False
    Whether a prefit model is expected to be passed into the constructor directly. If True, ``transform`` must be called directly, and SelectFromModel cannot be used with ``cross_val_score``, ``GridSearchCV`` and similar utilities that clone the estimator. Otherwise, train the model using ``fit`` and then call ``transform`` to do feature selection.
norm_order : non-zero int, inf, -inf, default 1
    Order of the norm used to filter the vectors of coefficients below ``threshold`` in the case where the ``coef_`` attribute of the estimator is of dimension 2.

Attributes
----------
estimator_ : an estimator
    The base estimator from which the transformer is built. This is stored only when a non-fitted estimator is passed to ``SelectFromModel``, i.e. when prefit is False.
threshold_ : float
    The threshold value used for feature selection.
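To make the norm_order parameter more concrete, here is a minimal sketch using a multi-class LogisticRegression, whose coef_ is two-dimensional. The manual score computation with np.linalg.norm reflects my understanding of how the per-feature scores are derived and should be read as an assumption about the internals, not as the library's definitive implementation; the variable names `scores` and `mask` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# threshold is set explicitly so the example does not rely on the default rule.
selector = SelectFromModel(
    LogisticRegression(max_iter=1000), threshold="mean", norm_order=1
).fit(X, y)

coef = selector.estimator_.coef_  # shape (n_classes, n_features)
# Assumed scoring rule: one norm per feature column, of order norm_order.
scores = np.linalg.norm(coef, axis=0, ord=1)
mask = scores >= scores.mean()

print(coef.shape)
print(scores)
print(selector.get_support())
```

With norm_order=1 each feature's score is the sum of the absolute values of its coefficients across classes; a different order changes how the per-class coefficients are aggregated.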
1、Feature selection using SelectFromModel and LassoCV
# Author: Manoj Kumar <mks542@nyu.edu>
# License: BSD 3 clause
print(__doc__)
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_boston
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV
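# Load the Boston housing dataset.
# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release.
X, y = load_boston(return_X_y=True)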
# We use the base estimator LassoCV since the L1 norm promotes sparsity of features.
clf = LassoCV()
# Set a minimum threshold of 0.25
sfm = SelectFromModel(clf, threshold=0.25)
sfm.fit(X, y)
n_features = sfm.transform(X).shape[1]
# Reset the threshold till the number of features equals two.
# Note that the attribute can be set directly instead of repeatedly
# fitting the metatransformer.
while n_features > 2:
    sfm.threshold += 0.1
    X_transform = sfm.transform(X)
    n_features = X_transform.shape[1]
# Plot the selected two features from X.
plt.title(
    "Features selected from Boston using SelectFromModel with "
    "threshold %0.3f." % sfm.threshold)
feature1 = X_transform[:, 0]
feature2 = X_transform[:, 1]
plt.plot(feature1, feature2, 'r.')
plt.xlabel("Feature number 1")
plt.ylabel("Feature number 2")
plt.ylim([np.min(feature2), np.max(feature2)])
plt.show()
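Because load_boston is not available in recent scikit-learn releases, the following is a sketch of a similar selection on the diabetes dataset. Note that it swaps in a different technique from the loop above: instead of raising the threshold step by step, it uses the max_features parameter (which, as far as I know, exists in scikit-learn 0.20 and later and is not part of the older source listed in this article) together with threshold=-np.inf to keep the two highest-scoring features. The dataset choice is an assumption for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

X, y = load_diabetes(return_X_y=True)

# threshold=-np.inf disables the threshold so only max_features limits selection.
sfm = SelectFromModel(LassoCV(), threshold=-np.inf, max_features=2).fit(X, y)
X_transform = sfm.transform(X)

# Column indices of the two features that were kept.
selected = np.flatnonzero(sfm.get_support())
print(selected, X_transform.shape)
```

This avoids refitting or mutating the threshold attribute, at the cost of not reporting which numeric threshold would separate the two kept features from the rest.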
2、L1-based feature selection
>>> from sklearn.svm import LinearSVC
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_selection import SelectFromModel
>>> X, y = load_iris(return_X_y=True)
>>> X.shape
(150, 4)
>>> lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
>>> model = SelectFromModel(lsvc, prefit=True)
>>> X_new = model.transform(X)
>>> X_new.shape
(150, 3)
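As a hedged extension of the L1-based example, the sketch below repeats the same selection for a few values of C and records how many features survive. The particular C values, the `counts` dictionary, and the added random_state and max_iter settings are assumptions for illustration; the exact counts may differ between library versions, so none are stated here.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

counts = {}  # hypothetical helper: C -> number of selected features
for C in (0.01, 0.1, 1.0):
    lsvc = LinearSVC(
        C=C, penalty="l1", dual=False, random_state=0, max_iter=10000
    ).fit(X, y)
    model = SelectFromModel(lsvc, prefit=True)
    mask = model.get_support()  # boolean mask over the original columns
    counts[C] = int(mask.sum())
    print(C, mask, model.transform(X).shape)
```

With an L1 penalty, C controls how strongly coefficients are pushed to zero, so it is the main knob for how aggressive the selection is.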
3、Tree-based feature selection
>>> from sklearn.ensemble import ExtraTreesClassifier
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_selection import SelectFromModel
>>> X, y = load_iris(return_X_y=True)
>>> X.shape
(150, 4)
>>> clf = ExtraTreesClassifier(n_estimators=50)
>>> clf = clf.fit(X, y)
>>> clf.feature_importances_
array([ 0.04..., 0.05..., 0.4..., 0.4...])
>>> model = SelectFromModel(clf, prefit=True)
>>> X_new = model.transform(X)
>>> X_new.shape
(150, 2)
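The examples above pass a prefit estimator, which cannot be combined with utilities that clone the estimator. A minimal sketch of the non-prefit route is to place SelectFromModel inside a Pipeline so that the selector is refit on each training fold. The step names, the RandomForestClassifier used as the final model, and the fixed random_state values are assumptions for this example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    # prefit defaults to False, so the selector's estimator is cloned and
    # fitted inside each cross-validation fold.
    ("feature_selection",
     SelectFromModel(ExtraTreesClassifier(n_estimators=50, random_state=0))),
    ("classification", RandomForestClassifier(random_state=0)),
])

scores = cross_val_score(pipe, X, y, cv=5)
print(scores)
```

Keeping the selection step inside the pipeline means the features are chosen from the training portion of each fold only, which avoids leaking information from the held-out samples.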
How to use SelectFromModel
1、Source code of SelectFromModel
class SelectFromModel Found at: sklearn.feature_selection.from_model
class SelectFromModel(BaseEstimator, SelectorMixin, MetaEstimatorMixin):
    """Meta-transformer for selecting features based on importance weights.

    .. versionadded:: 0.17

    Parameters
    ----------
    estimator : object
        The base estimator from which the transformer is built.
        This can be both a fitted (if ``prefit`` is set to True)
        or a non-fitted estimator. The estimator must have either a
        ``feature_importances_`` or ``coef_`` attribute after fitting.

    threshold : string, float, optional default None
        The threshold value to use for feature selection. Features whose
        importance is greater or equal are kept while the others are
        discarded. If "median" (resp. "mean"), then the ``threshold`` value is
        the median (resp. the mean) of the feature importances. A scaling
        factor (e.g., "1.25*mean") may also be used. If None and if the
        estimator has a parameter penalty set to l1, either explicitly
        or implicitly (e.g, Lasso), the threshold used is 1e-5.
        Otherwise, "mean" is used by default.

    prefit : bool, default False
        Whether a prefit model is expected to be passed into the constructor
        directly or not. If True, ``transform`` must be called directly
        and SelectFromModel cannot be used with ``cross_val_score``,
        ``GridSearchCV`` and similar utilities that clone the estimator.
        Otherwise train the model using ``fit`` and then ``transform`` to do
        feature selection.

    norm_order : non-zero int, inf, -inf, default 1
        Order of the norm used to filter the vectors of coefficients below
        ``threshold`` in the case where the ``coef_`` attribute of the
        estimator is of dimension 2.

    Attributes
    ----------
    estimator_ : an estimator
        The base estimator from which the transformer is built.
        This is stored only when a non-fitted estimator is passed to the
        ``SelectFromModel``, i.e when prefit is False.

    threshold_ : float
        The threshold value used for feature selection.
    """
    def __init__(self, estimator, threshold=None, prefit=False,
                 norm_order=1):
        self.estimator = estimator
        self.threshold = threshold
        self.prefit = prefit
        self.norm_order = norm_order

    def _get_support_mask(self):
        # SelectFromModel can directly call on transform.
        if self.prefit:
            estimator = self.estimator
        elif hasattr(self, 'estimator_'):
            estimator = self.estimator_
        else:
            raise ValueError(
                'Either fit SelectFromModel before transform or set "prefit='
                'True" and pass a fitted estimator to the constructor.')
        scores = _get_feature_importances(estimator, self.norm_order)
        threshold = _calculate_threshold(estimator, scores, self.threshold)
        return scores >= threshold

    def fit(self, X, y=None, **fit_params):
        """Fit the SelectFromModel meta-transformer.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The training input samples.

        y : array-like, shape (n_samples,)
            The target values (integers that correspond to classes in
            classification, real numbers in regression).

        **fit_params : Other estimator specific parameters

        Returns
        -------
        self : object
            Returns self.
        """
        if self.prefit:
            raise NotFittedError(
                "Since 'prefit=True', call transform directly")
        self.estimator_ = clone(self.estimator)
        self.estimator_.fit(X, y, **fit_params)
        return self

    @property
    def threshold_(self):
        scores = _get_feature_importances(self.estimator_, self.norm_order)
        return _calculate_threshold(self.estimator, scores, self.threshold)

    @if_delegate_has_method('estimator')
    def partial_fit(self, X, y=None, **fit_params):
        """Fit the SelectFromModel meta-transformer only once.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The training input samples.

        y : array-like, shape (n_samples,)
            The target values (integers that correspond to classes in
            classification, real numbers in regression).

        **fit_params : Other estimator specific parameters

        Returns
        -------
        self : object
            Returns self.
        """
        if self.prefit:
            raise NotFittedError(
                "Since 'prefit=True', call transform directly")
        if not hasattr(self, "estimator_"):
            self.estimator_ = clone(self.estimator)
        self.estimator_.partial_fit(X, y, **fit_params)
        return self
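The partial_fit method in the listing above clones the base estimator once and then forwards each call to the estimator's own partial_fit. A minimal usage sketch, assuming an SGDClassifier as the incremental estimator and an arbitrary split into five shuffled batches (both are choices made for this example, not something the source prescribes):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import SGDClassifier

X, y = load_iris(return_X_y=True)
classes = np.unique(y)  # SGDClassifier.partial_fit needs all class labels up front

# Shuffle so every batch contains a mix of classes.
order = np.random.RandomState(0).permutation(len(X))

selector = SelectFromModel(SGDClassifier(random_state=0), threshold="mean")
for batch in np.array_split(order, 5):
    # Extra keyword arguments are forwarded to the estimator's partial_fit.
    selector.partial_fit(X[batch], y[batch], classes=classes)

X_new = selector.transform(X)
print(selector.get_support(), X_new.shape)
```

Since the estimator is only cloned on the first call, later batches keep updating the same fitted model, and the selected features reflect all batches seen so far.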