Feature Selection in Machine Learning

Mohapatra Satya
2 min readMay 17, 2021

--

Hi, In this lesson, you will discover how to select the most important features in a dataset.

Feature selection is the process of reducing the number of input variables when developing a predictive model.

It is desirable to reduce the number of input variables to both reduce the computational cost of modeling and, in some cases, to improve the performance of the model.

Recursive Feature Elimination, or RFE for short, is a popular feature selection algorithm.

RFE is popular because it is easy to configure and use and because it is effective at selecting those features (columns) in a training dataset that are more or most relevant in predicting the target variable.

The scikit-learn Python machine learning library provides an implementation of RFE for machine learning. RFE is a transform. To use it, first, the class is configured with the chosen algorithm specified via the estimator argument and the number of features to select via the n_features_to_select argument.

The example below defines a synthetic classification dataset with five redundant input features. RFE is then used to select five features using the decision tree algorithm.

# report which features were selected by RFE

from sklearn.datasets import make_classification

from sklearn.feature_selection import RFE

from sklearn.tree import DecisionTreeClassifier

# define dataset

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1)

# define RFE

rfe = RFE(estimator=DecisionTreeClassifier(), n_features_to_select=5)

# fit RFE

rfe.fit(X, y)

# summarize all features

for i in range(X.shape[1]):

print(‘Column: %d, Selected=%s, Rank: %d’ % (i, rfe.support_[i], rfe.ranking_[i]))

--

--