Feature selection is the process of choosing a subset of relevant features from the original feature set according to some criterion. Common selection criteria include information gain, gain ratio, and Fisher score. Depending on whether the training set is labeled, feature selection techniques fall into three types: supervised, unsupervised, and semi-supervised.
- Supervised feature selection uses labeled data to evaluate features. The labels come from external knowledge and may be unreliable or incorrect, which increases the risk that irrelevant features are selected and that the resulting classifier over-fits.
- Unsupervised feature selection is more challenging than supervised feature selection because the data is unlabeled. It is unbiased in the sense that it does not depend on experts to label the data. Its main drawback is that it may neglect the correlation between attributes and may not find the optimal feature set for classification.
- Semi-supervised feature selection applies when only part of the dataset is labeled and the rest is unlabeled; it uses both parts together to evaluate features.
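The difference between a supervised and an unsupervised criterion can be illustrated with a small sketch. It assumes one common form of the Fisher score (between-class scatter divided by within-class scatter) as the supervised criterion and plain variance as a stand-in unsupervised criterion; the toy data and the feature names `f_discriminative` and `f_noisy` are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """Supervised criterion: between-class scatter over within-class scatter."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    overall = fmean(values)
    between = sum(len(g) * (fmean(g) - overall) ** 2 for g in groups.values())
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return between / within if within > 0 else float("inf")


def variance_score(values):
    """Unsupervised criterion: variance of the feature; labels are never used."""
    return pvariance(values)


# Hypothetical toy data: one feature separates the classes, the other only spreads widely.
X = {
    "f_discriminative": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "f_noisy": [0.0, 5.0, 10.0, 0.5, 5.5, 9.5],
}
y = [0, 0, 0, 1, 1, 1]

supervised_rank = sorted(X, key=lambda f: fisher_score(X[f], y), reverse=True)
unsupervised_rank = sorted(X, key=lambda f: variance_score(X[f]), reverse=True)

print("supervised ranking:  ", supervised_rank)
print("unsupervised ranking:", unsupervised_rank)
```

On this toy data the two criteria disagree: the label-aware score prefers the feature that separates the classes, while the variance criterion prefers the feature with the larger spread. This is one way an unsupervised criterion can miss the feature set that is best for classification.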
These three approaches to feature selection are used to develop feature evaluation models. Feature evaluation models are broadly categorized into three types: filter models, wrapper models, and embedded models.
- Filter Models: These models evaluate features without using any classification algorithm. They examine the inherent properties of the data to measure how strongly each feature is correlated with the target. Their advantages are that they are highly scalable, computationally fast, and straightforward. Examples of filter-based feature selection criteria are Chi-Square, Information Gain, Pearson's Correlation, and Gain Ratio.
- Wrapper Models: These models use a predefined classification algorithm to evaluate candidate feature subsets. Their advantage is that the selected features are the ones best suited to that known algorithm, so the classifier tends to give good results. Their drawbacks are that they are computationally expensive, taking more time as the number of variables grows, and that the risk of over-fitting increases when the number of observations is small. Examples of techniques used in wrapper models are swarm optimization, support vector machines, and genetic algorithms.
- Embedded Models: These models select features as part of the learning process during classifier construction. They are specific to a given learning algorithm. Examples of embedded models are Weighted Naive Bayes and Decision Trees.
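The contrast between filter and wrapper models can be sketched in a few lines. The following is a minimal illustration rather than a reference implementation. The filter step ranks discrete features by information gain without any classifier. The wrapper step is assumed here to be greedy forward selection scored by the leave-one-out accuracy of a 1-nearest-neighbour classifier. The toy data, the choice of classifier, and the helper names are all hypothetical. Embedded models are not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Filter criterion: H(labels) - H(labels | feature); no classifier involved."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def loo_accuracy(rows, labels, features):
    """Wrapper criterion: leave-one-out accuracy of 1-NN (Hamming distance)
    restricted to the given feature indices."""
    correct = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum(row[f] != rows[j][f] for f in features),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def forward_selection(rows, labels, n_features):
    """Greedily add the feature that most improves the wrapper criterion."""
    selected, best_score = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scores = {f: loo_accuracy(rows, labels, selected + [f]) for f in remaining}
        candidate = max(remaining, key=lambda f: scores[f])
        if scores[candidate] <= best_score:
            break  # no remaining feature improves the classifier
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = scores[candidate]
    return selected, best_score


# Hypothetical toy data: feature 0 determines the label, feature 1 is weakly
# related to it, and feature 2 is unrelated.
rows = [(1, 1, 0), (1, 1, 1), (1, 0, 0), (1, 0, 1),
        (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 0, 1)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

filter_rank = sorted(
    range(3),
    key=lambda f: information_gain([r[f] for r in rows], labels),
    reverse=True,
)
selected, score = forward_selection(rows, labels, 3)

print("filter ranking by information gain:", filter_rank)
print("wrapper-selected features:", selected, "accuracy:", score)
```

The filter ranking needs only one pass over each feature, whereas the wrapper retrains and re-evaluates the classifier for every candidate subset. This is why wrapper models become expensive as the number of variables grows.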