Abstract
Background: Feature selection is important in high-dimensional data analysis. The wrapper approach is one way to perform feature selection, but it is computationally intensive because it builds and evaluates a separate model for each candidate subset of features. Existing wrapper approaches focus primarily on shortening the search path to an optimal feature set. However, they underutilize the feature subset models they build, which affects both feature selection and predictive performance.
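The computational cost described above can be made concrete with a minimal sketch of a generic wrapper: greedy forward selection in which every candidate subset is scored by fitting an ordinary least squares model and measuring its holdout RMSE. This is an illustrative assumption, not the authors' standard wrapper (StW); the function names, the OLS model, the fixed 70/30 split and the toy data are all hypothetical.

```python
import numpy as np


def holdout_rmse(X, y, features, train_frac=0.7, seed=0):
    """Fit OLS on a fixed training split using only `features`; return test RMSE."""
    idx = np.random.default_rng(seed).permutation(len(y))
    cut = int(train_frac * len(y))
    train, test = idx[:cut], idx[cut:]

    def design(rows):
        # Intercept column plus the chosen feature columns
        return np.column_stack([np.ones(len(rows)), X[np.ix_(rows, features)]])

    beta, *_ = np.linalg.lstsq(design(train), y[train], rcond=None)
    resid = y[test] - design(test) @ beta
    return float(np.sqrt(np.mean(resid ** 2)))


def forward_wrapper(X, y, max_features):
    """Greedy forward selection: every candidate subset requires a new model fit."""
    selected, remaining, fits = [], list(range(X.shape[1])), 0
    while remaining and len(selected) < max_features:
        scores = {}
        for j in remaining:
            scores[j] = holdout_rmse(X, y, selected + [j])
            fits += 1
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected, fits


# Hypothetical toy data: only features 0 and 5 carry signal
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
y = 3 * X[:, 0] - 2 * X[:, 5] + rng.normal(scale=0.5, size=200)
selected, fits = forward_wrapper(X, y, max_features=2)
print(selected, fits)
```

With p features and k selection steps, this wrapper fits p + (p − 1) + … + (p − k + 1) models (39 fits here for p = 20, k = 2), and each fit grows more expensive as p and the sample size increase.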
Method and Results: This study proposes Artificial Intelligence infused wrapper based Feature Selection (AIFS), a novel feature selection method that integrates artificial intelligence with wrapper-based feature selection. The approach uses AI to create a Performance Prediction Model (PPM) that predicts the performance of any feature set, allowing wrapper-based methods to evaluate a feature subset model without actually building it. The algorithm can make wrapper-based methods more practical for high-dimensional data and is flexible enough to be applied within any wrapper-based method. We evaluated the algorithm using simulation studies and real research studies. AIFS shows feature selection and model prediction performance that is better than or on par with standard penalized feature selection algorithms such as LASSO and sparse partial least squares.
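The PPM idea can be sketched as a surrogate model: fit real models on a sample of random feature subsets, record their holdout RMSE, train a regressor that maps each subset's 0/1 feature mask to that RMSE, and then let the wrapper score candidate subsets with the regressor instead of refitting. This is a minimal sketch under stated assumptions, not the authors' AIFS implementation: ridge regression stands in for the AI model, and `train_ppm`, `ppm_predict`, `ppm_forward`, the mask encoding, the OLS base model and the toy data are all hypothetical.

```python
import numpy as np


def holdout_rmse(X, y, features, train_frac=0.7, seed=0):
    """Fit OLS on a fixed training split using only `features`; return test RMSE."""
    idx = np.random.default_rng(seed).permutation(len(y))
    cut = int(train_frac * len(y))
    train, test = idx[:cut], idx[cut:]

    def design(rows):
        return np.column_stack([np.ones(len(rows)), X[np.ix_(rows, features)]])

    beta, *_ = np.linalg.lstsq(design(train), y[train], rcond=None)
    resid = y[test] - design(test) @ beta
    return float(np.sqrt(np.mean(resid ** 2)))


def train_ppm(X, y, n_subsets=150, max_size=5, alpha=1.0, seed=0):
    """Hypothetical PPM: ridge regression from a 0/1 feature mask to observed RMSE.

    Ridge is a stand-in for the AI model; only these n_subsets subsets are fitted.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    masks = np.zeros((n_subsets, p))
    scores = np.zeros(n_subsets)
    for i in range(n_subsets):
        k = rng.integers(1, max_size + 1)  # random subset size in [1, max_size]
        feats = rng.choice(p, size=k, replace=False)
        masks[i, feats] = 1.0
        scores[i] = holdout_rmse(X, y, list(feats))
    M = np.column_stack([np.ones(n_subsets), masks])
    penalty = alpha * np.eye(p + 1)
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(M.T @ M + penalty, M.T @ scores)


def ppm_predict(w, features, p):
    """Predicted RMSE of a feature subset; no model is built."""
    mask = np.zeros(p)
    mask[list(features)] = 1.0
    return float(w[0] + w[1:] @ mask)


def ppm_forward(w, p, max_features):
    """Greedy forward selection scored entirely by the PPM."""
    selected, remaining = [], list(range(p))
    while remaining and len(selected) < max_features:
        best = min(remaining, key=lambda j: ppm_predict(w, selected + [j], p))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: only features 0 and 5 carry signal
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
y = 3 * X[:, 0] - 2 * X[:, 5] + rng.normal(scale=0.5, size=200)
w = train_ppm(X, y)
selected = ppm_forward(w, X.shape[1], max_features=2)
print(selected, ppm_predict(w, selected, X.shape[1]))
```

The surrogate is trained once on the sampled subsets; after that, the wrapper can score any number of candidate subsets with a dot product rather than a model fit. Whether this saves computation overall depends on how many subsets the search explores, and a linear surrogate cannot capture interactions between features, which is one reason a more flexible learner would likely be needed in practice.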
Conclusion: The AIFS approach provides an alternative to existing feature selection methods. The current study focuses on applying AIFS to continuous cross-sectional data; however, it could also be applied to other data types, such as longitudinal, categorical and time-to-event biological data.
Competing Interest Statement
The authors have declared no competing interest.
List of abbreviations
- AEnet: Adaptive Elastic Net
- AI: Artificial Intelligence
- AIFS: Artificial Intelligence infused wrapper based Feature Selection
- ALASSO: Adaptive LASSO
- AUC: Area Under the Curve
- CHSI: Community Health Status Indicators
- Enet: Elastic Net
- GLASSO: Group LASSO
- NSHAP: National Social Life, Health and Aging Project
- OOB: Out-of-Bag
- PPM: Performance Prediction Model
- RMSE: Root Mean Square Error
- SPLS: Sparse Partial Least Squares
- StW: Standard Wrapper
- SWAN: Study of Women’s Health Across the Nation