A new model selection criterion for partial least squares regression

 

José L. Martínez, Helton Saulo, Humberto Barrios Escobar, Jeremias Leao

Chemometrics and Intelligent Laboratory Systems 169 (2017) 64-78

https://doi.org/10.1016/j.chemolab.2017.08.006

Abstract

Choosing the number of latent factors in partial least squares (PLS) regression has long been a concern for users, academics and researchers. In this paper, we introduce a statistic that selects the appropriate number of latent factors according to the model's predictive ability. The method is based on the Predicted Residual Error Sum of Squares (PRESS) for PLS regression, and our mathematical development relies on matrix calculations obtained from the orthogonal vectors that make up the matrix of latent factors. Currently, the leave-one-out method is widely used for this purpose: one observation is left out, the regression model is estimated from the remaining observations, and the procedure is repeated as many times as there are observations. The advantage of the PRESS statistic for PLS regression (P-PLS) developed in this work is that it allows the best predictive model to be selected straightforwardly. In addition, P-PLS can be used to analyze the influence of the ith observation on the PLS vector of regression coefficients, as well as to detect other kinds of observations that affect the model.
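For reference, the conventional leave-one-out baseline described in the abstract can be sketched in Python with NumPy. This is not the authors' P-PLS statistic. It is a minimal sketch that assumes a single response and uses the NIPALS algorithm to fit PLS (our choice, not taken from the paper). It computes PRESS(a) = Σ_i (y_i − ŷ_(−i),a)², where ŷ_(−i),a is the prediction for observation i from a model with a latent factors fitted without that observation, and then picks the a with the smallest PRESS. The helper names pls1_fit, loo_press and select_components are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS with NIPALS (assumed algorithm, not the paper's).

    Returns coefficients for centered X, plus the column means of X and the
    mean of y, so that a new row x is predicted as y_mean + (x - x_mean) @ beta.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    E = X - x_mean  # X residuals, deflated after each factor
    f = y - y_mean  # y residuals, deflated after each factor
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)  # unit-length weight vector
        t = E @ w               # score vector (one latent factor)
        tt = t @ t
        p = E.T @ t / tt        # X loading
        c = f @ t / tt          # y loading
        E = E - np.outer(t, p)  # remove this factor from X
        f = f - c * t           # remove this factor from y
        W.append(w)
        P.append(p)
        q.append(c)
    W = np.column_stack(W)
    P = np.column_stack(P)
    # Express the model in terms of the original centered X: beta = W (P'W)^-1 q
    beta = W @ np.linalg.solve(P.T @ W, np.array(q))
    return beta, x_mean, y_mean


def loo_press(X, y, n_components):
    """Leave-one-out PRESS: refit n times, predicting each held-out row once."""
    n = X.shape[0]
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, x_mean, y_mean = pls1_fit(X[keep], y[keep], n_components)
        y_hat = y_mean + (X[i] - x_mean) @ beta
        press += (y[i] - y_hat) ** 2
    return press


def select_components(X, y, max_components):
    """Return the number of factors with the smallest PRESS and all PRESS values."""
    presses = [loo_press(X, y, a) for a in range(1, max_components + 1)]
    best = int(np.argmin(presses)) + 1
    return best, presses


if __name__ == "__main__":
    # Synthetic illustration only; not data from the paper.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 6))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=40)
    best, presses = select_components(X, y, 5)
    print("selected factors:", best)
    print("PRESS by factor count:", np.round(presses, 3))
```

Each candidate number of factors requires n separate fits, so the cost of this baseline grows with both the sample size and the number of candidates considered. With as many factors as columns of X, the PLS fit coincides with ordinary least squares, which gives a simple way to sanity-check the sketch.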