Predicting Age from Neuroimaging using an Automated Machine Learning approach (autoML)
The use of machine learning (ML) algorithms has increased significantly in neuroscience. However, given the vast range of available ML algorithms, which model is optimal for predicting the target variable, and what are its hyperparameters? Because these questions have many possible answers, automated ML (autoML) has been gaining attention in recent years. Here, we apply an autoML library called the Tree‐based Pipeline Optimisation Tool (TPOT), which uses a tree‐based representation of ML pipelines and a genetic programming‐based approach to find the model and hyperparameters that most closely predict a subject’s true age. To explore autoML and evaluate its efficacy on neuroimaging data sets, we chose a problem that has been studied extensively: brain age prediction. Without any prior knowledge, TPOT was able to scan the model space and create pipelines that outperformed the state‐of‐the‐art accuracy for FreeSurfer‐based models using only thickness and volume information for anatomical structures. In particular, we compared the performance of TPOT (mean absolute error [MAE]: 4.612 ± 0.124 years) with that of relevance vector regression (MAE: 5.474 ± 0.140 years). TPOT also suggested interesting combinations of models that do not match the models currently most used for brain age prediction but that generalise well to unseen data. AutoML showed promising results as a data‐driven approach to finding optimal models for neuroimaging applications.
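
As an illustration of the metric quoted above, the following minimal sketch shows how a mean absolute error in years could be computed and then summarised over repeated evaluations. The function names and the toy ages are hypothetical and are not taken from the study. Reading the "±" as a sample standard deviation across repetitions is also an assumption, since the abstract does not say what the interval represents.

```python
from statistics import mean, stdev


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference, in years, between true and predicted ages."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must have the same length")
    if not true_ages:
        raise ValueError("inputs must not be empty")
    return mean(abs(t - p) for t, p in zip(true_ages, predicted_ages))


def summarise(maes):
    """Return (mean, sample standard deviation) of per-repetition MAEs.

    Assumption: the spread is reported as a sample standard deviation.
    """
    return mean(maes), stdev(maes)


# Hypothetical toy values for illustration only, not data from the study.
true_ages = [20.0, 35.0, 50.0, 65.0]
predicted_ages = [24.0, 33.0, 55.0, 64.0]

mae = mean_absolute_error(true_ages, predicted_ages)
print(f"MAE: {mae:.3f} years")
```

Under the same assumption, a list of per-repetition MAEs can be passed to `summarise` to obtain a "mean ± spread" pair in the form used for the TPOT and relevance vector regression results.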