AI · 8 min read · April 17, 2026
Automating Feature Preprocessing Beats Manual Tuning for Tabular ML
A study of 15 search algorithms on 45 datasets finds that evolution-based and random search outperform complex surrogate models for automated feature pipeline construction.
Automating feature preprocessing selection and ordering outperforms manual construction; evolution-based and random search algorithms lead.
- Feature preprocessing order and selection critically affect classical ML model performance on tabular data.
- Manual pipeline construction requires data scientists to make many sequential decisions with unclear payoff.
- The Auto-FP problem maps onto hyperparameter optimization (HPO) or neural architecture search (NAS) frameworks.
- Evolution-based algorithms achieve the best average ranking across 45 public datasets.
- Random search performs surprisingly well, beating many sophisticated surrogate-model approaches.
- Bandit-based and surrogate-model algorithms underperform on Auto-FP despite their success in HPO and NAS.
- Bottleneck analysis identifies gaps between current algorithms and optimal preprocessing discovery.
- AutoML tools show limitations when integrated with automated preprocessing pipelines.
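The random-search baseline above can be sketched concretely: sample a subset of preprocessing operators and an ordering for them, score each candidate pipeline by cross-validation, and keep the best. The operator pool, trial budget, dataset, and scikit-learn estimators below are illustrative assumptions, not the study's actual setup.

```python
import random

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PowerTransformer, StandardScaler

# Illustrative operator pool; a real Auto-FP search space is much larger.
OPERATORS = {
    "standardize": StandardScaler,
    "minmax": MinMaxScaler,
    "power": PowerTransformer,
}

def random_pipeline(rng, max_len=3):
    """Sample both WHICH operators to use and in WHAT order."""
    k = rng.randint(1, max_len)
    names = rng.sample(list(OPERATORS), k)
    steps = [(name, OPERATORS[name]()) for name in names]
    steps.append(("clf", LogisticRegression(max_iter=1000)))
    return Pipeline(steps)

def random_search(X, y, n_trials=8, seed=0):
    """Evaluate random pipelines by cross-validation; keep the best one."""
    rng = random.Random(seed)
    best_score, best_pipe = -1.0, None
    for _ in range(n_trials):
        pipe = random_pipeline(rng)
        score = cross_val_score(pipe, X, y, cv=3).mean()
        if score > best_score:
            best_score, best_pipe = score, pipe
    return best_score, best_pipe

X, y = load_breast_cancer(return_X_y=True)
score, pipe = random_search(X, y)
print(f"best CV accuracy: {score:.3f}")
```

Note that there is no model of the search space at all: each trial is independent, which is exactly the property that lets random search sidestep the surrogate-fitting overhead discussed in the findings.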
Frequently asked
- Why do random search and evolution-based methods beat surrogate models here? The preprocessing search space appears to be irregular and high-dimensional, making it difficult for surrogate models and bandit algorithms to build accurate predictive models of performance. Random search avoids the overhead of model building and explores the space more uniformly. Evolution-based methods succeed because they adapt through mutation and selection without relying on learned surrogates.
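The mutation-and-selection loop mentioned above can be sketched with a toy population of operator sequences. The fitness function here is a hypothetical stand-in (in real Auto-FP it would be a cross-validated model score), and the operator names and "ideal" ordering are invented for illustration.

```python
import random

# Toy evolution over preprocessing-operator sequences.
OPS = ["scale", "power", "quantile", "normalize", "binarize"]
TARGET = ["scale", "power", "quantile"]  # hypothetical "ideal" ordering

def fitness(seq):
    # Stand-in objective: reward positions matching the ideal sequence.
    # In real Auto-FP this would be a cross-validated accuracy.
    return sum(a == b for a, b in zip(seq, TARGET))

def mutate(seq, rng):
    # Point mutation: replace one operator with a random alternative.
    child = list(seq)
    child[rng.randrange(len(child))] = rng.choice(OPS)
    return child

def evolve(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(OPS) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]                       # selection
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        pop = survivors + children                             # elitist replacement
    return max(pop, key=fitness)

best = evolve()
print("best sequence:", best)
```

Because selection keeps the top half of each generation, the best fitness never decreases, and no learned model of the search space is required at any point.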