Google AI Introduces TabFM: A Hybrid-Attention Tabular Foundation Model for Zero-Shot Classification and Regression

Google Research introduced TabFM, a foundation model built for tabular data. TabFM performs classification and regression without dataset-specific training. Every prediction comes from a single forward pass. The model reframes tabular prediction as an in-context learning problem. It is available now on Hugging Face and GitHub.
Tabular data forms the backbone of enterprise data infrastructure. Tasks like customer churn and financial fraud detection live in tables. For years, tree-based methods dominated this space. XGBoost, AdaBoost, and random forests offered robust results on structured data. Google frames TabFM as the tabular counterpart to TimesFM, its zero-shot time-series model.
The reliability of tree-based methods carried a cost. Fitting XGBoost to a new dataset is rarely a single .fit() call. Data scientists spend hours on hyperparameter optimization and feature engineering just to extract a reliable signal from raw data. TabFM targets exactly that bottleneck.
TabFM applies the zero-shot logic that large language models made familiar. LLMs learn new tasks from in-context examples, without updating any weights. This technique is called in-context learning (ICL). TabFM brings the same idea to tables. It generates predictions on previously unseen tables in one pass.
Traditional models fit their parameters to each dataset’s distribution. TabFM skips that step entirely. It takes the whole dataset as a single unified prompt. That prompt holds both the labeled training examples and the test rows to be predicted. The model reads column and row relationships at inference time.
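To make the "dataset as prompt" idea concrete, here is a minimal sketch of what a fit-free prediction call can look like. It is not TabFM's API: the function name predict_in_context, the toy churn columns, and the k-nearest-neighbour vote standing in for the pretrained model are all assumptions chosen for illustration. The only point is the shape of the call, where labeled rows and query rows go in together and predictions come out, with no training step in between.

```python
import math
from collections import Counter


def predict_in_context(context_rows, context_labels, query_rows, k=3):
    """Hypothetical stand-in for an in-context predictor.

    There is no fit step and nothing is stored between calls. A simple
    k-nearest-neighbour vote plays the role of the pretrained model.
    """
    predictions = []
    for query in query_rows:
        # Rank the labeled rows by Euclidean distance to the query row.
        ranked = sorted(
            zip(context_rows, context_labels),
            key=lambda pair: math.dist(pair[0], query),
        )
        # Majority vote among the k closest labeled rows.
        votes = Counter(label for _, label in ranked[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Made-up churn table: (monthly_spend, support_tickets) -> label.
context_rows = [
    (20.0, 5.0), (25.0, 4.0), (22.0, 6.0),
    (80.0, 0.0), (75.0, 1.0), (90.0, 0.0),
]
context_labels = ["churn", "churn", "churn", "stay", "stay", "stay"]
query_rows = [(23.0, 5.0), (85.0, 1.0)]

# One call carries both the labeled examples and the rows to predict.
predictions = predict_in_context(context_rows, context_labels, query_rows)
print(predictions)
```

In a real tabular foundation model the voting rule would be replaced by a forward pass through pretrained weights, but the calling pattern stays the same: a new dataset changes the input, not the model.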
Tables are not text. They are two-dimensional and inherently orderless. Swapping two rows or two columns does not change their meaning. Standard language models process one-dimensional, ordered sequences instead. To bridge that gap, TabFM synthesizes TabPFN and TabICL into a hybrid design.
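The orderless property can be checked directly. The sketch below uses a Gaussian-kernel weighted average as a deliberately simple, order-free predictor; it is an assumption made for illustration and says nothing about how TabFM's hybrid attention is built. It shows that shuffling rows, or reordering columns consistently in both the context and the query, leaves the prediction unchanged. That is the behavior a table model needs and a plain sequence model does not provide by default.

```python
import math
import random


def kernel_predict(context_rows, context_targets, query, bandwidth=1.0):
    """Order-free stand-in predictor: Gaussian-kernel weighted average."""
    weights = [
        math.exp(-math.dist(row, query) ** 2 / (2 * bandwidth ** 2))
        for row in context_rows
    ]
    weighted_sum = math.fsum(w * t for w, t in zip(weights, context_targets))
    return weighted_sum / math.fsum(weights)


def reorder_columns(row, column_order):
    """Return the row with its features rearranged into column_order."""
    return tuple(row[i] for i in column_order)


# Made-up numeric table with three features and one regression target.
rows = [(0.1, 1.2, 0.5), (0.9, 0.3, 1.1), (1.5, 1.0, 0.2), (0.4, 0.8, 1.6)]
targets = [1.0, 2.0, 3.0, 4.0]
query = (0.5, 1.0, 0.7)

baseline = kernel_predict(rows, targets, query)

# Shuffle the rows while keeping each row paired with its target.
paired = list(zip(rows, targets))
random.Random(0).shuffle(paired)
shuffled_rows, shuffled_targets = zip(*paired)
row_shuffled = kernel_predict(shuffled_rows, shuffled_targets, query)

# Reorder the columns the same way in the context and in the query.
column_order = [2, 0, 1]
col_permuted = kernel_predict(
    [reorder_columns(r, column_order) for r in rows],
    targets,
    reorder_columns(query, column_order),
)

print(math.isclose(baseline, row_shuffled), math.isclose(baseline, col_permuted))
```

The comparison uses math.isclose rather than exact equality because reordering columns changes the order of floating-point additions inside the distance computation.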
Foundation models need vast, diverse data. High-quality tabular datasets are scarce in the open-source space. Industrial tables carry proprietary schemas and sensitive information. That makes them inaccessible for broad pre-training.
Synthetic tables can be generated to be arbitrarily large. Google’s research team calls them effectively the only viable option at this scale. So TabFM trains entirely on hundreds of millions of synthetic datasets. These are generated dynamically using structural causal models (SCMs). Each incorporates a wide variety of random functions. The approach captures distributions and complex feature relationships found in real tables. The research team reports the model generalizes well to unseen real-world data.
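As a rough illustration of SCM-based generation, the sketch below samples one small synthetic table. The article only states that SCMs with a variety of random functions are used; everything else here is an assumption: the function name sample_scm_table, the random graph rule, the four candidate functions, the Gaussian noise, and the choice of the last node as the target.

```python
import math
import random


def sample_scm_table(n_rows, n_features, seed=0):
    """Sample a toy table from a randomly drawn structural causal model.

    Hypothetical sketch: nodes are ordered, each node may depend on
    earlier nodes, and the last node is used as the prediction target.
    """
    rng = random.Random(seed)
    n_nodes = n_features + 1
    candidate_functions = [
        math.tanh,
        math.sin,
        lambda v: max(0.0, v),  # ReLU
        lambda v: v,            # identity
    ]

    # Draw the causal graph: node j can only have parents with index < j,
    # which guarantees the graph is acyclic.
    parents, weights, functions = [], [], []
    for j in range(n_nodes):
        node_parents = [i for i in range(j) if rng.random() < 0.5]
        if j == n_nodes - 1 and not node_parents:
            # Make sure the target depends on at least one feature.
            node_parents = [rng.randrange(j)]
        parents.append(node_parents)
        weights.append([rng.gauss(0.0, 1.0) for _ in node_parents])
        functions.append(rng.choice(candidate_functions))

    # Sample rows by evaluating the nodes in causal order.
    features, targets = [], []
    for _ in range(n_rows):
        values = []
        for j in range(n_nodes):
            signal = sum(w * values[i] for w, i in zip(weights[j], parents[j]))
            noise = rng.gauss(0.0, 1.0)
            values.append(functions[j](signal + noise))
        features.append(values[:-1])
        targets.append(values[-1])
    return features, targets


features, targets = sample_scm_table(n_rows=5, n_features=3, seed=1)
for row, target in zip(features, targets):
    print([round(v, 3) for v in row], round(target, 3))
```

Changing the seed draws a different graph and different functions, so each call yields a table with its own dependency structure. That is the property that lets a generator of this kind produce an effectively unlimited supply of distinct training datasets.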
Source: MarkTechPost