Abstract
Despite the rise to dominance of deep learning in unstructured data domains, tree-based methods such as Random Forests (RF) and Gradient Boosted Decision Trees (GBDT) are still the workhorses for handling discriminative tasks on tabular data. We explore generative extensions of these popular algorithms with a focus on explicitly modeling the data density (up to a normalization constant), thus enabling other applications besides sampling. As our main contribution, we propose an energy-based generative boosting algorithm that is analogous to the second-order boosting implemented in popular packages like XGBoost. We show that, despite producing a generative model capable of handling inference tasks over any input variable, our proposed algorithm can achieve similar discriminative performance to GBDT on a number of real-world tabular datasets, outperforming alternative generative approaches. At the same time, we show that it is also competitive with neural-network-based models for sampling.
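To make the phrase "data density (up to a normalization constant)" concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a hand-written energy table over two discrete variables and the sign convention p(x, y) ∝ exp(-E(x, y)); the names `energy`, `unnormalized_density`, and `conditional` are hypothetical. It illustrates why an unnormalized model can still answer inference queries over any variable: a conditional only requires normalizing over the queried variable, so the global constant cancels.

```python
import numpy as np

# Hypothetical toy energy table over two discrete variables:
# rows index a feature x in {0, 1, 2}, columns index a label y in {0, 1}.
# In a boosted model the energies would come from a sum of tree outputs;
# here they are hand-written purely for illustration.
energy = np.array([
    [0.0, 2.0],
    [1.0, 1.0],
    [3.0, 0.5],
])

def unnormalized_density(e):
    """Return exp(-E): the density up to an unknown constant (assumed sign convention)."""
    return np.exp(-e)

def conditional(e, axis):
    """Normalize exp(-E) along one axis only, giving p(variable on `axis` | the other)."""
    q = unnormalized_density(e)
    return q / q.sum(axis=axis, keepdims=True)

# Discriminative use: p(y | x), each row sums to 1.
p_y_given_x = conditional(energy, axis=1)

# Inference over the feature instead: p(x | y), each column sums to 1.
p_x_given_y = conditional(energy, axis=0)

# Shifting every energy by a constant rescales the unnormalized density
# but leaves every conditional unchanged.
p_shifted = conditional(energy + 5.0, axis=1)
```

The same table serves both queries, which is the sense in which a single generative model can play the role of a classifier for any chosen target column.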
Community
A practical, energy-based generative boosting model for (unnormalized) density estimation and sampling in tabular data.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- TabEBM: A Tabular Data Augmentation Method with Distinct Class-Specific Energy-Based Models (2024)
- Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space (2024)
- DynFrs: An Efficient Framework for Machine Unlearning in Random Forest (2024)
- Mambular: A Sequential Model for Tabular Deep Learning (2024)
- Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles (2024)