arxiv:2409.13882

Tabular Data Generation using Binary Diffusion

Published on Sep 20

Submitted by vitaliykinakh on Sep 25


Authors:

Vitaliy Kinakh ,

Abstract

Generating synthetic tabular data is critical in machine learning, especially when real data is limited or sensitive. Traditional generative models often face challenges due to the unique characteristics of tabular data, such as mixed data types and varied distributions, and require complex preprocessing or large pretrained models. In this paper, we introduce a novel, lossless binary transformation method that converts any tabular data into fixed-size binary representations, and a corresponding new generative model called Binary Diffusion, specifically designed for binary data. Binary Diffusion leverages the simplicity of XOR operations for noise addition and removal and employs binary cross-entropy loss for training. Our approach eliminates the need for extensive preprocessing, complex noise parameter tuning, and pretraining on large datasets. We evaluate our model on several popular tabular benchmark datasets, demonstrating that Binary Diffusion outperforms existing state-of-the-art models on Travel, Adult Income, and Diabetes datasets while being significantly smaller in size.
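The abstract names three ingredients: a lossless, fixed-size binary encoding of table rows, XOR for adding and removing noise, and binary cross-entropy as the training loss. The sketch below illustrates each with NumPy. It is not the authors' implementation: the abstract does not specify the encoding, so using the 32-bit float pattern for numeric columns and a fixed-width index for categorical columns is an assumption, as are all function names, the `flip_prob` parameter, and the choice of independent random bit flips as the noise.

```python
import numpy as np

def encode_float_column(values):
    """Return an (n, 32) array of 0/1 with each value's float32 bit pattern.
    Assumed encoding; lossless only for values representable as float32."""
    raw = np.asarray(values, dtype=">f4")  # big-endian float32
    return np.unpackbits(raw.view(np.uint8).reshape(-1, 4), axis=1)

def decode_float_column(bits):
    """Invert encode_float_column."""
    packed = np.packbits(bits.astype(np.uint8), axis=1)  # (n, 4) bytes
    return packed.view(">f4").reshape(-1)

def encode_categorical_column(values, categories):
    """Encode each category index with a fixed number of bits (assumed scheme)."""
    width = max(1, (len(categories) - 1).bit_length())
    index = {c: i for i, c in enumerate(categories)}
    codes = np.array([index[v] for v in values], dtype=np.int64)
    shifts = np.arange(width - 1, -1, -1)
    return ((codes[:, None] >> shifts) & 1).astype(np.uint8)

def decode_categorical_column(bits, categories):
    """Invert encode_categorical_column."""
    weights = 1 << np.arange(bits.shape[1] - 1, -1, -1)
    codes = bits.astype(np.int64) @ weights
    return [categories[i] for i in codes]

def add_xor_noise(x0, flip_prob, rng):
    """Flip each bit independently with probability flip_prob (hypothetical
    parameter). Returns the noisy bits and the noise mask z."""
    z = (rng.random(x0.shape) < flip_prob).astype(np.uint8)
    return x0 ^ z, z

def remove_xor_noise(xt, z_hat):
    """XOR is its own inverse: applying the (predicted) mask again undoes it."""
    return xt ^ z_hat

def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean BCE between predicted bit probabilities and 0/1 targets."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    t = np.asarray(targets, dtype=float)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))

# Toy table with one numeric and one categorical column (made-up values).
ages = [39.0, 50.5, 23.25]
jobs = ["private", "state", "self-employed"]
cats = sorted(set(jobs))
x0 = np.concatenate(
    [encode_float_column(ages), encode_categorical_column(jobs, cats)], axis=1
)
xt, z = add_xor_noise(x0, flip_prob=0.3, rng=np.random.default_rng(0))
restored = remove_xor_noise(xt, z)  # a trained model would supply z_hat here
```

In this toy example the true mask `z` stands in for a model's prediction, so `restored` equals `x0` exactly. In training, a network would output per-bit probabilities and `binary_cross_entropy` would score them against the 0/1 targets.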


Community

vitaliykinakh

Paper author Paper submitter Sep 25

The paper introduces a method for generating synthetic tabular data with a new Binary Diffusion model. It transforms tabular data into fixed-size binary representations, uses XOR operations to add and remove noise, and trains with binary cross-entropy loss. This approach simplifies preprocessing, avoids large pretrained models, and achieves state-of-the-art results on the Travel, Adult Income, and Diabetes benchmark datasets while keeping the model smaller.

Code will be released soon

nielsr

Sep 30

Cool work, congrats!

Let us know if you need any help publishing artifacts (model, datasets) on the hub. Leaving some guides here:

models: https://huggingface.co/docs/hub/models-uploading
datasets: https://huggingface.co/docs/datasets/loading#csv

