EMNLP 2024

This repository contains the official checkpoint for DualGPT, as presented in the paper Autoregressive Pre-Training on Pixels and Texts (EMNLP 2024). For detailed instructions on how to use the model, please visit our GitHub page.

Model Description

DualGPT is an autoregressive language model pre-trained on two modalities: pixels and text. Because it also processes documents as visual data (pixels), the model learns to predict both the next text token and the next image patch in a sequence. This enables it to handle visually complex tasks in either modality.
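To make the two prediction objectives concrete, here is a minimal pure-Python sketch. It rests on assumptions that are ours and are not taken from the released code. A rendered page is treated as a grayscale pixel grid cut into non-overlapping square patches in raster order. Next-patch prediction is scored with a mean-squared-error regression loss, and next-token prediction uses standard cross-entropy. The helper names `patchify`, `next_patch_mse`, and `next_token_ce` are hypothetical and do not come from the DualGPT codebase.

```python
import math


def patchify(image, patch_size):
    """Split a 2D grayscale image (list of rows) into flattened,
    non-overlapping square patches in raster order (hypothetical helper)."""
    height, width = len(image), len(image[0])
    assert height % patch_size == 0 and width % patch_size == 0
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the patch row by row.
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


def next_patch_mse(predicted, patches):
    """Assumed regression loss: the output at position t is compared
    with the true patch at position t + 1 (targets shifted by one)."""
    targets = patches[1:]
    total, count = 0.0, 0
    for pred, target in zip(predicted, targets):
        for p, y in zip(pred, target):
            total += (p - y) ** 2
            count += 1
    return total / count


def next_token_ce(logits, token_ids):
    """Mean cross-entropy of predicting token t + 1 from the logits at t."""
    targets = token_ids[1:]
    assert len(logits) >= len(targets)
    loss = 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp over the vocabulary.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        loss += log_z - row[target]
    return loss / len(targets)
```

For example, a 4x4 image with `patch_size=2` yields four patches of four pixels each. A predictor that outputs exactly the next patch at each position gets an MSE of 0. Uniform logits over a 4-token vocabulary give a cross-entropy of log 4. How the two losses are weighted and combined during pre-training is not specified here; see the paper for details.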

Citation

@misc{chai2024autoregressivepretrainingpixelstexts,
  title = {Autoregressive Pre-Training on Pixels and Texts},
  author = {Chai, Yekun and Liu, Qingyi and Xiao, Jingwu and Wang, Shuohuan and Sun, Yu and Wu, Hua},
  year = {2024},
  eprint = {2404.10710},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  url = {https://arxiv.org/abs/2404.10710},
}