- LayoutLM: Pre-training of Text and Layout for Document Image Understanding
  Paper • 1912.13318 • Published • 2
- LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
  Paper • 2012.14740 • Published • 1
- LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
  Paper • 2204.08387 • Published • 2
Collections including paper arxiv:2204.08387
- CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents
  Paper • 2004.12629 • Published • 2
- LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
  Paper • 2204.08387 • Published • 2
- Text Role Classification in Scientific Charts Using Multimodal Transformers
  Paper • 2402.14579 • Published • 1
- An inclusive review on deep learning techniques and their scope in handwriting recognition
  Paper • 2404.08011 • Published • 1

- FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
  Paper • 2305.02549 • Published • 6
- FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction
  Paper • 2203.08411 • Published • 1
- More efficient manual review of automatically transcribed tabular data
  Paper • 2306.16126 • Published • 1
- CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents
  Paper • 2004.12629 • Published • 2

- Noise-Aware Training of Layout-Aware Language Models
  Paper • 2404.00488 • Published • 7
- FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction
  Paper • 2203.08411 • Published • 1
- FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
  Paper • 2305.02549 • Published • 6
- ETC: Encoding Long and Structured Inputs in Transformers
  Paper • 2004.08483 • Published • 1

- Noise-Aware Training of Layout-Aware Language Models
  Paper • 2404.00488 • Published • 7
- LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
  Paper • 2204.08387 • Published • 2
- LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
  Paper • 2012.14740 • Published • 1
- LayoutLM: Pre-training of Text and Layout for Document Image Understanding
  Paper • 1912.13318 • Published • 2

- Can large language models explore in-context?
  Paper • 2403.15371 • Published • 32
- GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
  Paper • 2403.19655 • Published • 18
- WavLLM: Towards Robust and Adaptive Speech Large Language Model
  Paper • 2404.00656 • Published • 10
- Enabling Memory Safety of C Programs using LLMs
  Paper • 2404.01096 • Published • 1