|
--- |
|
language: |
|
- fr |
|
tags: |
|
- token-classification |
|
- fill-mask |
|
license: mit |
|
datasets: |
|
- iit-cdip |
|
--- |
|
|
|
|
|
This model combines the camembert-base model with the pretrained LiLT checkpoint from the paper "LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding".
|
|
|
Original repository: https://github.com/jpWang/LiLT |
|
|
|
To use it, you must fork the modeling and configuration files from the original repository, then load the pretrained model with the corresponding classes (LiLTRobertaLikeConfig, LiLTRobertaLikeForRelationExtraction, LiLTRobertaLikeForTokenClassification, LiLTRobertaLikeModel).
|
These classes can also be registered with the AutoConfig/AutoModel factories, as follows:
|
|
|
```python |
|
from transformers import AutoConfig, AutoModel, AutoModelForTokenClassification
|
|
|
from path_to_custom_classes import ( |
|
LiLTRobertaLikeConfig, |
|
LiLTRobertaLikeForRelationExtraction, |
|
LiLTRobertaLikeForTokenClassification, |
|
LiLTRobertaLikeModel |
|
) |
|
|
|
|
|
def patch_transformers(): |
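    # Register the custom "liltrobertalike" model type so the Auto* factories
    # can resolve this checkpoint without modifying the transformers library.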
|
AutoConfig.register("liltrobertalike", LiLTRobertaLikeConfig) |
|
AutoModel.register(LiLTRobertaLikeConfig, LiLTRobertaLikeModel) |
|
AutoModelForTokenClassification.register(LiLTRobertaLikeConfig, LiLTRobertaLikeForTokenClassification) |
|
# etc... |
|
``` |
|
|
|
To load the model, it is then possible to use: |
|
```python |
|
from transformers import AutoModel, AutoModelForTokenClassification, AutoTokenizer

# patch_transformers() must have been executed beforehand

tokenizer = AutoTokenizer.from_pretrained("camembert-base")  # tokenizer of the CamemBERT text backbone
|
model = AutoModel.from_pretrained("manu/lilt-camembert-base") |
|
model = AutoModelForTokenClassification.from_pretrained("manu/lilt-camembert-base") # to be fine-tuned on a token classification task |
|
``` |
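
At inference time, the model expects token ids plus one bounding box per token. The sketch below is a minimal example, not part of the original card: it assumes the LayoutLM-style convention used by the original LiLT code (boxes as (x0, y0, x1, y1) normalized to a 0-1000 page grid, passed via a `bbox` argument), and the words and coordinates are made-up stand-ins for real OCR output.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# patch_transformers() must have been executed beforehand
tokenizer = AutoTokenizer.from_pretrained("camembert-base")  # tokenizer of the CamemBERT text backbone
model = AutoModel.from_pretrained("manu/lilt-camembert-base")

# Hypothetical OCR output: words and their (x0, y0, x1, y1) boxes on a 0-1000 grid.
words = ["Facture", "n°", "12345"]
boxes = [[48, 84, 168, 100], [176, 84, 198, 100], [206, 84, 270, 100]]

encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# Expand the word-level boxes to token level; special tokens get an empty box.
token_boxes = [boxes[i] if i is not None else [0, 0, 0, 0] for i in encoding.word_ids(0)]
encoding["bbox"] = torch.tensor([token_boxes])

with torch.no_grad():
    outputs = model(**encoding)

print(outputs.last_hidden_state.shape)  # (batch_size, seq_len, hidden_size)
```

For token classification, swap `AutoModel` for `AutoModelForTokenClassification` and read `outputs.logits` instead of the hidden states.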