manu committed on
Commit
2c61db3
1 Parent(s): b8cf323

Create README.md

Files changed (1)
  1. README.md +45 -0
README.md ADDED
---
language:
- fr
tags:
- token-classification
- fill-mask
license: mit
datasets:
- iit-cdip
---

This model combines the camembert-base model with the pretrained LiLT checkpoint from the paper "LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding".

Original repository: https://github.com/jpWang/LiLT

To use it, fork the modeling and configuration files from the original repository, and load the pretrained model through the corresponding classes (LiLTRobertaLikeConfig, LiLTRobertaLikeForRelationExtraction, LiLTRobertaLikeForTokenClassification, LiLTRobertaLikeModel).
They can also be registered with the AutoConfig/AutoModel factories, as follows:

```python
from transformers import AutoConfig, AutoModel, AutoModelForTokenClassification

# custom classes forked from https://github.com/jpWang/LiLT
from path_to_custom_classes import (
    LiLTRobertaLikeConfig,
    LiLTRobertaLikeForRelationExtraction,
    LiLTRobertaLikeForTokenClassification,
    LiLTRobertaLikeModel,
)


def patch_transformers():
    AutoConfig.register("liltrobertalike", LiLTRobertaLikeConfig)
    AutoModel.register(LiLTRobertaLikeConfig, LiLTRobertaLikeModel)
    AutoModelForTokenClassification.register(LiLTRobertaLikeConfig, LiLTRobertaLikeForTokenClassification)
    # etc...
```

To load the model, it is then possible to use:

```python
from transformers import AutoModel, AutoModelForTokenClassification, AutoTokenizer

# patch_transformers() must have been executed beforehand

tokenizer = AutoTokenizer.from_pretrained("manu/lilt-camembert-base")
model = AutoModel.from_pretrained("manu/lilt-camembert-base")
model = AutoModelForTokenClassification.from_pretrained("manu/lilt-camembert-base")  # to be fine-tuned on a token classification task
```