Commit aa5dfb9 (1 parent: 72feb57), committed by yschneider and starride-teklia

Add the POPP PyLaia model (#1)

- Add the POPP PyLaia model (5690fb1e3215283bafb8b31b62a6f2f899134e17)


Co-authored-by: Solène Tarride <starride-teklia@users.noreply.huggingface.co>

Files changed (7)
  1. README.md +35 -0
  2. language_model.arpa.gz +3 -0
  3. lexicon.txt +87 -0
  4. model +0 -0
  5. syms.txt +87 -0
  6. tokens.txt +87 -0
  7. weights.ckpt +3 -0
README.md CHANGED
@@ -1,3 +1,38 @@
  ---
+ library_name: PyLaia
  license: mit
+ tags:
+ - PyLaia
+ - PyTorch
+ - Handwritten text recognition
+ metrics:
+ - CER
+ - WER
+ language:
+ - 'fr'
  ---
+
+ # POPP handwritten text recognition
+
+ This model performs Handwritten Text Recognition on French census documents.
+
+ ## Model description
+
+ The model was trained using the PyLaia library on the [POPP generic](https://github.com/Shulk97/POPP-datasets/) dataset.
+
+ For training, text lines were resized to a fixed height of 128 pixels, keeping the original aspect ratio.
+
+ An external 6-gram character language model can be used to improve recognition. The language model was trained on the text from the POPP training set.
+
+ ## Evaluation results
+
+ The model achieves the following results:
+
+ | set  | Language model | CER (%) | WER (%) | N lines |
+ |:-----|:---------------|:-------:|:-------:|--------:|
+ | test | no             |  16.49  |  36.26  |     479 |
+ | test | yes            |  16.09  |  34.52  |     479 |
+
+ ## How to use
+
+ Please refer to the [documentation](https://atr.pages.teklia.com/pylaia/).
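The README reports CER and WER without saying how they were computed. As a rough illustration only, the sketch below uses the common definition (total Levenshtein distance divided by total reference length, over characters for CER and whitespace-separated words for WER). The function names and the example pair are hypothetical, and the authors' exact normalisation may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(pairs, tokenize):
    """Total edit distance over total reference length, as a percentage."""
    errors = 0
    total = 0
    for ref, hyp in pairs:
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return 100.0 * errors / total


def cer(pairs):
    """Character error rate: tokens are single characters."""
    return error_rate(pairs, list)


def wer(pairs):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(pairs, str.split)


# Made-up (reference, hypothesis) pair, not taken from the POPP test set.
pairs = [("Dupont Marie", "Dupond Marie")]
print(f"CER: {cer(pairs):.2f}%  WER: {wer(pairs):.2f}%")
```

Summing errors over all lines before dividing weights long lines more heavily than averaging per-line rates would; which of the two the reported figures use is not stated in the README.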
language_model.arpa.gz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e7d2dc6ec84ff4302315e61554fe7dfc3a44dc304bb1d83c560d6da3f4582bf6
+ size 4391627
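The three lines above are a Git LFS pointer, not the archive itself: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of how a downloaded file could be checked against such a pointer follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and only the Python standard library is assumed.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """True if the bytes have the size and hash recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e7d2dc6ec84ff4302315e61554fe7dfc3a44dc304bb1d83c560d6da3f4582bf6
size 4391627
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

For a large file such as a checkpoint, hashing in chunks from disk would be preferable to loading everything into memory; the in-memory version is kept here for brevity.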
lexicon.txt ADDED
@@ -0,0 +1,87 @@
+ <ctc> <ctc>
+ ! !
+ " "
+ & &
+ ' '
+ ( (
+ ) )
+ - -
+ 0 0
+ 1 1
+ 2 2
+ 3 3
+ 4 4
+ 5 5
+ 6 6
+ 7 7
+ 8 8
+ 9 9
+ : :
+ ? ?
+ A A
+ B B
+ C C
+ D D
+ E E
+ F F
+ G G
+ H H
+ I I
+ J J
+ K K
+ L L
+ M M
+ N N
+ O O
+ P P
+ Q Q
+ R R
+ S S
+ T T
+ U U
+ V V
+ W W
+ X X
+ Y Y
+ Z Z
+ a a
+ b b
+ c c
+ d d
+ e e
+ f f
+ g g
+ h h
+ i i
+ j j
+ k k
+ l l
+ m m
+ n n
+ o o
+ p p
+ q q
+ r r
+ s s
+ t t
+ u u
+ v v
+ w w
+ x x
+ y y
+ z z
+ ° °
+ à à
+ â â
+ ç ç
+ è è
+ é é
+ ê ê
+ ë ë
+ î î
+ ï ï
+ ô ô
+ ù ù
+ ü ü
+ <unk> <unk>
+ <space> <space>
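Every entry in lexicon.txt maps a token to itself, which is what one would expect for a character-level decoder where each "word" is spelled by a single symbol. A minimal sketch of deriving such a lexicon from a token list is shown below; `build_lexicon` is a hypothetical helper, and how the authors actually generated the file is not stated in the commit.

```python
def build_lexicon(tokens):
    """Return 'token token' lines: each token is spelled by itself."""
    return [f"{token} {token}" for token in tokens]


def is_identity_lexicon(lines):
    """Check that every non-empty line has the form 'x x'."""
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if len(parts) != 2 or parts[0] != parts[1]:
            return False
    return True


# A small subset of the tokens in this commit, for illustration.
tokens = ["<ctc>", "!", "A", "a", "é", "<unk>", "<space>"]
lexicon = build_lexicon(tokens)
print("\n".join(lexicon))
```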
model ADDED
Binary file (1.52 kB)
 
syms.txt ADDED
@@ -0,0 +1,87 @@
+ <ctc> 0
+ ! 1
+ " 2
+ & 3
+ ' 4
+ ( 5
+ ) 6
+ - 7
+ 0 8
+ 1 9
+ 2 10
+ 3 11
+ 4 12
+ 5 13
+ 6 14
+ 7 15
+ 8 16
+ 9 17
+ : 18
+ ? 19
+ A 20
+ B 21
+ C 22
+ D 23
+ E 24
+ F 25
+ G 26
+ H 27
+ I 28
+ J 29
+ K 30
+ L 31
+ M 32
+ N 33
+ O 34
+ P 35
+ Q 36
+ R 37
+ S 38
+ T 39
+ U 40
+ V 41
+ W 42
+ X 43
+ Y 44
+ Z 45
+ a 46
+ b 47
+ c 48
+ d 49
+ e 50
+ f 51
+ g 52
+ h 53
+ i 54
+ j 55
+ k 56
+ l 57
+ m 58
+ n 59
+ o 60
+ p 61
+ q 62
+ r 63
+ s 64
+ t 65
+ u 66
+ v 67
+ w 68
+ x 69
+ y 70
+ z 71
+ ° 72
+ à 73
+ â 74
+ ç 75
+ è 76
+ é 77
+ ê 78
+ ë 79
+ î 80
+ ï 81
+ ô 82
+ ù 83
+ ü 84
+ <unk> 85
+ <space> 86
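syms.txt pairs each symbol with an integer index, with `<ctc>` at 0 and `<space>` at 86. Assuming `<ctc>` is the CTC blank (its name suggests so, but the commit does not say), the sketch below shows how a per-frame index sequence could be turned into text by greedy CTC decoding: collapse repeated indices, then drop blanks. PyLaia performs decoding itself; `load_symbols` and `greedy_ctc_decode` are hypothetical helpers meant only to illustrate the convention.

```python
from itertools import groupby


def load_symbols(lines):
    """Build an index -> symbol table from 'symbol index' lines."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        symbol, index = line.rsplit(" ", 1)
        table[int(index)] = symbol
    return table


def greedy_ctc_decode(indices, table, blank="<ctc>", space="<space>"):
    """Collapse consecutive repeats, remove blanks, map <space> to ' '."""
    collapsed = [index for index, _ in groupby(indices)]
    symbols = [table[index] for index in collapsed]
    return "".join(" " if s == space else s for s in symbols if s != blank)


# A few entries copied from syms.txt above.
SYMS = ["<ctc> 0", "M 32", "a 46", "e 50", "i 54", "r 63", "<space> 86"]
table = load_symbols(SYMS)

# Made-up frame-level argmax output, for illustration only.
frames = [32, 32, 0, 46, 63, 63, 54, 0, 50, 50]
print(greedy_ctc_decode(frames, table))
```

The blank between two identical indices is what allows a doubled letter to survive the collapse step, e.g. `[46, 0, 46]` decodes to `aa` while `[46, 46]` decodes to `a`.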
tokens.txt ADDED
@@ -0,0 +1,87 @@
+ <ctc>
+ !
+ "
+ &
+ '
+ (
+ )
+ -
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+ :
+ ?
+ A
+ B
+ C
+ D
+ E
+ F
+ G
+ H
+ I
+ J
+ K
+ L
+ M
+ N
+ O
+ P
+ Q
+ R
+ S
+ T
+ U
+ V
+ W
+ X
+ Y
+ Z
+ a
+ b
+ c
+ d
+ e
+ f
+ g
+ h
+ i
+ j
+ k
+ l
+ m
+ n
+ o
+ p
+ q
+ r
+ s
+ t
+ u
+ v
+ w
+ x
+ y
+ z
+ °
+ à
+ â
+ ç
+ è
+ é
+ ê
+ ë
+ î
+ ï
+ ô
+ ù
+ ü
+ <unk>
+ <space>
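The README describes the language model as a 6-gram character model, and tokens.txt lists the symbols it operates on. One plausible way to prepare text for such a model is to split each line into characters, write spaces as `<space>`, and replace anything outside the vocabulary with `<unk>`. The sketch below assumes that convention; the authors' actual corpus preparation is not described in the commit, and `to_char_tokens` is a hypothetical helper.

```python
# Printable characters from tokens.txt (special tokens handled separately).
VOCAB = set(
    "!\"&'()-0123456789:?"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "°àâçèéêëîïôùü"
)


def to_char_tokens(text, vocab=VOCAB, space="<space>", unk="<unk>"):
    """Turn a text line into a space-separated sequence of character tokens."""
    tokens = []
    for char in text:
        if char == " ":
            tokens.append(space)
        elif char in vocab:
            tokens.append(char)
        else:
            tokens.append(unk)  # character not covered by tokens.txt
    return " ".join(tokens)


# Made-up example line, not taken from the POPP training set.
print(to_char_tokens("né à Lyon"))
```

With `<ctc>`, `<unk>` and `<space>` added to the 84 printable characters, the vocabulary has 87 entries, matching the 87 lines of tokens.txt.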
weights.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:535712a733c8d5e7406d90e39a8c82c065a127dffa6762690d4960cf9d206d7a
+ size 42696796
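The README added in this commit states that training lines were resized to a fixed height of 128 pixels while keeping the aspect ratio, so inputs at inference time presumably need the same treatment. The sketch below only computes the target dimensions; `target_size` is a hypothetical helper, and the rounding rule is an assumption since the README does not specify it.

```python
def target_size(width, height, target_height=128):
    """Return (new_width, new_height) preserving the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Scale the width by the same factor applied to the height.
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


# Made-up line-image dimensions, for illustration only.
print(target_size(1500, 96))
print(target_size(640, 256))
```

The resulting tuple can be passed to an image library's resize call, for example Pillow's `Image.resize`; the interpolation method used during training is not given, so that choice is left open here.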