Commit 51d30ac by zolekode
1 Parent(s): b5ba718
updated read me
- .gitattributes +6 -0
- README.md +3 -3
.gitattributes
CHANGED
@@ -14,3 +14,9 @@
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
+t5-small-wav2vec2-grammar-fixer/spiece.model filter=lfs diff=lfs merge=lfs -text
+t5-small-wav2vec2-grammar-fixer/tf_model.h5 filter=lfs diff=lfs merge=lfs -text
+t5-small-wav2vec2-grammar-fixer/tokenizer_config.json filter=lfs diff=lfs merge=lfs -text
+t5-small-wav2vec2-grammar-fixer/config.json filter=lfs diff=lfs merge=lfs -text
+t5-small-wav2vec2-grammar-fixer/pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
+t5-small-wav2vec2-grammar-fixer/special_tokens_map.json filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -1,7 +1,7 @@
 # flexudy-pipe-question-generation-v2
 After transcribing your audio with Wav2Vec2, you might be interested in a post processor.
 
-
+All paragraphs had at most 128 tokens (separated by white spaces)
 
 ```python
 from transformers import T5Tokenizer, T5ForConditionalGeneration
@@ -38,7 +38,7 @@ BEFORE HE HAD TIME TO ANSWER A MUCH ENCUMBERED VERA BURST INTO THE ROOM WITH THE
 ```
 OUTPUT 1:
 ```
-Before he had time to answer a much
+Before he had time to answer a much encumbered vara burst into the room with the question, I say, can I leave these here. In 2002, these were a small black pig and a lusty specimen of black red game cock.
 ```
 
 INPUT 2:
@@ -48,7 +48,7 @@ GOING ALONG SLUSHY COUNTRY ROADS AND SPEAKING TO DAMP AUDIENCES IN DRAUGHTY SCHO
 
 OUTPUT 2:
 ```
-Going along Slushy Country Roads and speaking to damp audiences in
+Going along Slushy Country Roads and speaking to damp audiences in Draughty School Rooms Day After day for a weekend, he'll have to put in an appearance at some place of worship on Sunday morning and he can come to us immediately afterwards.
 ```
 I strongly recommend improving the performance via further fine-tuning or by training more examples.
 - Possible Quick Rule based improvements: Align the transcribed version and the generated version. If the similarity of two words (case-insensitive) vary by more than some threshold based on some similarity metric (e.g. Levenshtein), then keep the transcribed word.
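The rule-based improvement suggested at the end of the README could be sketched roughly as follows. This is only one possible reading of that suggestion, not the author's implementation: the function names (`levenshtein`, `similarity`, `merge_transcript`), the use of `difflib.SequenceMatcher` for the word alignment, the normalisation of Levenshtein distance to a 0..1 score, the `0.7` threshold, and the choice to drop generated words that have no counterpart in the transcript are all assumptions made for illustration.

```python
import string
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def _key(word: str) -> str:
    """Comparison key: case-insensitive, surrounding punctuation ignored."""
    return word.strip(string.punctuation).lower()


def similarity(a: str, b: str) -> float:
    """Assumed metric: 1 - distance / length of the longer word."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def merge_transcript(transcribed: str, generated: str,
                     threshold: float = 0.7) -> str:
    """Align both versions word by word; where they disagree too much,
    keep the transcribed word (threshold value is an assumption)."""
    src, gen = transcribed.split(), generated.split()
    matcher = SequenceMatcher(a=[_key(w) for w in src],
                              b=[_key(w) for w in gen],
                              autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same words: take the generated form (casing, punctuation).
            out.extend(gen[j1:j2])
        elif tag == "replace" and (i2 - i1) == (j2 - j1):
            # One-to-one substitutions: compare each pair of words.
            for s, g in zip(src[i1:i2], gen[j1:j2]):
                close = similarity(_key(s), _key(g)) >= threshold
                out.append(g if close else s)
        else:
            # Insertions, deletions, uneven replacements: trust the transcript.
            out.extend(src[i1:i2])
    return " ".join(out)


if __name__ == "__main__":
    print(merge_transcript("THESE WERE A SMALL BLACK PIG",
                           "In 2002, these were a small black pig."))
```

Words restored from the transcript keep their original upper-case form in this sketch; lower-casing them, or choosing a different threshold, is a matter of tuning against real transcripts.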