lijialudew committed 354afc5 (parent: 738d959): Update README.md

README.md (changed):

We build a CTC-based phoneme recognition model using wav2vec 2.0 (W2V2) for children under 4 years old. We use three-level fine-tuning to gradually reduce the age mismatch between adult and child phonetics. A minimal sketch of one such fine-tuning level is shown after the list below.

- **W2V2-Libri100h**: We first fine-tune W2V2-Base, pretrained on the unlabeled 960-hour LibriSpeech adult speech corpus, on 100 hours of LibriSpeech with IPA phone sequences.
- **W2V2-MyST**: We then fine-tune W2V2-Libri100h using the [My Science Tutor](https://catalog.ldc.upenn.edu/LDC2021S05) corpus, which consists of conversational speech between third- through fifth-grade students and a virtual tutor.
- **W2V2-Libri100h-Pro (two-level fine-tuning)**: We fine-tune W2V2-Libri100h on phoneme sequences from the [Providence](https://phonbank.talkbank.org/access/Eng-NA/Providence.html) corpus, which consists of longitudinal audio of six English-speaking children aged 1-4 years interacting with their mothers at home.
- **W2V2-MyST-Pro (three-level fine-tuning)**: Similar to W2V2-Libri100h-Pro, we fine-tune W2V2-MyST on phoneme sequences from Providence.
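
Below is a minimal sketch of what one fine-tuning level can look like: CTC training on IPA phone targets. It is written against the Hugging Face `transformers` API purely for illustration; the released recipe is built on SpeechBrain, and the phone vocabulary, training pair, and starting checkpoint in the sketch are placeholders rather than the actual training setup.

<pre><code>
# Minimal sketch (not the released SpeechBrain recipe): one CTC fine-tuning level
# on IPA phone targets, using the Hugging Face `transformers` API.
# The phone vocabulary, training pair, and starting checkpoint are placeholders.
import json
import tempfile

import torch
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

# Toy IPA phone vocabulary; a real run would enumerate every phone in the corpus.
vocab = {"[PAD]": 0, "[UNK]": 1, "|": 2, "h": 3, "ə": 4, "l": 5, "o": 6, "ʊ": 7}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(vocab, f)
    vocab_path = f.name

tokenizer = Wav2Vec2CTCTokenizer(
    vocab_path, unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0, do_normalize=True
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Each level starts from the previous level's checkpoint
# (W2V2-Base -> W2V2-Libri100h -> W2V2-MyST -> Providence); W2V2-Base is used here.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-base",
    vocab_size=len(vocab),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)
model.freeze_feature_encoder()  # keep the convolutional front end frozen on small corpora

# Placeholder training pair: 1 s of noise paired with a space-separated IPA phone string.
train_pairs = [(torch.randn(16000).numpy(), "h ə l o ʊ")]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for waveform, phones in train_pairs:
    inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
    labels = processor(text=phones, return_tensors="pt").input_ids
    loss = model(input_values=inputs.input_values, labels=labels).loss  # CTC loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
</code></pre>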

## Model Sources
For more information regarding this model, please check out our papers:

- **[Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis](https://arxiv.org/abs/2309.07287)**
- **[Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations](https://arxiv.org/abs/2402.06888)**

## Model Description

## Uses
**We develop our complete fine-tuning recipe using the SpeechBrain toolkit, available at**
TO DO
<!--
- **https://github.com/jialuli3/speechbrain/tree/infant-voc-classification/recipes/RABC** (used for Rapid-ABC corpus)
- **https://github.com/jialuli3/speechbrain/tree/infant-voc-classification/recipes/Babblecor** (used for BabbleCor corpus)
-->
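
As a rough usage illustration, the sketch below performs greedy CTC decoding of a phone sequence with a `transformers`-style checkpoint. The model path, audio file, and loading API are assumptions: the checkpoint released here was trained with the SpeechBrain recipe above, so loading it may require that recipe instead.

<pre><code>
# Hedged inference sketch: greedy CTC decoding of IPA phones with a fine-tuned W2V2 model.
# The model path and audio file are placeholders; the released checkpoint was trained with
# SpeechBrain and may need to be loaded through the SpeechBrain recipe instead.
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_path = "path/to/w2v2-myst-pro"  # placeholder, not an actual repository ID
model = Wav2Vec2ForCTC.from_pretrained(model_path)
processor = Wav2Vec2Processor.from_pretrained(model_path)
model.eval()

waveform, sr = torchaudio.load("child_utterance.wav")  # placeholder recording
waveform = torchaudio.functional.resample(waveform, sr, 16000).mean(dim=0)  # mono, 16 kHz

inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits  # (batch, frames, n_phones)

pred_ids = torch.argmax(logits, dim=-1)       # greedy (best-path) CTC decoding
phones = processor.batch_decode(pred_ids)[0]  # collapses repeats and removes CTC blanks
print(phones)
</code></pre>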

# Paper/BibTeX Citation

<!-- If there is a paper or blog post introducing the model, the APA and BibTeX information for that should go in this section. -->
If you find this model helpful, please cite us as
<pre><code>
@inproceedings{li2023enhancing,
  title={Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis},
  author={Li, Jialu and Hasegawa-Johnson, Mark and Karahalios, Karrie},
  booktitle={Interspeech},
  year={2024}
}
</code></pre>
or
<pre><code>
@inproceedings{li2024analysis,
  title={Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations},
  author={Li, Jialu and Hasegawa-Johnson, Mark and McElwain, Nancy L},
  booktitle={IEEE Workshop on Self-Supervision in Audio, Speech and Beyond (SASB)},
  year={2024}
}
</code></pre>

# Model Card Contact
Jialu Li, Ph.D. (she, her, hers)

E-mail: jialuli3@illinois.edu