spitfire4794 committed on
Commit
f6398dd
1 Parent(s): 7a552b6

Upload folder using huggingface_hub

Files changed (8)
  1. .gitattributes +0 -1
  2. README.md +105 -0
  3. coarse.pt +3 -0
  4. coarse_2.pt +3 -0
  5. fine.pt +3 -0
  6. fine_2.pt +3 -0
  7. text.pt +3 -0
  8. text_2.pt +3 -0
.gitattributes CHANGED
@@ -25,7 +25,6 @@
  *.safetensors filter=lfs diff=lfs merge=lfs -text
  saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
  *.tflite filter=lfs diff=lfs merge=lfs -text
  *.tgz filter=lfs diff=lfs merge=lfs -text
  *.wasm filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,105 @@
+ ---
+ language:
+ - en
+ - de
+ - es
+ - fr
+ - hi
+ - it
+ - ja
+ - ko
+ - pl
+ - pt
+ - ru
+ - tr
+ - zh
+ thumbnail: https://user-images.githubusercontent.com/5068315/230698495-cbb1ced9-c911-4c9a-941d-a1a4a1286ac6.png
+ library: "bark"
+ license: "cc-by-nc-4.0"
+ tags:
+ - bark
+ - audio
+ - text-to-speech
+ ---
+
+ # Bark
+
+ Bark is a transformer-based text-to-audio model created by [Suno](https://www.suno.ai).
+ Bark can generate highly realistic, multilingual speech as well as other audio, including music,
+ background noise and simple sound effects. The model can also produce nonverbal
+ communications like laughing, sighing and crying. To support the research community,
+ we are providing access to pretrained model checkpoints ready for inference.
+
+ The original GitHub repo and model card can be found [here](https://github.com/suno-ai/bark).
+
+ This model is meant for research purposes only.
+ The model output is not censored and the authors do not endorse the opinions in the generated content.
+ Use at your own risk.
+
+ The following is additional information about the models released here.
+
+ ## Model Usage
+
+ ```python
+ from bark import SAMPLE_RATE, generate_audio, preload_models
+ from IPython.display import Audio
+
+ # download and load all models
+ preload_models()
+
+ # generate audio from text
+ text_prompt = """
+ Hello, my name is Suno. And, uh — and I like pizza. [laughs]
+ But I also have other interests such as playing tic tac toe.
+ """
+ audio_array = generate_audio(text_prompt)
+
+ # play the generated audio in a notebook
+ Audio(audio_array, rate=SAMPLE_RATE)
+ ```
+
+ [pizza.webm](https://user-images.githubusercontent.com/5068315/230490503-417e688d-5115-4eee-9550-b46a2b465ee3.webm)
+
+
+ To save `audio_array` as a WAV file:
+
+ ```python
+ from scipy.io.wavfile import write as write_wav
+
+ write_wav("/path/to/audio.wav", SAMPLE_RATE, audio_array)
+ ```
+
+ ## Model Details
+
+ Bark is a series of three transformer models that turn text into audio.
+
+ ### Text to semantic tokens
+ - Input: text, tokenized with [BERT tokenizer from Hugging Face](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer)
+ - Output: semantic tokens that encode the audio to be generated
+
+ ### Semantic to coarse tokens
+ - Input: semantic tokens
+ - Output: tokens from the first two codebooks of the [EnCodec codec](https://github.com/facebookresearch/encodec) from Facebook
+
+ ### Coarse to fine tokens
+ - Input: the first two codebooks from EnCodec
+ - Output: 8 codebooks from EnCodec
+
+ ### Architecture
+ | Model | Parameters | Attention | Output Vocab size |
+ |:-------------------------:|:----------:|------------|:-----------------:|
+ | Text to semantic tokens | 80/300 M | Causal | 10,000 |
+ | Semantic to coarse tokens | 80/300 M | Causal | 2x 1,024 |
+ | Coarse to fine tokens | 80/300 M | Non-causal | 6x 1,024 |
+
+
+ ### Release date
+ April 2023
+
+ ## Broader Implications
+ We anticipate that this model's text-to-audio capabilities can be used to improve accessibility tools in a variety of languages.
+
+ While we hope that this release will enable users to express their creativity and build applications that are a force
+ for good, we acknowledge that any text-to-audio model has the potential for dual use. While it is not straightforward
+ to voice clone known people with Bark, it can still be used for nefarious purposes. To further reduce the chances of unintended use of Bark,
+ we also release a simple classifier to detect Bark-generated audio with high accuracy (see notebooks section of the main repository).
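
The Architecture table in the README above lists two codebooks for the semantic-to-coarse model and six for the coarse-to-fine model, which together account for the eight EnCodec codebooks named under "Coarse to fine tokens". The sketch below only spells out that bookkeeping. The constant and function names are illustrative assumptions, not identifiers from the Bark codebase, and the frame count in the usage line is an arbitrary example value.

```python
# Values read from the README's Architecture table; the names are
# hypothetical and do not come from the Bark source code.
CODEBOOK_SIZE = 1024      # entries per EnCodec codebook
N_COARSE_CODEBOOKS = 2    # predicted by the semantic-to-coarse model
N_FINE_CODEBOOKS = 6      # added by the coarse-to-fine model


def total_codebooks() -> int:
    """Number of codebooks available after the coarse-to-fine stage."""
    return N_COARSE_CODEBOOKS + N_FINE_CODEBOOKS


def tokens_per_clip(n_frames: int) -> dict:
    """Count the codec tokens each stage contributes for n_frames codec frames."""
    return {
        "coarse": N_COARSE_CODEBOOKS * n_frames,
        "fine": N_FINE_CODEBOOKS * n_frames,
        "total": total_codebooks() * n_frames,
    }


# Example with an arbitrary frame count.
print(total_codebooks())
print(tokens_per_clip(100))
```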
coarse.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:110580140ce5319b5b26849e24378d7594eb75ad11e7203e3091a876a07e4536
+ size 1251939909
coarse_2.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:286abc253d4d7f4d148325df07585f7ca4fca36ce40577a1ddd744a8b35e4388
+ size 3934534533
fine.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7ec1eb35cd3e21506b0c045ded225271d9a25d9fa608662585cfd749590a0eac
+ size 1107111557
fine_2.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:799c87afab4b01537094c63ea231f2c42c9c07aeb16773690540ad251a6d8fab
+ size 3741740229
text.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ecd798cf39a5ecbec30ef41a3d9d63fb61ea09b78d3bd5dfebb2f7343087b1be
+ size 2315982725
text_2.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ccdedd35373bc3a16845f1f1452c5c96926f5cbccab01e824f7f15add2c16a35
+ size 5353258741
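
Each `.pt` entry in this commit is a Git LFS pointer that records a SHA-256 digest and a byte size rather than the checkpoint itself. A downloaded file can be checked against its pointer roughly as follows. This is a minimal sketch: `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names introduced here, and the local path in the usage comment is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and SHA-256 digest in `pointer`."""
    if pointer["hash_algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['hash_algo']}")
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    # Read in chunks so multi-gigabyte checkpoints are never held in memory at once.
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents for coarse.pt, copied from the diff above.
COARSE_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:110580140ce5319b5b26849e24378d7594eb75ad11e7203e3091a876a07e4536
size 1251939909
"""

pointer = parse_lfs_pointer(COARSE_POINTER)
print(pointer["hash_algo"], pointer["size"])

# Usage, assuming coarse.pt has been downloaded into the working directory:
# matches_pointer(Path("coarse.pt"), pointer)
```

Comparing the size first is a cheap way to reject a truncated download before hashing several gigabytes.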