Add documentation for additional options
docs/options.md +50 -2
docs/options.md
CHANGED
@@ -1,4 +1,4 @@
|
|
1 |
-
# Options
|
2 |
To transcribe or translate an audio file, you can either copy a URL from a website (all [websites](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md)
|
3 |
supported by YT-DLP will work, including YouTube). Otherwise, upload an audio file (choose "All Files (*.*)"
|
4 |
in the file selector to select any file type, including video files) or use the microphone.
|
@@ -83,4 +83,52 @@ Note that detected lines in gaps between speech sections will not be included in
|
|
83 |
# Command Line Options
|
84 |
|
85 |
Both `app.py` and `cli.py` also accept command line options, such as the ability to enable parallel execution on multiple
|
86 |
-
CPU/GPU cores, the default model name/VAD and so on. Consult the README in the root folder for more information.
1 |
+
# Standard Options
|
2 |
To transcribe or translate an audio file, you can either copy a URL from a website (all [websites](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md)
|
3 |
supported by YT-DLP will work, including YouTube). Otherwise, upload an audio file (choose "All Files (*.*)"
|
4 |
in the file selector to select any file type, including video files) or use the microphone.
|
|
|
83 |
# Command Line Options
|
84 |
|
85 |
Both `app.py` and `cli.py` also accept command line options, such as the ability to enable parallel execution on multiple
|
86 |
+
CPU/GPU cores, the default model name/VAD and so on. Consult the README in the root folder for more information.
|
87 |
+
|
88 |
+
# Additional Options
|
89 |
+
|
90 |
+
In addition to the above, there's also a "Full" options interface that allows you to set all the options available in the Whisper
|
91 |
+
model. The options are as follows:
|
92 |
+
|
93 |
+
## Initial Prompt
|
94 |
+
Optional text to provide as a prompt for the first 30-second window. Whisper will attempt to use this as a starting point for the transcription, but you can
|
95 |
+
also get creative and specify a style or format for the output of the transcription.
|
96 |
+
|
97 |
+
For instance, if you use the prompt "hello how is it going always use lowercase no punctuation goodbye one two three start stop i you me they", Whisper will
|
98 |
+
be biased to output lowercase letters and no punctuation, and may also be biased to output the words in the prompt more often.
|
99 |
+
|
100 |
+
## Temperature
|
101 |
+
The temperature to use when sampling. Default is 0 (zero). A higher temperature will result in more random output, while a lower temperature will be more deterministic.
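
As a rough illustration of what temperature does, the sketch below rescales a set of made-up token scores before turning them into probabilities. The function name `softmax_with_temperature` and the example scores are assumptions for illustration only, not part of this project or of Whisper's API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # A temperature of 0 means "always pick the best token", so there is
        # nothing to sample and no division to perform.
        raise ValueError("temperature must be positive when sampling")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.2)  # low temperature: nearly deterministic
flat = softmax_with_temperature(logits, 1.0)   # higher temperature: more random
```

With the low temperature almost all of the probability lands on the top-scoring token, while the higher temperature spreads it across the alternatives.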
|
102 |
+
|
103 |
+
## Best Of - Non-zero temperature
|
104 |
+
The number of candidates to sample from when sampling with non-zero temperature. Default is 5.
|
105 |
+
|
106 |
+
## Beam Size - Zero temperature
|
107 |
+
The number of beams to use in beam search when decoding with zero temperature. Default is 5.
|
108 |
+
|
109 |
+
## Patience - Zero temperature
|
110 |
+
The patience value to use in beam search when decoding with zero temperature. As in https://arxiv.org/abs/2204.05424, the default (1.0) is equivalent to conventional beam search.
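
A minimal sketch of how patience interacts with beam size, assuming the stopping rule described in the cited paper: the search ends once roughly `beam_size * patience` finished hypotheses have been collected. The helper name `max_finished_candidates` is hypothetical.

```python
def max_finished_candidates(beam_size, patience=1.0):
    """Number of finished hypotheses to collect before beam search stops."""
    return round(beam_size * patience)

conventional = max_finished_candidates(5)          # patience 1.0: plain beam search
more_patient = max_finished_candidates(5, 2.0)     # keeps searching for longer
```

A patience above 1.0 lets the search run longer before stopping, at the cost of extra decoding time.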
|
111 |
+
|
112 |
+
## Length Penalty - Any temperature
|
113 |
+
The token length penalty coefficient (alpha), applied at any temperature. The penalty follows https://arxiv.org/abs/1609.08144; when no value is set, simple length normalization is used.
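
The two scoring modes can be sketched as follows, assuming the length penalty formula from the cited paper, `((5 + length) / 6) ** alpha`, and plain division by the token count when no alpha is given. The function name `length_normalized_score` is an illustrative assumption.

```python
def length_normalized_score(sum_logprob, length, alpha=None):
    """Score a candidate by its total log probability, adjusted for its length."""
    if alpha is None:
        penalty = length  # simple length normalization (the default)
    else:
        penalty = ((5 + length) / 6) ** alpha  # penalty from the cited paper
    return sum_logprob / penalty

simple = length_normalized_score(-6.0, 3)               # average log probability per token
penalized = length_normalized_score(-6.0, 7, alpha=1.0)
unnormalized = length_normalized_score(-6.0, 7, alpha=0.0)
```

An alpha of 0 disables normalization entirely, and larger values increasingly favor longer candidates.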
|
114 |
+
|
115 |
+
## Suppress Tokens - Comma-separated list of token IDs
|
116 |
+
A comma-separated list of token IDs to suppress during sampling. The default value of "-1" will suppress most special characters except common punctuation.
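
The sketch below shows one way such a value could be parsed, assuming that "-1" acts as a placeholder for a built-in set of non-speech token IDs. The function name and the IDs in `HYPOTHETICAL_NON_SPEECH` are made up for illustration; the real set depends on the tokenizer.

```python
def parse_suppress_tokens(value, non_speech_tokens):
    """Parse a comma-separated string of token IDs into a sorted list."""
    ids = [int(t) for t in value.split(",") if t.strip()]
    if -1 in ids:
        # -1 is not a real token: swap it for the default non-speech set.
        ids = [t for t in ids if t >= 0]
        ids.extend(non_speech_tokens)
    return sorted(set(ids))

HYPOTHETICAL_NON_SPEECH = [1, 2, 7]  # placeholder IDs, not real Whisper tokens

default_ids = parse_suppress_tokens("-1", HYPOTHETICAL_NON_SPEECH)
custom_ids = parse_suppress_tokens("5, 9", HYPOTHETICAL_NON_SPEECH)
combined_ids = parse_suppress_tokens("-1,9", HYPOTHETICAL_NON_SPEECH)
```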
|
117 |
+
|
118 |
+
## Condition on previous text
|
119 |
+
If True, provide the previous output of the model as a prompt for the next window. Disabling this may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop.
|
120 |
+
|
121 |
+
## FP16
|
122 |
+
Whether to perform inference in fp16. True by default.
|
123 |
+
|
124 |
+
## Temperature increment on fallback
|
125 |
+
The amount by which to increase the temperature each time decoding fails to meet either of the thresholds below. Default is 0.2.
|
126 |
+
|
127 |
+
## Compression ratio threshold
|
128 |
+
If the gzip compression ratio is higher than this value, treat the decoding as failed. Default is 2.4.
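
To see why a high compression ratio signals a failed decoding, the sketch below measures how well a piece of text compresses. It assumes the ratio is the UTF-8 byte length divided by the compressed length, and uses Python's `zlib` module (DEFLATE, the same algorithm gzip uses). The function name is an illustrative assumption.

```python
import zlib

def compression_ratio(text):
    """Ratio of raw UTF-8 size to compressed size; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))

looping = compression_ratio("la " * 100)  # text stuck in a repetition loop
normal = compression_ratio("The quick brown fox jumps over the lazy dog.")
```

Output that repeats itself compresses very well, so its ratio lands far above the 2.4 default, while ordinary sentences stay below it.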
|
129 |
+
|
130 |
+
## Logprob threshold
|
131 |
+
If the average log probability is lower than this value, treat the decoding as failed. Default is -1.0.
|
132 |
+
|
133 |
+
## No speech threshold
|
134 |
+
If the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence. Default is 0.6.
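
The fallback options above can be combined into a single loop, sketched here under stated assumptions: decoding is retried at increasing temperatures (up to an assumed maximum of 1.0) until both thresholds are met, and a segment counts as silence only when the no-speech probability is high and the log probability check has failed. All function names, the `decode` callback, and the result dictionary keys are hypothetical.

```python
def temperature_schedule(start=0.0, increment=0.2, maximum=1.0):
    """List the temperatures to try, from start up to maximum."""
    if increment <= 0:
        return [start]  # no fallback: a single attempt
    temps, step = [], 0
    while start + step * increment <= maximum + 1e-6:
        temps.append(round(start + step * increment, 6))
        step += 1
    return temps

def needs_fallback(compression_ratio, avg_logprob,
                   compression_ratio_threshold=2.4, logprob_threshold=-1.0):
    """True when the decoding is too repetitive or too unlikely."""
    return (compression_ratio > compression_ratio_threshold
            or avg_logprob < logprob_threshold)

def is_silence(no_speech_prob, avg_logprob,
               no_speech_threshold=0.6, logprob_threshold=-1.0):
    """True when the segment looks like silence and decoding was unreliable."""
    return no_speech_prob > no_speech_threshold and avg_logprob < logprob_threshold

def decode_with_fallback(decode, temperatures):
    """Call decode(temperature) until a result passes both thresholds."""
    result = None
    for temperature in temperatures:
        result = decode(temperature)
        if not needs_fallback(result["compression_ratio"], result["avg_logprob"]):
            break
    return result

def fake_decode(temperature):
    # Stand-in for a real decoder: it loops (high ratio) below 0.4.
    ratio = 3.0 if temperature < 0.4 else 1.5
    return {"temperature": temperature, "compression_ratio": ratio, "avg_logprob": -0.5}

schedule = temperature_schedule()
chosen = decode_with_fallback(fake_decode, schedule)
```

In this toy run the first two attempts fail the compression ratio check, so the third temperature in the schedule is the one that gets accepted.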
|