faster-whisper-large-v3
This is the Whisper large-v3 model converted to CTranslate2 format for use with faster-whisper.
Using
You can either monkey-patch faster-whisper 0.9.0 (until upstream adds large-v3 support) or use my fork (which is easier).
Using my fork
First, install it by executing:
pip install -U 'transformers[torch]>=4.35.0' https://github.com/PythonicCafe/faster-whisper/archive/refs/heads/feature/large-v3.zip#egg=faster-whisper
Then use it like regular faster-whisper:
import time
import faster_whisper
filename = "my-audio.mp3"
initial_prompt = "My podcast recording" # Or `None`
word_timestamps = False
vad_filter = True
temperature = 0.0
language = "pt"
model_size = "large-v3"
device, compute_type = "cuda", "float16"
# or: device, compute_type = "cpu", "float32"
model = faster_whisper.WhisperModel(model_size, device=device, compute_type=compute_type)
segments, transcription_info = model.transcribe(
    filename,
    word_timestamps=word_timestamps,
    vad_filter=vad_filter,
    temperature=temperature,
    language=language,
    initial_prompt=initial_prompt,
)
print(transcription_info)
start_time = time.time()
for segment in segments:
    row = {
        "start": segment.start,
        "end": segment.end,
        "text": segment.text,
    }
    if word_timestamps:
        row["words"] = [
            {"start": word.start, "end": word.end, "word": word.word}
            for word in segment.words
        ]
    print(row)
end_time = time.time()
print(f"Transcription finished in {end_time - start_time:.2f}s")
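Instead of hard-coding `device` and `compute_type`, the choice between the two pairs shown above can be made at runtime. The following is a minimal sketch, not part of faster-whisper: `pick_device` is a hypothetical helper name, and it assumes CTranslate2 (which faster-whisper depends on) is installed. If it is not, the helper falls back to the CPU settings.

```python
def pick_device():
    """Return (device, compute_type), preferring CUDA when a GPU is visible.

    Hypothetical helper: falls back to CPU if CTranslate2 is missing or
    cannot query the CUDA runtime.
    """
    try:
        import ctranslate2

        has_cuda = ctranslate2.get_cuda_device_count() > 0
    except (ImportError, RuntimeError):
        has_cuda = False
    return ("cuda", "float16") if has_cuda else ("cpu", "float32")


device, compute_type = pick_device()
print(device, compute_type)
```

The returned pair can be passed to `WhisperModel(model_size, device=device, compute_type=compute_type)` exactly as in the script above.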
Monkey-patching faster-whisper 0.9.0
Make sure you have the latest version:
pip install -U 'faster-whisper>=0.9.0'
Then use it with a few small changes:
import time
import faster_whisper.transcribe
# Monkey patch 1 (add model to list)
faster_whisper.utils._MODELS["large-v3"] = "turicas/faster-whisper-large-v3"
# Monkey patch 2 (fix Tokenizer)
faster_whisper.transcribe.Tokenizer.encode = lambda self, text: self.tokenizer.encode(text, add_special_tokens=False)
filename = "my-audio.mp3"
initial_prompt = "My podcast recording" # Or `None`
word_timestamps = False
vad_filter = True
temperature = 0.0
language = "pt"
model_size = "large-v3"
device, compute_type = "cuda", "float16"
# or: device, compute_type = "cpu", "float32"
model = faster_whisper.transcribe.WhisperModel(model_size, device=device, compute_type=compute_type)
# Monkey patch 3 (change n_mels)
from faster_whisper.feature_extractor import FeatureExtractor
model.feature_extractor = FeatureExtractor(feature_size=128)
# Monkey patch 4 (change tokenizer)
from transformers import AutoProcessor
model.hf_tokenizer = AutoProcessor.from_pretrained("openai/whisper-large-v3").tokenizer
model.hf_tokenizer.token_to_id = lambda token: model.hf_tokenizer.convert_tokens_to_ids(token)
segments, transcription_info = model.transcribe(
    filename,
    word_timestamps=word_timestamps,
    vad_filter=vad_filter,
    temperature=temperature,
    language=language,
    initial_prompt=initial_prompt,
)
print(transcription_info)
start_time = time.time()
for segment in segments:
    row = {
        "start": segment.start,
        "end": segment.end,
        "text": segment.text,
    }
    if word_timestamps:
        row["words"] = [
            {"start": word.start, "end": word.end, "word": word.word}
            for word in segment.words
        ]
    print(row)
end_time = time.time()
print(f"Transcription finished in {end_time - start_time:.2f}s")
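Both scripts end with the same printing loop, so it can be factored into small helpers that also write the result as JSON Lines. This is only a sketch: `segment_to_row` and `segments_to_jsonl` are hypothetical names, not faster-whisper functions, and `FakeSegment`/`FakeWord` are stand-ins so the example runs without a model. Real segments from `model.transcribe()` are assumed to expose the same `start`, `end`, `text` and `words` attributes used in the loop above.

```python
import json
from collections import namedtuple

# Stand-ins for the objects yielded by model.transcribe() (illustration only).
FakeWord = namedtuple("FakeWord", ["start", "end", "word"])
FakeSegment = namedtuple("FakeSegment", ["start", "end", "text", "words"])


def segment_to_row(segment, word_timestamps=False):
    """Build the same dict the scripts above print for each segment."""
    row = {"start": segment.start, "end": segment.end, "text": segment.text}
    if word_timestamps:
        # `words` may be None when word timestamps were not requested.
        row["words"] = [
            {"start": word.start, "end": word.end, "word": word.word}
            for word in (segment.words or [])
        ]
    return row


def segments_to_jsonl(segments, word_timestamps=False):
    """Serialize segments as one JSON object per line, keeping non-ASCII text."""
    return "\n".join(
        json.dumps(segment_to_row(segment, word_timestamps), ensure_ascii=False)
        for segment in segments
    )


demo = [
    FakeSegment(0.0, 1.5, " Olá, mundo", [FakeWord(0.0, 0.7, " Olá,"), FakeWord(0.7, 1.5, " mundo")]),
    FakeSegment(1.5, 2.0, " Tchau", None),
]
print(segments_to_jsonl(demo, word_timestamps=True))
```

Because `segments` is a generator, the transcription itself still runs while the helper iterates, so any timing code should wrap the call to `segments_to_jsonl`.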
Converting
If you'd like to convert the model yourself, execute:
pip install -U 'ctranslate2>=3.21.0' 'transformers>=4.35.0' 'OpenNMT-py==2.*' sentencepiece
ct2-transformers-converter --model openai/whisper-large-v3 --output_dir whisper-large-v3-ct2
Then, the files will be at whisper-large-v3-ct2/.
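The same conversion can also be driven from Python through CTranslate2's converter API instead of the command line. The sketch below assumes the packages from the `pip install` line above are present; `convert_whisper` is a hypothetical wrapper name, and its defaults simply mirror the CLI arguments used here. Calling it downloads the original model, so the function is only defined, not invoked.

```python
def convert_whisper(
    model="openai/whisper-large-v3",
    output_dir="whisper-large-v3-ct2",
    quantization=None,
    force=False,
):
    """Convert a Hugging Face Whisper checkpoint to CTranslate2 format.

    Hypothetical wrapper around ctranslate2.converters.TransformersConverter.
    The import is done lazily so this module loads without CTranslate2.
    """
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(model)
    # `force=True` overwrites an existing output directory.
    return converter.convert(output_dir, quantization=quantization, force=force)
```

Running `convert_whisper()` should produce the same `whisper-large-v3-ct2/` directory as the command above.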
License
These files have the same license as the original openai/whisper-large-v3 model: Apache 2.0.