	---
	license: apache-2.0
	language:
	- de
	library_name: transformers
	pipeline_tag: automatic-speech-recognition
	model-index:
	- name: whisper-large-v3-turbo-german by Florian Zimmermeister @primeLine
	  results:
	  - task:
	      type: automatic-speech-recognition
	      name: Speech Recognition
	    dataset:
	      name: German ASR Data-Mix
	      type: flozi00/asr-german-mixed
	    metrics:
	    - type: wer
	      value: 4.77 %
	      name: Test WER
	datasets:
	- flozi00/asr-german-mixed
	- flozi00/asr-german-mixed-evals
	base_model:
	- primeline/whisper-large-v3-german
	---
	## Quant

	This is an int8 quantization of [primeline/whisper-large-v3-turbo-german](https://huggingface.co/primeline/whisper-large-v3-turbo-german), produced with the CTranslate2 converter, for use with CTranslate2, faster-whisper, and similar runtimes.
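
A minimal sketch of how an int8 CTranslate2 conversion like this one might be loaded with faster-whisper. The model directory and audio file in the usage comment are hypothetical placeholders, and `device="cpu"` and `beam_size=5` are illustrative choices rather than settings recommended by the model authors.

```python
from typing import Iterable


def join_segments(segments: Iterable) -> str:
    """Concatenate the `.text` of each segment into one transcript string."""
    return " ".join(seg.text.strip() for seg in segments)


def transcribe_german(model_dir: str, audio_path: str) -> str:
    """Transcribe one audio file with an int8 CTranslate2 Whisper model."""
    # Imported lazily so join_segments() works without faster-whisper installed.
    from faster_whisper import WhisperModel

    # compute_type="int8" matches the quantization of the converted weights.
    model = WhisperModel(model_dir, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus metadata.
    segments, _info = model.transcribe(audio_path, language="de", beam_size=5)
    return join_segments(segments)


# Usage (hypothetical paths, replace with your local model directory and audio):
# print(transcribe_german("./whisper-large-v3-turbo-german-int8", "sample.wav"))
```

The segments are only decoded while the generator is consumed, so the transcription work happens inside `join_segments`.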

	## Model card from primeline/whisper-large-v3-german


	### Summary
	This model card describes a model based on Whisper Large v3 that has been fine-tuned for German speech recognition. Whisper is a speech recognition model developed by OpenAI; this variant has been optimized specifically for processing and recognizing German speech.



	### Applications
	This model can be used in a range of applications, including:

	- Transcription of spoken German language
	- Voice commands and voice control
	- Automatic subtitling for German videos
	- Voice-based search queries in German
	- Dictation functions in word processing programs


	## Model family

	| Model | Parameters | Link |
	|----------------------------------|------------|--------------------------------------------------------------|
	| Whisper large v3 german | 1.54B | [link](https://huggingface.co/primeline/whisper-large-v3-german) |
	| Whisper large v3 turbo german | 809M | [link](https://huggingface.co/primeline/whisper-large-v3-turbo-german) |
	| Distil-whisper large v3 german | 756M | [link](https://huggingface.co/primeline/distil-whisper-large-v3-german) |
	| tiny whisper | 37.8M | [link](https://huggingface.co/primeline/whisper-tiny-german) |


	## Evaluations

	Values are word error rates (WER) in percent; lower is better.

	| Dataset | openai-whisper-large-v3-turbo | openai-whisper-large-v3 | primeline-whisper-large-v3-german | nyrahealth-CrisperWhisper | primeline-whisper-large-v3-turbo-german |
	|---------------------------------|-------------------------------|-------------------------|----------------------------------|---------------------------|----------------------------------------|
	| common_voice_19_0 | 6.31 | 5.84 | 4.30 | 4.14 | 4.28 |
	| Tuda-De | 11.45 | 11.21 | 9.89 | 13.88 | 8.10 |
	| multilingual librispeech | 18.03 | 17.69 | 13.46 | 10.10 | 4.71 |
	| All | 14.16 | 13.79 | 10.51 | 8.48 | 4.75 |
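
The model index lists `wer` (word error rate) as the evaluation metric. As a rough illustration of what that metric measures, here is a minimal sketch that computes WER as word-level edit distance divided by the reference length. It is not the evaluation script behind the table above, and it skips the text normalization (casing, punctuation, number formatting) that ASR benchmarks usually apply before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the processed ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One of five reference words is missing from the hypothesis.
wer = word_error_rate("das ist ein kleiner test", "das ist ein test")
print(f"WER: {100 * wer:.2f} %")  # WER: 20.00 %
```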


	### Training data
	The training data for this model includes a large amount of spoken German from various sources. The data was carefully selected and processed to optimize recognition performance.


	### Training process
	The model was trained with the following hyperparameters:

	- Batch size: 12288
	- Epochs: 3
	- Learning rate: 1e-6
	- Data augmentation: No
	- Optimizer: [AdEMAMix](https://arxiv.org/abs/2409.03137)
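
For readers unfamiliar with the optimizer named above, the following is a simplified scalar sketch of its update rule as described in the linked paper: an Adam-style step that adds a second, much slower exponential moving average of the gradients. It is not the training code used for this model. The `beta`, `alpha`, and `eps` values are illustrative assumptions, the warmup schedulers for `alpha` and `beta3` from the paper are omitted, and the learning rate is chosen for the toy problem rather than taken from the list above.

```python
import math


def ademamix_minimize(grad_fn, x0, steps, lr=0.01, beta1=0.9, beta2=0.999,
                      beta3=0.9999, alpha=5.0, eps=1e-8, weight_decay=0.0):
    """Run scalar AdEMAMix-style steps and return the final parameter value."""
    x = x0
    m_fast = 0.0  # fast EMA of gradients (as in Adam)
    m_slow = 0.0  # slow EMA of gradients, the extra AdEMAMix term
    nu = 0.0      # EMA of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m_fast = beta1 * m_fast + (1 - beta1) * g
        m_slow = beta3 * m_slow + (1 - beta3) * g
        nu = beta2 * nu + (1 - beta2) * g * g
        # Bias-correct the fast EMA and second moment; the slow EMA is left as is.
        m_hat = m_fast / (1 - beta1 ** t)
        nu_hat = nu / (1 - beta2 ** t)
        update = (m_hat + alpha * m_slow) / (math.sqrt(nu_hat) + eps)
        x -= lr * (update + weight_decay * x)
    return x


# Toy problem: minimize f(x) = x^2 (gradient 2x) starting from x = 1.0.
x_final = ademamix_minimize(lambda x: 2 * x, x0=1.0, steps=50)
print(x_final)
```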


	### How to use

	```python
	import torch
	from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
	from datasets import load_dataset

	# Use the GPU with half precision when available, otherwise CPU with float32.
	device = "cuda:0" if torch.cuda.is_available() else "cpu"
	torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

	model_id = "primeline/whisper-large-v3-turbo-german"

	# Load the model weights and the matching processor (tokenizer + feature extractor).
	model = AutoModelForSpeechSeq2Seq.from_pretrained(
	    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
	)
	model.to(device)
	processor = AutoProcessor.from_pretrained(model_id)

	# Build an ASR pipeline that splits long audio into 30-second chunks.
	pipe = pipeline(
	    "automatic-speech-recognition",
	    model=model,
	    tokenizer=processor.tokenizer,
	    feature_extractor=processor.feature_extractor,
	    max_new_tokens=128,
	    chunk_length_s=30,
	    batch_size=16,
	    return_timestamps=True,
	    torch_dtype=torch_dtype,
	    device=device,
	)

	# Transcribe the first sample of a long-form dataset and print the text.
	dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
	sample = dataset[0]["audio"]
	result = pipe(sample)
	print(result["text"])
	```
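
Because the pipeline above is called with `return_timestamps=True`, its result should also contain a `chunks` list whose items carry a `timestamp` `(start, end)` tuple and a `text` string. A minimal sketch of turning such chunks into SRT subtitles follows; the helper names are hypothetical and not part of `transformers`.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks: list[dict]) -> str:
    """Convert pipeline-style chunks into the text of an SRT subtitle file."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        # The final chunk can come back without an end time; fall back to start.
        if end is None:
            end = start
        entries.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(entries)


# Usage with the pipeline result from the example above:
# srt_text = chunks_to_srt(result["chunks"])
```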


	## [About us](https://primeline-ai.com/en/)

	[![primeline AI](https://primeline-ai.com/wp-content/uploads/2024/02/pl_ai_bildwortmarke_original.svg)](https://primeline-ai.com/en/)


	Your partner for AI infrastructure in Germany <br>
	Experience the powerful AI infrastructure that drives your ambitions in Deep Learning, Machine Learning & High-Performance Computing. Optimized for AI training and inference.



	Model author: [Florian Zimmermeister](https://huggingface.co/flozi00)