ⓍTTS_v2 - The San-Ti Fine-Tuned Model

This repository hosts a fine-tuned version of the ⓍTTS model, trained on 4 minutes of unique voice lines from The San-Ti. The voice lines were sourced from a 3 Body Problem clip on YouTube: The San-Ti Explain how they Stop Science on Earth | 3 Body Problem | Netflix

The San-Ti: Illustration. This is only an illustration; we never learn what they actually look like.

Listen to a sample of the ⓍTTS_v2 - The San-Ti Fine-Tuned Model:

Here's an mp3 voice line clip of The San-Ti from the training data:

Features

  • 🎙️ Voice Cloning: Realistic voice cloning with just a short audio clip.
  • 🌍 Multi-Lingual Support: Generates speech in 17 different languages while maintaining The San-Ti's voice.
  • 😃 Emotion & Style Transfer: Captures the emotional tone and style of the original voice.
  • 🔄 Cross-Language Cloning: Maintains the unique voice characteristics across different languages.
  • 🎧 High-Quality Audio: Outputs at a 24kHz sampling rate for clear and high-fidelity audio.

Supported Languages

The model supports the following 17 languages: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), and Hindi (hi).
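
Because the `language` argument has to be one of the codes listed above, it can help to check user input before calling the model. The following is a minimal sketch, not part of 🐸TTS: the `SUPPORTED_LANGUAGES` table is copied from the list above, and `normalize_language` is a hypothetical helper name.

```python
# Language codes copied from the "Supported Languages" list above.
SUPPORTED_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian",
    "ko": "Korean", "hi": "Hindi",
}


def normalize_language(code: str) -> str:
    """Return a cleaned-up language code, or raise ValueError if unsupported.

    Hypothetical helper: it only trims whitespace and lowercases the code
    before looking it up in SUPPORTED_LANGUAGES.
    """
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return normalized
```

The returned code can then be passed as the `language` argument in the usage examples further down.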

Usage in Roll Cage

🤖💬 Boost your AI experience with this Ollama add-on! Enjoy real-time audio 🎙️ and text 🔍 chats, LaTeX rendering 📜, agent automations ⚙️, workflows 🔄, text-to-image 📝➡️🖼️, image-to-text 🖼️➡️🔤, image-to-video 🖼️➡️🎥 transformations. Fine-tune text 📝, voice 🗣️, and image 🖼️ gens. Includes Windows macro controls 🖥️ and DuckDuckGo search.

ollama_agent_roll_cage (OARC) is a completely local Python & CMD toolset add-on for the Ollama command line interface. The OARC toolset automates the creation of agents, giving the user more control over the likely output. It provides SYSTEM prompt templates for each ./Modelfile, allowing users to design and deploy custom agents quickly. Users can select which local model file is used in agent construction with the desired system prompt.
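
To illustrate the idea of pairing a local model with a SYSTEM prompt template, here is a rough sketch that renders a Modelfile as text. It is not OARC's actual code: `render_modelfile` is a hypothetical helper, and the model name `llama3` and the prompt text are placeholder assumptions. Only the `FROM` and `SYSTEM` instructions of the Ollama Modelfile format are used.

```python
def render_modelfile(base_model: str, system_prompt: str) -> str:
    """Build the text of an Ollama Modelfile with FROM and SYSTEM instructions.

    Hypothetical helper for illustration; OARC's own templates may differ.
    """
    prompt = system_prompt.strip()
    # The prompt is wrapped in triple quotes, so it must not contain them itself.
    if '"""' in prompt:
        raise ValueError("system prompt must not contain triple double-quotes")
    return f'FROM {base_model}\nSYSTEM """{prompt}"""\n'


# Placeholder model name and prompt, chosen only for this example.
modelfile_text = render_modelfile(
    "llama3",
    "You speak as The San-Ti. Answer briefly and literally.",
)
print(modelfile_text)
```

The resulting text could be saved as `./Modelfile` and passed to `ollama create <agent-name> -f ./Modelfile` to build the agent.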

CoquiTTS and Resources

License

This model is licensed under the Coqui Public Model License. Read more about the origin story of CPML here.

Contact

Join our 🐸Community on Discord and follow us on Twitter. For inquiries, email us at info@coqui.ai.

Using 🐸TTS API:

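import torch
from TTS.api import TTS

# use the GPU when available; point both paths at your local copy of the model
device = "cuda" if torch.cuda.is_available() else "cpu"
tts = TTS(model_path="D:/AI/ollama_agent_roll_cage/AgentFiles/Ignored_TTS/XTTS-v2_PeterDrury/",
          config_path="D:/AI/ollama_agent_roll_cage/AgentFiles/Ignored_TTS/XTTS-v2_PeterDrury/config.json",
          progress_bar=False).to(device)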

# generate speech by cloning a voice using default settings
tts.tts_to_file(text="It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
                file_path="output.wav",
                speaker_wav="/path/to/target/speaker.wav",
                language="en")

Using 🐸TTS Command line:

 tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
     --text "Bugün okula gitmek istemiyorum." \
     --speaker_wav /path/to/target/speaker.wav \
     --language_idx tr \
     --use_cuda true

Using the model directly:

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", eval=True)
model.cuda()

outputs = model.synthesize(
    "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
    config,
    speaker_wav="/data/TTS-public/_refclips/3.wav",
    gpt_cond_len=3,
    language="en",
)
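
The direct-model example stops at `outputs`, so here is one possible way to write the audio to disk at the 24kHz rate mentioned under Features. This is a sketch under assumptions: it assumes `outputs["wav"]` is a one-dimensional sequence of float samples in the range [-1, 1], and `write_wav` is a hypothetical helper built only on the Python standard library. A synthetic tone stands in for the model output so the snippet runs without a GPU or checkpoint.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # output rate stated in the Features list


def write_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range, then scale to a signed 16-bit integer.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in for outputs["wav"]: one second of a 440 Hz tone.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
write_wav(tone, "output_demo.wav")
```

With a real model run, the call would be `write_wav(outputs["wav"], "output.wav")`, provided the assumption about the output format holds.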