hubert-base-korean

Model Details

HuBERT (Hidden-Unit BERT) is a speech representation learning model proposed by Facebook. Unlike conventional speech recognition models, HuBERT uses self-supervised learning to learn directly from the raw waveform of the speech signal.

This model was trained on Cloud TPUs provided through Google's TPU Research Cloud (TRC).

Model Description

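| Component | Setting | Base | Large |
|---|---|---|---|
| CNN Encoder | strides | 5, 2, 2, 2, 2, 2, 2 | 5, 2, 2, 2, 2, 2, 2 |
| | kernel width | 10, 3, 3, 3, 3, 2, 2 | 10, 3, 3, 3, 3, 2, 2 |
| | channel | 512 | 512 |
| Transformer Encoder | Layer | 12 | 24 |
| | embedding dim | 768 | 1024 |
| | inner FFN dim | 3072 | 4096 |
| | attention heads | 8 | 16 |
| Projection | dim | 256 | 768 |
| Params | | 95M | 317M |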
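
The strides and kernel widths listed above determine how many output frames the CNN encoder produces for a given number of input samples. The following is a minimal sketch of that arithmetic, assuming unpadded 1-D convolutions, so each layer maps a length n to floor((n - kernel) / stride) + 1. The helper names `cnn_output_frames` and `total_stride` are introduced here only for illustration.

```python
# Strides and kernel widths of the CNN encoder, as listed in the model description.
STRIDES = (5, 2, 2, 2, 2, 2, 2)
KERNELS = (10, 3, 3, 3, 3, 2, 2)


def cnn_output_frames(num_samples: int) -> int:
    """Frames produced by the CNN encoder, assuming no padding (illustrative helper)."""
    length = num_samples
    for kernel, stride in zip(KERNELS, STRIDES):
        # Output length of one unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


def total_stride() -> int:
    """Overall downsampling factor: the product of all layer strides."""
    total = 1
    for stride in STRIDES:
        total *= stride
    return total


print(cnn_output_frames(16000))  # 49
print(total_stride())            # 320
```

For 16,000 input samples this gives 49 frames, which agrees with the output shape reported in the usage examples of this card. The overall stride of 320 samples would correspond to 20 ms per frame if the audio is sampled at 16 kHz; that sampling rate is an assumption, since it is not stated here.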

How to Get Started with the Model

PyTorch

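import torch
from transformers import HubertModel

# Load the pretrained checkpoint from the Hugging Face Hub
model = HubertModel.from_pretrained("team-lucid/hubert-xlarge-korean")
model.eval()  # disable dropout for inference

# Dummy input of shape [batch, samples]
wav = torch.ones(1, 16000)
with torch.no_grad():  # no gradients needed for feature extraction
    outputs = model(wav)
print(f"Input:   {wav.shape}")  # [1, 16000]
# [1, 49, 768] for Base; the last dimension is the checkpoint's hidden size
print(f"Output:  {outputs.last_hidden_state.shape}")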

JAX/Flax

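import jax.numpy as jnp
from transformers import FlaxAutoModel

# The Flax implementation is loaded from code shipped with the checkpoint,
# which is why trust_remote_code=True is passed
model = FlaxAutoModel.from_pretrained("team-lucid/hubert-xlarge-korean", trust_remote_code=True)

# Dummy input of shape [batch, samples]
wav = jnp.ones((1, 16000))
outputs = model(wav)
print(f"Input:   {wav.shape}")  # [1, 16000]
# [1, 49, 768] for Base; the last dimension is the checkpoint's hidden size
print(f"Output:  {outputs.last_hidden_state.shape}")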

Training Details

Training Data

This model was trained on about 4,000 hours of audio extracted from the following datasets, which were built with support from the National Information Society Agency (NIA) of Korea and funded by the Ministry of Science and ICT: Free Conversation Speech (General Male/Female), Multi-Speaker Speech Synthesis Data, and Broadcast Content Conversational Speech Recognition Data.

Training Procedure

As in the original paper, a Base model was first trained with MFCC-based targets. K-means with 500 clusters was then performed, and the Base and Large models were trained again using the resulting clusters.
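
The clustering step can be illustrated with a small, dependency-free sketch. It runs plain Lloyd's k-means on toy 2-D points and then labels each frame with the index of its nearest centroid, which is the kind of discrete target that HuBERT-style training predicts. This is only an illustration under stated assumptions, not the training code used for this model: the real procedure clusters high-dimensional frame features with 500 clusters, and the helper names (`squared_distance`, `assign`, `kmeans`) and the toy data are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(point, centroids):
    """Return the index of the centroid nearest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iterations):
        # Group points by their nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[assign(p, centroids)].append(p)
        # Move each centroid to the mean of its group (unchanged if the group is empty).
        for i, group in enumerate(groups):
            if group:
                centroids[i] = tuple(sum(dim) / len(group) for dim in zip(*group))
    return centroids


# Toy "frame features": two well-separated blobs standing in for real features.
frames = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids = kmeans(frames, k=2)
labels = [assign(f, centroids) for f in frames]  # one discrete unit per frame
print(labels)
```

The first three frames end up sharing one label and the last three the other. Which blob receives index 0 depends on the random initialisation.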

Training Hyperparameters

| Hyperparameter | Base | Large |
|---|---|---|
| Warmup Steps | 32,000 | 32,000 |
| Learning Rates | 5e-4 | 1.5e-3 |
| Batch Size | 128 | 128 |
| Weight Decay | 0.01 | 0.01 |
| Max Steps | 400,000 | 400,000 |
| Learning Rate Decay | 0.1 | 0.1 |
| Adam β1 | 0.9 | 0.9 |
| Adam β2 | 0.99 | 0.99 |
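
To make the warmup and step counts concrete, here is a hedged sketch of one schedule that is consistent with the listed values. The schedule shape is not specified in this card, so two things are assumptions: linear warmup followed by linear decay, and reading "Learning Rate Decay 0.1" as the ratio of the final learning rate to the peak. The function name `learning_rate` is hypothetical, and the actual training run may have used a different schedule.

```python
WARMUP_STEPS = 32_000
MAX_STEPS = 400_000
PEAK_LR = 5e-4     # Base value from the table; Large is listed as 1.5e-3
FINAL_RATIO = 0.1  # ASSUMPTION: "Learning Rate Decay 0.1" read as final LR / peak LR


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to FINAL_RATIO * PEAK_LR (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to 1.0 after MAX_STEPS.
    progress = min(1.0, (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS))
    return PEAK_LR * (1.0 - (1.0 - FINAL_RATIO) * progress)


for step in (0, 16_000, 32_000, 216_000, 400_000):
    print(step, learning_rate(step))
```

Under these assumptions the rate rises from 0 to the peak over the first 32,000 steps and ends at one tenth of the peak at step 400,000.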