Confusion about the use of the Encodec model

by xtluo - opened Sep 6, 2023

In your published paper, the Encodec model is used as the final acoustic teacher, but the pseudocode is:

y_VQ = embedding(x_acoustic_labels) 
z = MERT(x_noised)
loss_acoustic = Cross_Entropy(z[mask_idx], y_VQ[mask_idx])
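For context on why this looks odd: cross-entropy is normally computed between predicted logits over the codebook entries and integer codebook indices, not between two embedding vectors. The sketch below shows only that generic masked form in plain Python. It is not taken from the paper or the MERT code, and every name and value in it (`masked_cross_entropy`, the toy `logits`, `labels`, and `mask_idx`) is hypothetical.

```python
import math

def masked_cross_entropy(logits, labels, mask_idx):
    """Mean cross-entropy over the masked frames only.

    logits:   per-frame lists of scores, one score per codebook entry
    labels:   integer codebook indices, one per frame
    mask_idx: positions of the frames that were masked
    """
    total = 0.0
    for t in mask_idx:
        row = logits[t]
        # Numerically stable log-sum-exp over the codebook dimension.
        m = max(row)
        log_norm = m + math.log(sum(math.exp(v - m) for v in row))
        # Negative log-probability of the target codebook index.
        total += log_norm - row[labels[t]]
    return total / len(mask_idx)

# Hypothetical toy data: 4 frames, codebook of size 3.
logits = [
    [2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],
    [1.0, 1.0, 1.0],
]
labels = [0, 2, 1, 0]   # integer codec labels, not embedded vectors
mask_idx = [1, 3]       # only these frames contribute to the loss

loss = masked_cross_entropy(logits, labels, mask_idx)
print(loss)
```

In this form the targets stay as integer indices, which is why applying an embedding to the labels before the cross-entropy stands out in the pseudocode.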

So, my questions are:

Multimodal Art Projection org Sep 15, 2023
