MLP intermediate dimension
#3 by shantanuagarwal - opened
Thanks for the great work.
Can you please specify the details of the MLP layer? The paper mentions that the "MLP consists of two linear transformations with a GELU activation in between", but it does not give the layer sizes.
What I am unsure about is the value of intermediate_dim in the following pseudo-code for the MLP:
import torch

intermediate_dim = 4096  # ??? <- this is the value in question
mlp = torch.nn.Sequential(
    torch.nn.Linear(4096, intermediate_dim),
    torch.nn.GELU(),
    torch.nn.Linear(intermediate_dim, 4096),
)
Is the above pseudo-code similar to what was used in the experiments?
Sorry if this detail is mentioned in the paper and I missed it.
Thanks.
Check this file: modeling_nvembed.py
Thanks @jootanehorror.
For anyone else looking into this, see the class FeedForward in https://huggingface.co/nvidia/NV-Embed-v1/blob/main/modeling_nvembed.py#L244. Specifically, the intermediate dim is 4 * 4096.
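Putting that together, here is a minimal sketch of the MLP with the intermediate dimension filled in. This is the pseudo-code from the question updated with the 4 * 4096 value, not the actual FeedForward class from modeling_nvembed.py (which may differ in naming, bias terms, or dropout):

```python
import torch

hidden_dim = 4096
intermediate_dim = 4 * hidden_dim  # 16384, per the linked FeedForward class

# Two linear transformations with a GELU activation in between,
# as described in the paper.
mlp = torch.nn.Sequential(
    torch.nn.Linear(hidden_dim, intermediate_dim),
    torch.nn.GELU(),
    torch.nn.Linear(intermediate_dim, hidden_dim),
)

x = torch.randn(2, hidden_dim)
out = mlp(x)  # shape: (2, 4096) — same hidden size in and out
```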
shantanuagarwal changed discussion status to closed