MLP intermediate dimension

#3
by shantanuagarwal - opened

Thanks for the great work.
Can you please specify the details of the MLP layer? The paper mentions that "MLP consists of two linear transformations with a GELU activation in between".

What I am unsure about is the value of intermediate_dim in the following pseudo-code for the MLP:

import torch

intermediate_dim = 4096  # placeholder; the actual value is what I am asking about
mlp = torch.nn.Sequential(
    torch.nn.Linear(4096, intermediate_dim),
    torch.nn.GELU(),
    torch.nn.Linear(intermediate_dim, 4096),
)

Is the above pseudo-code similar to what was used in the experiments?

Sorry if this detail is mentioned in the paper and I missed it.

Thanks.

Check the file modeling_nvembed.py.

Thanks @jootanehorror.
For anyone else looking into this, see the class FeedForward in https://huggingface.co/nvidia/NV-Embed-v1/blob/main/modeling_nvembed.py#L244.
Specifically, the intermediate dim is 4 * 4096.
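
To make the size concrete, here is a rough sketch of what an intermediate dim of 4 * 4096 implies for the pseudo-code in the question. It assumes the plain Linear, GELU, Linear layout from that pseudo-code with bias terms on both layers (the torch.nn.Linear default); the actual FeedForward class may differ in details such as the activation variant or bias usage, so treat the numbers as an estimate. The helper names intermediate_dim and mlp_param_count are made up for illustration.

```python
def intermediate_dim(hidden_dim: int, mult: int = 4) -> int:
    """Width of the hidden layer: mult * hidden_dim (4 * 4096 per this thread)."""
    return mult * hidden_dim


def mlp_param_count(hidden_dim: int, mult: int = 4, bias: bool = True) -> int:
    """Parameter count of Linear(hidden, inner) -> GELU -> Linear(inner, hidden).

    Assumption: a plain two-layer MLP as in the question's pseudo-code.
    GELU has no trainable parameters.
    """
    inner = intermediate_dim(hidden_dim, mult)
    up = hidden_dim * inner + (inner if bias else 0)         # first linear layer
    down = inner * hidden_dim + (hidden_dim if bias else 0)  # second linear layer
    return up + down


if __name__ == "__main__":
    print(intermediate_dim(4096))   # 16384
    print(mlp_param_count(4096))    # 134238208
```

So under these assumptions, replacing intermediate_dim = 4096 with 4 * 4096 = 16384 in the pseudo-code above gives an MLP of roughly 134M parameters.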

shantanuagarwal changed discussion status to closed
