Monarch Mixer-BERT

This is the 260M-parameter checkpoint for M2-BERT-large from the paper Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture.

Check out our GitHub for instructions on how to download and fine-tune it!

How to use

You can load this model with Hugging Face's AutoModelForMaskedLM:

from transformers import AutoModelForMaskedLM
mlm = AutoModelForMaskedLM.from_pretrained('alycialee/m2-bert-260M', trust_remote_code=True)

This model uses the Hugging Face bert-base-uncased tokenizer:

from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

You can use this model with a pipeline for masked language modeling:

from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
mlm = AutoModelForMaskedLM.from_pretrained('alycialee/m2-bert-260M', trust_remote_code=True)

unmasker = pipeline('fill-mask', model=mlm, tokenizer=tokenizer)
unmasker('Every morning, I enjoy a cup of [MASK] to start my day.')
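The pipeline handles tokenization, inference, and decoding for you. As a rough sketch of what happens under the hood, the hypothetical helper fill_mask below (not part of the M2 code) tokenizes the text, runs the model, and ranks vocabulary tokens at the first [MASK] position by softmax probability. It assumes the model returns an output with a .logits field of shape (batch, sequence, vocab), which is what the fill-mask pipeline itself relies on.

```python
import torch


def fill_mask(text, model, tokenizer, top_k=5):
    """Hypothetical helper: top_k (token, probability) pairs for the first mask token in text."""
    inputs = tokenizer(text, return_tensors="pt")
    input_ids = inputs["input_ids"][0]
    # Indices of every mask token in the single input sequence.
    mask_positions = (input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    if mask_positions.numel() == 0:
        raise ValueError(f"text must contain {tokenizer.mask_token}")

    # Disable dropout so inference is deterministic.
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits  # (1, seq_len, vocab_size)

    # Softmax over the vocabulary at the first mask position, then keep the best top_k.
    probs = logits[0, mask_positions[0]].softmax(dim=-1)
    top = probs.topk(top_k)
    return [
        (tokenizer.decode([int(idx)]).strip(), float(p))
        for p, idx in zip(top.values, top.indices)
    ]
```

With mlm and tokenizer loaded as above, you would call fill_mask('Every morning, I enjoy a cup of [MASK] to start my day.', mlm, tokenizer).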

Remote Code

This model requires passing trust_remote_code=True to from_pretrained, because it is implemented in custom PyTorch code (see our GitHub). You should consider also passing a revision argument that pins the exact git commit of that code, for example:

from transformers import AutoModelForMaskedLM

mlm = AutoModelForMaskedLM.from_pretrained(
    'alycialee/m2-bert-260M',
    trust_remote_code=True,
    revision='e8d17ae',
)

Configuration

Note that use_flash_mm is false by default, because FlashMM is currently not supported. hyena_training_additions is also turned off.
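To check these flags for a given checkpoint, you can load just the configuration rather than the full model. This is a minimal sketch under assumptions: load_m2_config and m2_flags are hypothetical helpers, and the code assumes the remote configuration class exposes use_flash_mm and hyena_training_additions as attributes (any that are missing are reported as None).

```python
from transformers import AutoConfig

# Configuration flags mentioned in this model card.
M2_FLAGS = ("use_flash_mm", "hyena_training_additions")


def load_m2_config(revision="e8d17ae"):
    """Hypothetical helper: fetch the configuration (not the weights) at a pinned commit."""
    return AutoConfig.from_pretrained(
        "alycialee/m2-bert-260M",
        trust_remote_code=True,
        revision=revision,
    )


def m2_flags(config):
    """Map each flag name to its value on config, or None if the attribute is absent."""
    return {name: getattr(config, name, None) for name in M2_FLAGS}
```

If the defaults described above apply, m2_flags(load_m2_config()) should report both flags as False.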