---
license: bigscience-openrail-m
widget:
- text: M[MASK]LWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
datasets:
- Ensembl
pipeline_tag: fill-mask
tags:
- biology
- medical
---

# BERT base for proteins

This is a bidirectional transformer pretrained on amino-acid sequences of human proteins, written as strings of single-letter residue codes.

Example: human insulin (UniProt P01308)

```
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
```

The model was trained using the masked-language-modeling objective: residues in the input sequence are replaced by a `[MASK]` token and the model learns to predict them from the surrounding sequence context.

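Since the card's pipeline tag is `fill-mask`, masked residues can be scored directly with the high-level pipeline API. A minimal sketch using the masked widget sequence from this card; `top_k=3` is an illustrative choice:

```python
from transformers import pipeline

# Score candidate tokens for the masked position in insulin's sequence.
fill = pipeline('fill-mask', model='unikei/bert-base-proteins')
masked = 'M[MASK]LWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'

for candidate in fill(masked, top_k=3):  # top 3 predictions
    print(candidate['token_str'], candidate['score'])
```
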
## Intended uses

This model is primarily intended to be fine-tuned on downstream tasks such as the following; a fine-tuning sketch appears after the list.

- protein function prediction
- molecule-to-gene-expression mapping
- cell targeting

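As an illustration of the first use case, a sequence-classification head can be attached to the pretrained encoder with the standard `transformers` API. This is a minimal sketch, not a recipe from this card: the label names, example sequences (fragments of the insulin sequence above), and single training step are placeholder assumptions.

```python
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

# Hypothetical protein-function labels; a real task needs a labeled dataset.
labels = ['enzyme', 'transporter', 'structural']

checkpoint = 'unikei/bert-base-proteins'
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
model = BertForSequenceClassification.from_pretrained(checkpoint, num_labels=len(labels))

# One forward pass with labels yields a training loss; in practice,
# wrap this in transformers.Trainer or a standard PyTorch loop.
sequences = ['MALWMRLLPLLALLALWGPDPAAA', 'FVNQHLCGSHLVEALYLVCGERGF']
batch = tokenizer(sequences, padding=True, return_tensors='pt')
batch['labels'] = torch.tensor([0, 1])

loss = model(**batch).loss
loss.backward()
```
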
## How to use in your code

```python
import torch
from transformers import BertTokenizerFast, BertModel

checkpoint = 'unikei/bert-base-proteins'
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
model = BertModel.from_pretrained(checkpoint)

# Human insulin (P01308) as a single-letter amino-acid string
example = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'
tokens = tokenizer(example, return_tensors='pt')

with torch.no_grad():
    outputs = model(**tokens)  # outputs.last_hidden_state holds per-residue embeddings
```

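Note that `BertModel` returns hidden states, not token predictions. To score masked residues at a lower level than the pipeline shown earlier, the same checkpoint can be loaded with a masked-LM head; a minimal sketch, assuming the checkpoint ships its pretraining head weights:

```python
import torch
from transformers import BertTokenizerFast, BertForMaskedLM

checkpoint = 'unikei/bert-base-proteins'
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
mlm = BertForMaskedLM.from_pretrained(checkpoint)

# The masked insulin sequence from the widget example above.
masked = 'M[MASK]LWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'
inputs = tokenizer(masked, return_tensors='pt')

with torch.no_grad():
    logits = mlm(**inputs).logits

# Report the top 3 candidate tokens at the masked position.
pos = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero().item()
top = logits[0, pos].topk(3).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top))
```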