---
license: bigscience-openrail-m
widget:
- text: M[MASK]LWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
datasets:
- Ensembl
pipeline_tag: fill-mask
tags:
- biology
- medical
---

# BERT base for proteins

This is a bidirectional transformer pretrained on amino-acid sequences of human proteins, written as strings of single-letter residue codes.

Example: human insulin (UniProt P01308)

```
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
```

The model was trained using the masked-language-modeling objective: residues in the input sequence are replaced by a `[MASK]` token and the model learns to predict them from the surrounding sequence context.

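Since the card's pipeline tag is `fill-mask`, masked residues can be scored directly with the high-level pipeline API. A minimal sketch using the masked widget sequence from this card; `top_k=3` is an illustrative choice:

```python
from transformers import pipeline

# Score candidate tokens for the masked position in insulin's sequence.
fill = pipeline('fill-mask', model='unikei/bert-base-proteins')
masked = 'M[MASK]LWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'

for candidate in fill(masked, top_k=3):  # top 3 predictions
    print(candidate['token_str'], candidate['score'])
```
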
## Intended uses

This model is primarily intended to be fine-tuned on downstream tasks such as the following; a fine-tuning sketch appears after the list.

- protein function prediction
- molecule-to-gene-expression mapping
- cell targeting

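As an illustration of the first use case, a sequence-classification head can be attached to the pretrained encoder with the standard `transformers` API. This is a minimal sketch, not a recipe from this card: the label names, example sequences (fragments of the insulin sequence above), and single training step are placeholder assumptions.

```python
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

# Hypothetical protein-function labels; a real task needs a labeled dataset.
labels = ['enzyme', 'transporter', 'structural']

checkpoint = 'unikei/bert-base-proteins'
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
model = BertForSequenceClassification.from_pretrained(checkpoint, num_labels=len(labels))

# One forward pass with labels yields a training loss; in practice,
# wrap this in transformers.Trainer or a standard PyTorch loop.
sequences = ['MALWMRLLPLLALLALWGPDPAAA', 'FVNQHLCGSHLVEALYLVCGERGF']
batch = tokenizer(sequences, padding=True, return_tensors='pt')
batch['labels'] = torch.tensor([0, 1])

loss = model(**batch).loss
loss.backward()
```
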
## How to use in your code

```python
import torch
from transformers import BertTokenizerFast, BertModel

checkpoint = 'unikei/bert-base-proteins'
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
model = BertModel.from_pretrained(checkpoint)

# Human insulin (P01308) as a single-letter amino-acid string
example = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'
tokens = tokenizer(example, return_tensors='pt')

with torch.no_grad():
    outputs = model(**tokens)  # outputs.last_hidden_state holds per-residue embeddings
```

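Note that `BertModel` returns hidden states, not token predictions. To score masked residues at a lower level than the pipeline shown earlier, the same checkpoint can be loaded with a masked-LM head; a minimal sketch, assuming the checkpoint ships its pretraining head weights:

```python
import torch
from transformers import BertTokenizerFast, BertForMaskedLM

checkpoint = 'unikei/bert-base-proteins'
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
mlm = BertForMaskedLM.from_pretrained(checkpoint)

# The masked insulin sequence from the widget example above.
masked = 'M[MASK]LWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'
inputs = tokenizer(masked, return_tensors='pt')

with torch.no_grad():
    logits = mlm(**inputs).logits

# Report the top 3 candidate tokens at the masked position.
pos = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero().item()
top = logits[0, pos].topk(3).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top))
```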