---
license: mit
language:
- en
- fr
- de
- it
- es
- pt
- pl
- nl
- ru
pipeline_tag: token-classification
inference: false
tags:
- mBERT
- BERT
- generic
- entity-recognition
---

## Model

[Multilingual BERT](https://huggingface.co/bert-base-multilingual-cased) fine-tuned on an artificially annotated multilingual subset of the [OSCAR dataset](https://huggingface.co/datasets/oscar-corpus/OSCAR-2201). This model provides domain- and language-independent embeddings for the entity recognition task.

## Usage

Embeddings can be used out of the box or fine-tuned on specific datasets.

Get embeddings:

```python
import torch
import transformers

# Load the model with all hidden states exposed, so intermediate layers can be used.
model = transformers.AutoModel.from_pretrained(
    'numind/entity-recognition-general-sota-v1',
    output_hidden_states=True,
)
tokenizer = transformers.AutoTokenizer.from_pretrained(
    'numind/entity-recognition-general-sota-v1',
)

text = [
    "NuMind is an AI company based in Paris and USA.",
    "See other models from us on https://huggingface.co/numind"
]

# Tokenize the batch, padding to the longest sentence.
encoded_input = tokenizer(
    text,
    return_tensors='pt',
    padding=True,
    truncation=True
)
output = model(**encoded_input)

# for better quality: concatenate two hidden layers along the feature dimension
emb = torch.cat(
    (output.hidden_states[-1], output.hidden_states[-7]),
    dim=2
)

# for better speed: use the last hidden layer only
# emb = output.hidden_states[-1]
```
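
As a rough illustration of using the embeddings out of the box, the sketch below trains-ready wires a small linear tagging head on top of precomputed token embeddings. It is not part of the released model: `TokenTagger`, `NUM_LABELS`, and the label set are hypothetical, the embedding width assumes the 768-dimensional hidden size of mBERT (so 1536 for the two-layer concatenation), and random tensors stand in for `emb` and `encoded_input["attention_mask"]` so the snippet runs without downloading the model.

```python
import torch
from torch import nn

# Assumption: mBERT hidden size is 768, so concatenating two layers gives 1536.
EMB_DIM = 2 * 768
NUM_LABELS = 5  # hypothetical label set, e.g. O, B-PER, I-PER, B-LOC, I-LOC


class TokenTagger(nn.Module):
    """Hypothetical linear tagging head over precomputed token embeddings."""

    def __init__(self, emb_dim: int, num_labels: int) -> None:
        super().__init__()
        self.classifier = nn.Linear(emb_dim, num_labels)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # emb: (batch, seq_len, emb_dim) -> logits: (batch, seq_len, num_labels)
        return self.classifier(emb)


# Dummy stand-ins for `emb` and `encoded_input["attention_mask"]`.
emb = torch.randn(2, 6, EMB_DIM)
attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 0, 0]])
labels = torch.randint(0, NUM_LABELS, (2, 6))

# Mark padding positions so the loss ignores them.
labels = labels.masked_fill(attention_mask == 0, -100)

head = TokenTagger(EMB_DIM, NUM_LABELS)
logits = head(emb)
loss = nn.functional.cross_entropy(
    logits.view(-1, NUM_LABELS), labels.view(-1), ignore_index=-100
)
predictions = logits.argmax(dim=-1)  # one predicted label id per token
```

With real data, `emb` would come from the model call shown earlier and `labels` from an annotated dataset aligned to the tokenizer's word pieces. Keeping the encoder frozen and training only the head is one option; fine-tuning the encoder as well is the other route mentioned above.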