Ransaka/mbart-large-cc25-8bit

About

This is an 8-bit quantized version of Facebook's mBART model, mbart-large-cc25.

According to the paper's abstract, mBART is a sequence-to-sequence denoising auto-encoder pretrained on large-scale monolingual corpora in many languages using the BART objective. mBART is one of the first methods for pretraining a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text.

The authors' code can be found here
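To make the denoising objective described above more concrete, here is a toy sketch of how a noisy training input could be built from clean text: sentences are shuffled and random spans of tokens are each replaced by a single mask token, and the model is then trained to reconstruct the original. This is an illustration only, not the authors' code. The function names, the whitespace tokenization, and the defaults (35% of tokens masked, Poisson span lengths with mean 3.5, taken from the paper's description) are assumptions of this sketch.

```python
import math
import random

MASK = "<mask>"  # placeholder mask token, chosen for this sketch


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def mask_spans(tokens, rng, mask_ratio=0.35, lam=3.5):
    """Replace random token spans with one MASK each until the budget is spent."""
    tokens = list(tokens)
    budget = round(len(tokens) * mask_ratio)  # number of tokens to hide
    while budget > 0:
        # Start each span on a token that is not already masked.
        candidates = [i for i, t in enumerate(tokens) if t != MASK]
        start = rng.choice(candidates)
        # Simplification: spans are at least one token long.
        span = max(1, min(sample_poisson(lam, rng), budget))
        end = start
        while end < len(tokens) and end - start < span and tokens[end] != MASK:
            end += 1
        tokens[start:end] = [MASK]
        budget -= end - start
    return tokens


def add_noise(sentences, rng):
    """Shuffle sentence order, then mask spans of whitespace-separated tokens."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    tokens = " ".join(shuffled).split()
    return " ".join(mask_spans(tokens, rng))


rng = random.Random(0)
clean = ["a b c d e f g h", "i j k l", "m n o p q r"]
noisy = add_noise(clean, rng)
print(noisy)  # the model's input; the clean text is the reconstruction target
```

The real pretraining pipeline works on subword tokens and language-tagged documents, so treat this only as a picture of the idea.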

Usage info

Install the required packages:

!pip install -U transformers accelerate bitsandbytes sentencepiece

Then load the model with the 🤗 Transformers library:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("Ransaka/mbart-large-cc25-8bit")
model = AutoModelForSeq2SeqLM.from_pretrained("Ransaka/mbart-large-cc25-8bit", device_map='auto')

# If loading succeeds, you'll see output like this:
# ===================================BUG REPORT===================================
# Welcome to bitsandbytes. For bug reports, please run

# python -m bitsandbytes

#  and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
# ================================================================================
# bin /opt/conda/lib/python3.7/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so
# CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
# CUDA SETUP: Highest compute capability among GPUs detected: 6.0
# CUDA SETUP: Detected CUDA version 113
# CUDA SETUP: Loading binary /opt/conda/lib/python3.7/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so...

# Create a text2text-generation pipeline and run it on a sample text
text = """Right now, major tech firms are clamouring to replicate the runaway success of ChatGPT,
          the generative AI chatbot developed by OpenAI using its GPT-3 large language model.
          Much like potential game-changers of the past, such as cloud-based Software as a Service
          (SaaS) platforms or blockchain technology (emphasis on potential), established companies
          and start-ups alike are going public with LLMs and ChatGPT alternatives in fear of being left behind.
      """
pipe = pipeline('text2text-generation', model=model, tokenizer=tokenizer)
pipe(text)
#[{'generated_text': 'theore, major tech are clamouring to replicate the generative AI chatbot developed by OpenAI using its AI'}]

print("Model memory usage: {:.2f} MB".format(pipe.model.get_memory_footprint()/1e6))
# 'Model memory usage: 1893.99 MB'
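For intuition about what 8-bit storage means for the weights, the sketch below applies plain absmax quantization to one row of floats: each value is scaled into the integer range [-127, 127] and stored with a single per-row scale factor. This is a simplified illustration and not how bitsandbytes is implemented (its int8 scheme also treats outlier features separately); the function names and sample values are hypothetical.

```python
def quantize_absmax(row):
    """Map floats to integers in [-127, 127] using one scale per row."""
    max_abs = max((abs(x) for x in row), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
    return [round(x / scale) for x in row], scale


def dequantize(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.03, 0.27]  # made-up values for illustration
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the precision traded for the smaller memory footprint.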