vilsonrodrigues committed
Commit 55863b9
Parent(s): 9d2bef0
synchronizing readme

README.md CHANGED
@@ -26,6 +26,8 @@ Resharded version of https://huggingface.co/tiiuae/falcon-7b for low RAM envirom
 * **It features an architecture optimized for inference**, with FlashAttention ([Dao et al., 2022](https://arxiv.org/abs/2205.14135)) and multiquery ([Shazeer et al., 2019](https://arxiv.org/abs/1911.02150)).
 * **It is made available under a permissive Apache 2.0 license allowing for commercial use**, without any royalties or restrictions.
 
+⚠️ Falcon is now available as a core model in the `transformers` library! To use the in-library version, please install the latest version of `transformers` with `pip install git+https://github.com/huggingface/transformers.git`, then simply remove the `trust_remote_code=True` argument from `from_pretrained()`.
+
 ⚠️ **This is a raw, pretrained model, which should be further finetuned for most usecases.** If you are looking for a version better suited to taking generic instructions in a chat format, we recommend taking a look at [Falcon-7B-Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct).
 
 🔥 **Looking for an even more powerful model?** [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) is Falcon-7B's big brother!
@@ -34,16 +36,13 @@ Resharded version of https://huggingface.co/tiiuae/falcon-7b for low RAM envirom
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import transformers
 import torch
-
 model = "tiiuae/falcon-7b"
-
 tokenizer = AutoTokenizer.from_pretrained(model)
 pipeline = transformers.pipeline(
     "text-generation",
     model=model,
     tokenizer=tokenizer,
     torch_dtype=torch.bfloat16,
-    trust_remote_code=True,
     device_map="auto",
 )
 sequences = pipeline(
@@ -56,7 +55,6 @@ sequences = pipeline(
 )
 for seq in sequences:
     print(f"Result: {seq['generated_text']}")
-
 ```
 
 💥 **Falcon LLMs require PyTorch 2.0 for use with `transformers`!**
@@ -105,16 +103,13 @@ We recommend users of Falcon-7B to consider finetuning it for the specific set o
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import transformers
 import torch
-
 model = "tiiuae/falcon-7b"
-
 tokenizer = AutoTokenizer.from_pretrained(model)
 pipeline = transformers.pipeline(
     "text-generation",
     model=model,
     tokenizer=tokenizer,
     torch_dtype=torch.bfloat16,
-    trust_remote_code=True,
     device_map="auto",
 )
 sequences = pipeline(
@@ -127,7 +122,6 @@ sequences = pipeline(
 )
 for seq in sequences:
     print(f"Result: {seq['generated_text']}")
-
 ```
 
 ## Training Details
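To make the net effect of this commit easier to see, below is a sketch of the post-change snippet assembled from the hunks above: Falcon is loaded via core `transformers` without `trust_remote_code=True`, and the stray blank lines are gone. The prompt and sampling arguments passed to `pipeline(...)` fall in the region the diff elides (old lines 50-55), so the values used here are placeholders, not the README's actual ones.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model)

# Falcon now ships in core `transformers`; per this commit,
# trust_remote_code=True has been dropped from the pipeline setup.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

sequences = pipeline(
    "Tell me about falcons.",  # placeholder prompt; the real one is in the diff's elided region
    max_length=200,            # placeholder sampling settings, likewise elided
    do_sample=True,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```

The practical difference is that without `trust_remote_code=True`, the model class comes from the installed `transformers` package rather than from modeling code downloaded from the Hub repository.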