Models
Generic model classes
NeuronTracedModel
The NeuronTracedModel class is available for instantiating a base Neuron model without a specific head. It is used as the base class for all tasks but text generation.
class optimum.neuron.NeuronTracedModel
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Base class for running compiled and optimized models on Neuron devices.
It implements generic methods for interacting with the Hugging Face Hub, as well as for compiling vanilla transformers models into Neuron-optimized TorchScript modules and exporting them using the optimum.exporters.neuron toolchain.
Class attributes:
- model_type (str, optional, defaults to "neuron_model") — The name of the model type to use when registering the NeuronTracedModel classes.
- auto_model_class (Type, optional, defaults to AutoModel) — The AutoModel class to be represented by the current NeuronTracedModel class.
Common attributes:
- model (torch.jit._script.ScriptModule) — The loaded ScriptModule compiled for Neuron devices.
- config (PretrainedConfig) — The configuration of the model.
- model_save_dir (Path) — The directory where a Neuron-compiled model is saved. By default, if the loaded model is local, the directory of the original model is used; otherwise, the cache directory is used.
Returns whether this model can generate sequences with .generate().
Gets a dictionary of inputs with their valid static shapes.
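Below is a minimal sketch of the generic export-and-save workflow that NeuronTracedModel subclasses share; the checkpoint name and static shapes are only illustrative, and the can_generate call assumes the method name matching the description above.

from optimum.neuron import NeuronModelForSequenceClassification

# Export a vanilla transformers checkpoint to a Neuron-compiled model.
# Static shapes (batch_size, sequence_length) are fixed at compilation time.
model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",  # illustrative checkpoint
    export=True,
    batch_size=1,
    sequence_length=128,
)

# Generic NeuronTracedModel methods: persist the compiled artifacts locally.
model.save_pretrained("distilbert_neuron/")

# Classification heads cannot generate sequences (method name assumed from the description above).
print(model.can_generate())  # False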
load_model
< source >( path: typing.Union[str, pathlib.Path] to_neuron: bool = False device_id: int = 0 )
Parameters
- path (Union[str, Path]) — Path of the compiled model.
- to_neuron (bool, defaults to False) — Whether to manually move the traced model to a NeuronCore. It is only needed when inline_weights_to_neff=False; otherwise the model is loaded onto a Neuron device automatically.
- device_id (int, defaults to 0) — Index of the NeuronCore to load the traced model onto.
Loads a TorchScript module compiled by the neuron(x)-cc compiler. It is first loaded onto the CPU and then moved to one or more NeuronCores.
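Below is a minimal sketch of how load_model might be used on an already compiled artifact; the file path is hypothetical and the keyword values mirror the parameter descriptions above.

from optimum.neuron import NeuronTracedModel

# Load a compiled TorchScript module onto a NeuronCore. to_neuron=True is only
# needed when the model was traced with inline_weights_to_neff=False.
traced_module = NeuronTracedModel.load_model(
    "compiled_model/model.neuron",  # hypothetical path to the compiled artifact
    to_neuron=True,
    device_id=0,
)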
remove_padding
< source >( outputs: typing.List[torch.Tensor] dims: typing.List[int] indices: typing.List[int] padding_side: typing.Literal['right', 'left'] = 'right' )
Parameters
- outputs (List[torch.Tensor]) — List of torch tensors that are inference outputs.
- dims (List[int]) — List of dimensions along which to slice a tensor.
- indices (List[int]) — List of indices at which to slice a tensor along an axis.
- padding_side (Literal["right", "left"], defaults to "right") — The side on which the padding has been applied.
Removes padding from output tensors.
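As a rough illustration, assuming a model traced with a static sequence length of 128 while the real input only has 13 tokens, remove_padding could slice the logits back to the true length (the model path and shapes are hypothetical):

import torch

from optimum.neuron import NeuronModelForTokenClassification

model = NeuronModelForTokenClassification.from_pretrained("token_cls_neuron/")  # hypothetical local export
padded_logits = torch.rand(1, 128, 9)  # stands in for a padded inference output

# Slice dimension 1 down to the 13 real tokens; right padding is assumed here.
trimmed = model.remove_padding(
    outputs=[padded_logits],
    dims=[1],
    indices=[13],
    padding_side="right",
)
print(trimmed[0].shape)  # torch.Size([1, 13, 9]), assuming a list matching `outputs` is returned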
NeuronDecoderModel
The NeuronDecoderModel class is the base class for text generation models.
class optimum.neuron.NeuronDecoderModel
< source >( config: PretrainedConfig checkpoint_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory] compiled_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None )
Base class to convert and run pre-trained transformers decoder models on Neuron devices.
It implements the methods to convert a pre-trained transformers decoder model into a Neuron transformer model by:
- transferring the checkpoint weights of the original model into an optimized Neuron graph,
- compiling the resulting graph using the Neuron compiler (a minimal export sketch follows the attribute list below).
Common attributes:
- model (torch.nn.Module) — The decoder model with a graph optimized for Neuron devices.
- config (PretrainedConfig) — The configuration of the original model.
- generation_config (GenerationConfig) — The generation configuration used by default when calling generate().
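A minimal sketch of the two steps described above, using the NeuronModelForCausalLM subclass documented later on this page; the checkpoint name and the compilation arguments (batch_size, sequence_length, num_cores, auto_cast_type) are assumptions for illustration:

from optimum.neuron import NeuronModelForCausalLM

# Export step: the checkpoint weights are transferred into an optimized Neuron
# graph, which is then compiled by the Neuron compiler.
model = NeuronModelForCausalLM.from_pretrained(
    "gpt2",
    export=True,
    batch_size=1,
    sequence_length=1024,
    num_cores=2,            # assumed argument: NeuronCores used for the decoder graph
    auto_cast_type="fp16",  # assumed argument: precision used during compilation
)
model.save_pretrained("gpt2_neuron/")

# Inference step: reload the already compiled artifacts without recompiling.
model = NeuronModelForCausalLM.from_pretrained("gpt2_neuron/")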
Natural Language Processing
The following Neuron model classes are available for natural language processing tasks.
NeuronModelForFeatureExtraction
class optimum.neuron.NeuronModelForFeatureExtraction
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
Neuron Model with a BaseModelOutput for feature-extraction tasks.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Feature Extraction model on Neuron devices.
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: typing.Optional[torch.Tensor] = None **kwargs )
Parameters
- input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
- token_type_ids (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 1 for tokens that are sentence A,
  - 0 for tokens that are sentence B. What are token type IDs?
The NeuronModelForFeatureExtraction forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of feature extraction: (Following model is compiled with neuronx compiler and can only be run on INF2. Replace “neuronx” with “neuron” if you are using INF1.)
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForFeatureExtraction
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/all-MiniLM-L6-v2-neuronx")
>>> model = NeuronModelForFeatureExtraction.from_pretrained("optimum/all-MiniLM-L6-v2-neuronx")
>>> inputs = tokenizer("Dear Evan Hansen is the winner of six Tony Awards.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> last_hidden_state = outputs.last_hidden_state
>>> list(last_hidden_state.shape)
[1, 13, 384]
NeuronModelForSentenceTransformers
class optimum.neuron.NeuronModelForSentenceTransformers
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
Neuron Model for Sentence Transformers.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Sentence Transformers model on Neuron devices.
forward
< source >( input_ids: Tensor attention_mask: Tensor pixel_values: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None **kwargs )
Parameters
- input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
- token_type_ids (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 1 for tokens that are sentence A,
  - 0 for tokens that are sentence B. What are token type IDs?
The NeuronModelForSentenceTransformers forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of TEXT Sentence Transformers:
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForSentenceTransformers
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bge-base-en-v1.5-neuronx")
>>> model = NeuronModelForSentenceTransformers.from_pretrained("optimum/bge-base-en-v1.5-neuronx")
>>> inputs = tokenizer("In the smouldering promise of the fall of Troy, a mythical world of gods and mortals rises from the ashes.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> token_embeddings = outputs.token_embeddings
>>> sentence_embedding = outputs.sentence_embedding
NeuronModelForMaskedLM
class optimum.neuron.NeuronModelForMaskedLM
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
Neuron Model with a MaskedLMOutput for masked language modeling tasks.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Masked language modeling model on Neuron devices.
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: typing.Optional[torch.Tensor] = None **kwargs )
Parameters
- input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
- token_type_ids (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 1 for tokens that are sentence A,
  - 0 for tokens that are sentence B. What are token type IDs?
The NeuronModelForMaskedLM forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of fill mask: (Following model is compiled with neuronx compiler and can only be run on INF2. Replace “neuronx” with “neuron” if you are using INF1.)
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForMaskedLM
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/legal-bert-base-uncased-neuronx")
>>> model = NeuronModelForMaskedLM.from_pretrained("optimum/legal-bert-base-uncased-neuronx")
>>> inputs = tokenizer("This [MASK] Agreement is between General Motors and John Murray.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 13, 30522]
NeuronModelForSequenceClassification
class optimum.neuron.NeuronModelForSequenceClassification
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
Neuron Model with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Sequence Classification model on Neuron devices.
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: typing.Optional[torch.Tensor] = None **kwargs )
Parameters
- input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
- token_type_ids (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 1 for tokens that are sentence A,
  - 0 for tokens that are sentence B. What are token type IDs?
The NeuronModelForSequenceClassification forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of single-label classification: (Following model is compiled with neuronx compiler and can only be run on INF2.)
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForSequenceClassification
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx")
>>> model = NeuronModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx")
>>> inputs = tokenizer("Hamilton is considered to be the best musical of human history.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 2]
NeuronModelForQuestionAnswering
class optimum.neuron.NeuronModelForQuestionAnswering
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
Neuron Model with a QuestionAnsweringModelOutput for extractive question-answering tasks like SQuAD.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Question Answering model on Neuron devices.
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: typing.Optional[torch.Tensor] = None **kwargs )
Parameters
- input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
- token_type_ids (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 1 for tokens that are sentence A,
  - 0 for tokens that are sentence B. What are token type IDs?
The NeuronModelForQuestionAnswering forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of question answering: (Following model is compiled with neuronx compiler and can only be run on INF2.)
>>> import torch
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForQuestionAnswering
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/roberta-base-squad2-neuronx")
>>> model = NeuronModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2-neuronx")
>>> question, text = "Are there wheelchair spaces in the theatres?", "Yes, we have reserved wheelchair spaces with a good view."
>>> inputs = tokenizer(question, text, return_tensors="pt")
>>> start_positions = torch.tensor([1])
>>> end_positions = torch.tensor([12])
>>> outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
>>> start_scores = outputs.start_logits
>>> end_scores = outputs.end_logits
NeuronModelForTokenClassification
class optimum.neuron.NeuronModelForTokenClassification
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
Neuron Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Token Classification model on Neuron devices.
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: typing.Optional[torch.Tensor] = None **kwargs )
Parameters
- input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
- token_type_ids (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 1 for tokens that are sentence A,
  - 0 for tokens that are sentence B. What are token type IDs?
The NeuronModelForTokenClassification forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of token classification: (Following model is compiled with neuronx compiler and can only be run on INF2.)
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForTokenClassification
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-NER-neuronx")
>>> model = NeuronModelForTokenClassification.from_pretrained("optimum/bert-base-NER-neuronx")
>>> inputs = tokenizer("Lin-Manuel Miranda is an American songwriter, actor, singer, filmmaker, and playwright.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 20, 9]
NeuronModelForMultipleChoice
class optimum.neuron.NeuronModelForMultipleChoice
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
Neuron Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Multiple choice model on Neuron devices.
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: typing.Optional[torch.Tensor] = None **kwargs )
Parameters
- input_ids (torch.Tensor of shape (batch_size, num_choices, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, None] of shape (batch_size, num_choices, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
- token_type_ids (Union[torch.Tensor, None] of shape (batch_size, num_choices, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 1 for tokens that are sentence A,
  - 0 for tokens that are sentence B. What are token type IDs?
The NeuronModelForMultipleChoice forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of multiple choice: (Following model is compiled with neuronx compiler and can only be run on INF2.)
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForMultipleChoice
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-uncased_SWAG-neuronx")
>>> model = NeuronModelForMultipleChoice.from_pretrained("optimum/bert-base-uncased_SWAG-neuronx")
>>> num_choices = 4
>>> first_sentence = ["Members of the procession walk down the street holding small horn brass instruments."] * num_choices
>>> second_sentence = [
... "A drum line passes by walking down the street playing their instruments.",
... "A drum line has heard approaching them.",
... "A drum line arrives and they're outside dancing and asleep.",
... "A drum line turns the lead singer watches the performance."
... ]
>>> inputs = tokenizer(first_sentence, second_sentence, truncation=True, padding=True)
>>> # Unflatten the input values, expanding them to the shape [batch_size, num_choices, seq_length]
>>> for k, v in inputs.items():
... inputs[k] = [v[i: i + num_choices] for i in range(0, len(v), num_choices)]
>>> inputs = dict(inputs.convert_to_tensors(tensor_type="pt"))
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 4]
NeuronModelForCausalLM
class optimum.neuron.NeuronModelForCausalLM
< source >( config: PretrainedConfig checkpoint_dir: typing.Union[str, ForwardRef('Path'), ForwardRef('TemporaryDirectory')] compiled_dir: typing.Union[str, ForwardRef('Path'), ForwardRef('TemporaryDirectory'), NoneType] = None generation_config: typing.Optional[ForwardRef('GenerationConfig')] = None )
Parameters
- model (torch.nn.Module) — torch.nn.Module is the Neuron decoder graph.
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model.
- model_path (Path) — The directory where the compiled artifacts for the model are stored. It can be a temporary directory if the model has never been saved locally before.
- generation_config (transformers.GenerationConfig) — GenerationConfig holds the configuration for the model generation task.
Neuron model with a causal language modeling head for inference on Neuron devices.
This model inherits from ~neuron.modeling.NeuronDecoderModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
forward
< source >( input_ids: Tensor cache_ids: Tensor start_ids: Tensor = None return_dict: bool = True )
Parameters
- input_ids (torch.LongTensor) — Indices of decoder input sequence tokens in the vocabulary, of shape (batch_size, sequence_length).
- cache_ids (torch.LongTensor) — The indices at which the cached key and value for the current inputs need to be stored.
- start_ids (torch.LongTensor) — The indices of the first tokens to be processed, deduced from the attention masks.
The NeuronModelForCausalLM forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of text generation:
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForCausalLM
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = NeuronModelForCausalLM.from_pretrained("gpt2", export=True)
>>> inputs = tokenizer("My favorite moment of the day is", return_tensors="pt")
>>> gen_tokens = model.generate(**inputs, do_sample=True, temperature=0.9, min_length=20, max_length=20)
>>> tokenizer.batch_decode(gen_tokens)
NeuronModelForSeq2SeqLM
class optimum.neuron.NeuronModelForSeq2SeqLM
< source >( encoder: ScriptModule decoder: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None encoder_file_name: typing.Optional[str] = 'model.neuron' decoder_file_name: typing.Optional[str] = 'model.neuron' preprocessors: typing.Optional[typing.List] = None neuron_configs: typing.Optional[typing.Dict[str, ForwardRef('NeuronDefaultConfig')]] = None configs: typing.Optional[typing.Dict[str, ForwardRef('PretrainedConfig')]] = None generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None **kwargs )
Parameters
- encoder (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module of the encoder with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
- decoder (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module of the decoder with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
Neuron Sequence-to-sequence model with a language modeling head for text2text-generation tasks.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
forward
< source >( attention_mask: typing.Optional[torch.FloatTensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None decoder_attention_mask: typing.Optional[torch.BoolTensor] = None encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None beam_scores: typing.Optional[torch.FloatTensor] = None return_dict: bool = False output_attentions: bool = False output_hidden_states: bool = False )
Parameters
- input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
The NeuronModelForSeq2SeqLM forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
(Following models are compiled with neuronx compiler and can only be run on INF2.)
Example of text-to-text generation with small T5 model:
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSeq2SeqLM
neuron_model = NeuronModelForSeq2SeqLM.from_pretrained("google-t5/t5-small", export=True, dynamic_batch_size=False, batch_size=1, sequence_length=64, num_beams=4)
neuron_model.save_pretrained("t5_small_neuronx")
del neuron_model
neuron_model = NeuronModelForSeq2SeqLM.from_pretrained("t5_small_neuronx")
tokenizer = AutoTokenizer.from_pretrained("t5_small_neuronx")
inputs = tokenizer("translate English to German: Lets eat good food.", return_tensors="pt")
output = neuron_model.generate(
**inputs,
num_return_sequences=1,
)
results = [tokenizer.decode(t, skip_special_tokens=True) for t in output]
(For large models, tensor parallelism is needed to fit them onto Neuron cores. Below is an example run on an inf2.24xlarge instance.)
Example of text-to-text generation with tensor parallelism:
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSeq2SeqLM
# 1. compile
if __name__ == "__main__": # compulsory for parallel tracing since the API will spawn multiple processes.
neuron_model = NeuronModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xl", export=True, tensor_parallel_size=8, dynamic_batch_size=False, batch_size=1, sequence_length=128, num_beams=4,
)
neuron_model.save_pretrained("flan_t5_xl_neuronx_tp8/")
del neuron_model
# 2. inference
neuron_model = NeuronModelForSeq2SeqLM.from_pretrained("flan_t5_xl_neuronx_tp8")
tokenizer = AutoTokenizer.from_pretrained("flan_t5_xl_neuronx_tp8")
inputs = tokenizer("translate English to German: Lets eat good food.", return_tensors="pt")
output = neuron_model.generate(
**inputs,
num_return_sequences=1,
)
results = [tokenizer.decode(t, skip_special_tokens=True) for t in output]
Computer Vision
The following Neuron model classes are available for computer vision tasks.
NeuronModelForImageClassification
class optimum.neuron.NeuronModelForImageClassification
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
Neuron Model with an image classification head on top (a linear layer on top of the final hidden state of the [CLS] token) e.g. for ImageNet.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Neuron Model for image-classification tasks. This class officially supports beit, convnext, convnextv2, deit, levit, mobilenet_v2, mobilevit, vit, etc.
forward
< source >( pixel_values: Tensor **kwargs )
Parameters
- pixel_values (Union[torch.Tensor, None] of shape (batch_size, num_channels, height, width), defaults to None) — Pixel values corresponding to the images in the current batch. Pixel values can be obtained from encoded images using AutoImageProcessor.
The NeuronModelForImageClassification forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of image classification:
>>> import requests
>>> from PIL import Image
>>> from optimum.neuron import NeuronModelForImageClassification
>>> from transformers import AutoImageProcessor
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> preprocessor = AutoImageProcessor.from_pretrained("optimum/vit-base-patch16-224-neuronx")
>>> model = NeuronModelForImageClassification.from_pretrained("optimum/vit-base-patch16-224-neuronx")
>>> inputs = preprocessor(images=image, return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> predicted_label = logits.argmax(-1).item()
Example using optimum.neuron.pipeline:
>>> import requests
>>> from PIL import Image
>>> from transformers import AutoImageProcessor
>>> from optimum.neuron import NeuronModelForImageClassification, pipeline
>>> preprocessor = AutoImageProcessor.from_pretrained("optimum/vit-base-patch16-224-neuronx")
>>> model = NeuronModelForImageClassification.from_pretrained("optimum/vit-base-patch16-224-neuronx")
>>> pipe = pipeline("image-classification", model=model, feature_extractor=preprocessor)
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> pred = pipe(url)
NeuronModelForSemanticSegmentation
class optimum.neuron.NeuronModelForSemanticSegmentation
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
Neuron Model with a semantic segmentation head on top, e.g. for Pascal VOC.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Neuron Model for semantic-segmentation, with an all-MLP decode head on top e.g. for ADE20k, CityScapes. This class officially supports mobilevit, mobilenet-v2, etc.
forward
< source >( pixel_values: Tensor **kwargs )
Parameters
- pixel_values (Union[torch.Tensor, None] of shape (batch_size, num_channels, height, width), defaults to None) — Pixel values corresponding to the images in the current batch. Pixel values can be obtained from encoded images using AutoImageProcessor.
The NeuronModelForSemanticSegmentation forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of semantic segmentation:
>>> import requests
>>> from PIL import Image
>>> from optimum.neuron import NeuronModelForSemanticSegmentation
>>> from transformers import AutoImageProcessor
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> preprocessor = AutoImageProcessor.from_pretrained("optimum/deeplabv3-mobilevit-small-neuronx")
>>> model = NeuronModelForSemanticSegmentation.from_pretrained("optimum/deeplabv3-mobilevit-small-neuronx")
>>> inputs = preprocessor(images=image, return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
Example using optimum.neuron.pipeline:
>>> import requests
>>> from PIL import Image
>>> from transformers import AutoImageProcessor
>>> from optimum.neuron import NeuronModelForSemanticSegmentation, pipeline
>>> preprocessor = AutoImageProcessor.from_pretrained("optimum/deeplabv3-mobilevit-small-neuronx")
>>> model = NeuronModelForSemanticSegmentation.from_pretrained("optimum/deeplabv3-mobilevit-small-neuronx")
>>> pipe = pipeline("image-segmentation", model=model, feature_extractor=preprocessor)
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> pred = pipe(url)
NeuronModelForObjectDetection
class optimum.neuron.NeuronModelForObjectDetection
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
Neuron Model with object detection heads on top, for tasks such as COCO detection.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Neuron Model for object-detection, with object detection heads on top, for tasks such as COCO detection.
forward
< source >( pixel_values: Tensor **kwargs )
Parameters
- pixel_values (Union[torch.Tensor, None] of shape (batch_size, num_channels, height, width), defaults to None) — Pixel values corresponding to the images in the current batch. Pixel values can be obtained from encoded images using AutoImageProcessor.
The NeuronModelForObjectDetection forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of object detection:
>>> import requests
>>> import torch
>>> from PIL import Image
>>> from optimum.neuron import NeuronModelForObjectDetection
>>> from transformers import AutoImageProcessor
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> preprocessor = AutoImageProcessor.from_pretrained("hustvl/yolos-tiny")
>>> model = NeuronModelForObjectDetection.from_pretrained("hustvl/yolos-tiny", export=True, batch_size=1)
>>> inputs = preprocessor(images=image, return_tensors="pt")
>>> outputs = model(**inputs)
>>> target_sizes = torch.tensor([image.size[::-1]])
>>> results = preprocessor.post_process_object_detection(outputs, threshold=0.9, target_sizes=target_sizes)[0]
Example using optimum.neuron.pipeline:
>>> import requests
>>> from PIL import Image
>>> from transformers import AutoImageProcessor
>>> from optimum.neuron import NeuronModelForObjectDetection, pipeline
>>> preprocessor = AutoImageProcessor.from_pretrained("hustvl/yolos-tiny")
>>> model = NeuronModelForObjectDetection.from_pretrained("hustvl/yolos-tiny")
>>> pipe = pipeline("object-detection", model=model, feature_extractor=preprocessor)
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> pred = pipe(url)
Audio
The following Neuron model classes are available for audio tasks.
NeuronModelForAudioClassification
class optimum.neuron.NeuronModelForAudioClassification
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
Neuron Model with an audio classification head.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Neuron Model for audio-classification, with a sequence classification head on top (a linear layer over the pooled output) for tasks like SUPERB Keyword Spotting.
forward
< source >( input_values: Tensor **kwargs )
Parameters
- input_values (torch.Tensor of shape (batch_size, sequence_length)) — Float values of the input raw speech waveform. Input values can be obtained from an audio file loaded into an array using AutoProcessor.
The NeuronModelForAudioClassification forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of audio classification:
>>> from transformers import AutoProcessor
>>> from optimum.neuron import NeuronModelForAudioClassification
>>> from datasets import load_dataset
>>> import torch
>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate
>>> feature_extractor = AutoProcessor.from_pretrained("Jingya/wav2vec2-large-960h-lv60-self-neuronx-audio-classification")
>>> model = NeuronModelForAudioClassification.from_pretrained("Jingya/wav2vec2-large-960h-lv60-self-neuronx-audio-classification")
>>> # audio file is decoded on the fly
>>> inputs = feature_extractor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
>>> logits = model(**inputs).logits
>>> predicted_class_ids = torch.argmax(logits, dim=-1).item()
>>> predicted_label = model.config.id2label[predicted_class_ids]
Example using optimum.neuron.pipeline:
>>> from transformers import AutoProcessor
>>> from optimum.neuron import NeuronModelForAudioClassification, pipeline
>>> from datasets import load_dataset
>>> feature_extractor = AutoProcessor.from_pretrained("Jingya/wav2vec2-large-960h-lv60-self-neuronx-audio-classification")
>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> model = NeuronModelForAudioClassification.from_pretrained("Jingya/wav2vec2-large-960h-lv60-self-neuronx-audio-classification")
>>> ac = pipeline("audio-classification", model=model, feature_extractor=feature_extractor)
>>> pred = ac(dataset[0]["audio"]["array"])
NeuronModelForAudioFrameClassification
class optimum.neuron.NeuronModelForAudioFrameClassification
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
Neuron Model with an audio frame classification head.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Neuron Model with a frame classification head on top for tasks like Speaker Diarization.
forward
< source >( input_values: Tensor **kwargs )
Parameters
- input_values (torch.Tensor of shape (batch_size, sequence_length)) — Float values of the input raw speech waveform. Input values can be obtained from an audio file loaded into an array using AutoProcessor.
The NeuronModelForAudioFrameClassification forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of audio frame classification:
>>> from transformers import AutoProcessor
>>> from optimum.neuron import NeuronModelForAudioFrameClassification
>>> from datasets import load_dataset
>>> import torch
>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate
>>> feature_extractor = AutoProcessor.from_pretrained("Jingya/wav2vec2-base-superb-sd-neuronx")
>>> model = NeuronModelForAudioFrameClassification.from_pretrained("Jingya/wav2vec2-base-superb-sd-neuronx")
>>> inputs = feature_extractor(dataset[0]["audio"]["array"], return_tensors="pt", sampling_rate=sampling_rate)
>>> logits = model(**inputs).logits
>>> probabilities = torch.sigmoid(logits[0])
>>> labels = (probabilities > 0.5).long()
>>> labels[0].tolist()
NeuronModelForCTC
class optimum.neuron.NeuronModelForCTC
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
Neuron Model with a connectionist temporal classification head.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Neuron Model with a language modeling head on top for Connectionist Temporal Classification (CTC).
forward
< source >( input_values: Tensor **kwargs )
Parameters
- input_values (torch.Tensor of shape (batch_size, sequence_length)) — Float values of the input raw speech waveform. Input values can be obtained from an audio file loaded into an array using AutoProcessor.
The NeuronModelForCTC forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of CTC:
>>> from transformers import AutoProcessor
>>> from optimum.neuron import NeuronModelForCTC
>>> from datasets import load_dataset
>>> import torch
>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate
>>> processor = AutoProcessor.from_pretrained("Jingya/wav2vec2-large-960h-lv60-self-neuronx-ctc")
>>> model = NeuronModelForCTC.from_pretrained("Jingya/wav2vec2-large-960h-lv60-self-neuronx-ctc")
>>> # audio file is decoded on the fly
>>> inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
>>> logits = model(**inputs).logits
>>> predicted_ids = torch.argmax(logits, dim=-1)
>>> transcription = processor.batch_decode(predicted_ids)
Example using optimum.neuron.pipeline:
>>> from transformers import AutoProcessor
>>> from optimum.neuron import NeuronModelForCTC, pipeline
>>> from datasets import load_dataset
>>> processor = AutoProcessor.from_pretrained("Jingya/wav2vec2-large-960h-lv60-self-neuronx-ctc")
>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> model = NeuronModelForCTC.from_pretrained("Jingya/wav2vec2-large-960h-lv60-self-neuronx-ctc")
>>> asr = pipeline("automatic-speech-recognition", model=model, feature_extractor=processor.feature_extractor, tokenizer=processor.tokenizer)
NeuronModelForXVector
class optimum.neuron.NeuronModelForXVector
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronTracedModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF (Neuron Executable File Format) compiled by the neuron(x) compiler.
Neuron Model with an XVector feature extraction head on top for tasks like Speaker Verification.
This model inherits from ~neuron.modeling.NeuronTracedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Neuron Model with an XVector feature extraction head on top for tasks like Speaker Verification.
forward
< source >( input_values: Tensor **kwargs )
Parameters
- input_values (torch.Tensor of shape (batch_size, sequence_length)) — Float values of the input raw speech waveform. Input values can be obtained from an audio file loaded into an array using AutoProcessor.
The NeuronModelForXVector forward method overrides the __call__ special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of Audio XVector:
>>> from transformers import AutoProcessor
>>> from optimum.neuron import NeuronModelForXVector
>>> from datasets import load_dataset
>>> import torch
>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate
>>> feature_extractor = AutoProcessor.from_pretrained("Jingya/wav2vec2-base-superb-sv-neuronx")
>>> model = NeuronModelForXVector.from_pretrained("Jingya/wav2vec2-base-superb-sv-neuronx")
>>> inputs = feature_extractor(
... [d["array"] for d in dataset[:2]["audio"]], sampling_rate=sampling_rate, return_tensors="pt", padding=True
... )
>>> embeddings = model(**inputs).embeddings
>>> embeddings = torch.nn.functional.normalize(embeddings, dim=-1)
>>> cosine_sim = torch.nn.CosineSimilarity(dim=-1)
>>> similarity = cosine_sim(embeddings[0], embeddings[1])
>>> threshold = 0.7
>>> if similarity < threshold:
... print("Speakers are not the same!")
>>> round(similarity.item(), 2)
Stable Diffusion
The following Neuron model classes are available for stable diffusion tasks.
NeuronStableDiffusionPipeline
class optimum.neuron.NeuronStableDiffusionPipeline
< source >( config: typing.Dict[str, typing.Any] configs: typing.Dict[str, ForwardRef('PretrainedConfig')] neuron_configs: typing.Dict[str, ForwardRef('NeuronDefaultConfig')] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: typing.Optional[diffusers.schedulers.scheduling_utils.SchedulerMixin] vae_decoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeDecoder')] text_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None text_encoder_2: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None unet: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelUnet'), NoneType] = None transformer: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTransformer'), NoneType] = None vae_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeEncoder'), NoneType] = None image_encoder: typing.Optional[torch.jit._script.ScriptModule] = None safety_checker: typing.Optional[torch.jit._script.ScriptModule] = None tokenizer: typing.Union[transformers.models.clip.tokenization_clip.CLIPTokenizer, transformers.models.t5.tokenization_t5.T5Tokenizer, NoneType] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None controlnet: typing.Union[torch.jit._script.ScriptModule, typing.List[torch.jit._script.ScriptModule], ForwardRef('NeuronControlNetModel'), ForwardRef('NeuronMultiControlNetModel'), NoneType] = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_and_config_save_paths: typing.Optional[typing.Dict[str, typing.Tuple[str, pathlib.Path]]] = None )
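Example (a minimal text-to-image sketch; the model id and the static input shapes passed to from_pretrained are illustrative, and export=True assumes the checkpoint is compiled on the fly):
>>> from optimum.neuron import NeuronStableDiffusionPipeline

>>> # Compile the checkpoint with static shapes, then generate an image (id and shapes are placeholders).
>>> pipe = NeuronStableDiffusionPipeline.from_pretrained(
...     "stabilityai/stable-diffusion-2-1-base", export=True, batch_size=1, height=512, width=512
... )
>>> image = pipe("a photo of an astronaut riding a horse on mars").images[0]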
NeuronStableDiffusionImg2ImgPipeline
class optimum.neuron.NeuronStableDiffusionImg2ImgPipeline
< source >( config: typing.Dict[str, typing.Any] configs: typing.Dict[str, ForwardRef('PretrainedConfig')] neuron_configs: typing.Dict[str, ForwardRef('NeuronDefaultConfig')] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: typing.Optional[diffusers.schedulers.scheduling_utils.SchedulerMixin] vae_decoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeDecoder')] text_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None text_encoder_2: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None unet: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelUnet'), NoneType] = None transformer: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTransformer'), NoneType] = None vae_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeEncoder'), NoneType] = None image_encoder: typing.Optional[torch.jit._script.ScriptModule] = None safety_checker: typing.Optional[torch.jit._script.ScriptModule] = None tokenizer: typing.Union[transformers.models.clip.tokenization_clip.CLIPTokenizer, transformers.models.t5.tokenization_t5.T5Tokenizer, NoneType] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None controlnet: typing.Union[torch.jit._script.ScriptModule, typing.List[torch.jit._script.ScriptModule], ForwardRef('NeuronControlNetModel'), ForwardRef('NeuronMultiControlNetModel'), NoneType] = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_and_config_save_paths: typing.Optional[typing.Dict[str, typing.Tuple[str, pathlib.Path]]] = None )
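Example (a minimal image-to-image sketch; the model id, input shapes, and image path are illustrative):
>>> from diffusers.utils import load_image
>>> from optimum.neuron import NeuronStableDiffusionImg2ImgPipeline

>>> # Compile (or load an already compiled directory), then transform an existing image guided by a prompt.
>>> pipe = NeuronStableDiffusionImg2ImgPipeline.from_pretrained(
...     "stabilityai/stable-diffusion-2-1-base", export=True, batch_size=1, height=512, width=512
... )
>>> init_image = load_image("path/or/url/to/input_image.png")  # placeholder path
>>> image = pipe(prompt="a fantasy landscape, trending on artstation", image=init_image).images[0]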
NeuronStableDiffusionInpaintPipeline
class optimum.neuron.NeuronStableDiffusionInpaintPipeline
< source >( config: typing.Dict[str, typing.Any] configs: typing.Dict[str, ForwardRef('PretrainedConfig')] neuron_configs: typing.Dict[str, ForwardRef('NeuronDefaultConfig')] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: typing.Optional[diffusers.schedulers.scheduling_utils.SchedulerMixin] vae_decoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeDecoder')] text_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None text_encoder_2: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None unet: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelUnet'), NoneType] = None transformer: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTransformer'), NoneType] = None vae_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeEncoder'), NoneType] = None image_encoder: typing.Optional[torch.jit._script.ScriptModule] = None safety_checker: typing.Optional[torch.jit._script.ScriptModule] = None tokenizer: typing.Union[transformers.models.clip.tokenization_clip.CLIPTokenizer, transformers.models.t5.tokenization_t5.T5Tokenizer, NoneType] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None controlnet: typing.Union[torch.jit._script.ScriptModule, typing.List[torch.jit._script.ScriptModule], ForwardRef('NeuronControlNetModel'), ForwardRef('NeuronMultiControlNetModel'), NoneType] = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_and_config_save_paths: typing.Optional[typing.Dict[str, typing.Tuple[str, pathlib.Path]]] = None )
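Example (a minimal inpainting sketch; the model id, input shapes, and image/mask paths are illustrative, and an inpainting-specific checkpoint is assumed):
>>> from diffusers.utils import load_image
>>> from optimum.neuron import NeuronStableDiffusionInpaintPipeline

>>> # Repaint only the masked region of the input image (ids and paths are placeholders).
>>> pipe = NeuronStableDiffusionInpaintPipeline.from_pretrained(
...     "runwayml/stable-diffusion-inpainting", export=True, batch_size=1, height=512, width=512
... )
>>> init_image = load_image("path/or/url/to/image.png")
>>> mask_image = load_image("path/or/url/to/mask.png")
>>> image = pipe(prompt="a cat sitting on a park bench", image=init_image, mask_image=mask_image).images[0]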
NeuronLatentConsistencyModelPipeline
class optimum.neuron.NeuronLatentConsistencyModelPipeline
< source >( config: typing.Dict[str, typing.Any] configs: typing.Dict[str, ForwardRef('PretrainedConfig')] neuron_configs: typing.Dict[str, ForwardRef('NeuronDefaultConfig')] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: typing.Optional[diffusers.schedulers.scheduling_utils.SchedulerMixin] vae_decoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeDecoder')] text_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None text_encoder_2: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None unet: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelUnet'), NoneType] = None transformer: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTransformer'), NoneType] = None vae_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeEncoder'), NoneType] = None image_encoder: typing.Optional[torch.jit._script.ScriptModule] = None safety_checker: typing.Optional[torch.jit._script.ScriptModule] = None tokenizer: typing.Union[transformers.models.clip.tokenization_clip.CLIPTokenizer, transformers.models.t5.tokenization_t5.T5Tokenizer, NoneType] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None controlnet: typing.Union[torch.jit._script.ScriptModule, typing.List[torch.jit._script.ScriptModule], ForwardRef('NeuronControlNetModel'), ForwardRef('NeuronMultiControlNetModel'), NoneType] = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_and_config_save_paths: typing.Optional[typing.Dict[str, typing.Tuple[str, pathlib.Path]]] = None )
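Example (a minimal latent-consistency sketch; the model id and shapes are illustrative, and LCM checkpoints typically need only a few denoising steps):
>>> from optimum.neuron import NeuronLatentConsistencyModelPipeline

>>> # LCM checkpoints converge in a handful of steps (id and shapes are placeholders).
>>> pipe = NeuronLatentConsistencyModelPipeline.from_pretrained(
...     "SimianLuo/LCM_Dreamshaper_v7", export=True, batch_size=1, height=768, width=768
... )
>>> image = pipe("a portrait of a cat, highly detailed", num_inference_steps=4).images[0]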
NeuronStableDiffusionControlNetPipeline
class optimum.neuron.NeuronStableDiffusionControlNetPipeline
< source >( config: typing.Dict[str, typing.Any] configs: typing.Dict[str, ForwardRef('PretrainedConfig')] neuron_configs: typing.Dict[str, ForwardRef('NeuronDefaultConfig')] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: typing.Optional[diffusers.schedulers.scheduling_utils.SchedulerMixin] vae_decoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeDecoder')] text_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None text_encoder_2: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None unet: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelUnet'), NoneType] = None transformer: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTransformer'), NoneType] = None vae_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeEncoder'), NoneType] = None image_encoder: typing.Optional[torch.jit._script.ScriptModule] = None safety_checker: typing.Optional[torch.jit._script.ScriptModule] = None tokenizer: typing.Union[transformers.models.clip.tokenization_clip.CLIPTokenizer, transformers.models.t5.tokenization_t5.T5Tokenizer, NoneType] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None controlnet: typing.Union[torch.jit._script.ScriptModule, typing.List[torch.jit._script.ScriptModule], ForwardRef('NeuronControlNetModel'), ForwardRef('NeuronMultiControlNetModel'), NoneType] = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_and_config_save_paths: typing.Optional[typing.Dict[str, typing.Tuple[str, pathlib.Path]]] = None )
__call__
< source >( prompt: typing.Union[str, typing.List[str]] = None image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] = None num_inference_steps: int = 50 timesteps: typing.Optional[typing.List[int]] = None sigmas: typing.Optional[typing.List[float]] = None guidance_scale: float = 7.5 negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_images_per_prompt: typing.Optional[int] = 1 eta: float = 0.0 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None prompt_embeds: typing.Optional[torch.Tensor] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None ip_adapter_image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor], NoneType] = None ip_adapter_image_embeds: typing.Optional[typing.List[torch.Tensor]] = None output_type: str = 'pil' return_dict: bool = True cross_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None controlnet_conditioning_scale: typing.Union[float, typing.List[float]] = 1.0 guess_mode: bool = False control_guidance_start: typing.Union[float, typing.List[float]] = 0.0 control_guidance_end: typing.Union[float, typing.List[float]] = 1.0 clip_skip: typing.Optional[int] = None callback_on_step_end: typing.Union[typing.Callable[[int, int, typing.Dict], NoneType], diffusers.callbacks.PipelineCallback, diffusers.callbacks.MultiPipelineCallbacks, NoneType] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] **kwargs ) → diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput
or tuple
Parameters
- prompt (
Optional[Union[str, List[str]]]
, defaults toNone
) — The prompt or prompts to guide image generation. If not defined, you need to passprompt_embeds
. - image (
Optional["PipelineImageInput"]
, defaults toNone
) — The ControlNet input condition to provide guidance to theunet
for generation. If the type is specified astorch.Tensor
, it is passed to ControlNet as is.PIL.Image.Image
can also be accepted as an image. The dimensions of the output image default to image’s dimensions. If height and/or width are passed, image
is resized accordingly. If multiple ControlNets are specified ininit
, images must be passed as a list such that each element of the list can be correctly batched for input to a single ControlNet. Whenprompt
is a list, and if a list of images is passed for a single ControlNet, each will be paired with each prompt in theprompt
list. This also applies to multiple ControlNets, where a list of image lists can be passed to batch for each prompt and each ControlNet. - num_inference_steps (
int
, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. - timesteps (
Optional[List[int]]
, defaults toNone
) — Custom timesteps to use for the denoising process with schedulers which support atimesteps
argument in theirset_timesteps
method. If not defined, the default behavior whennum_inference_steps
is passed will be used. Must be in descending order. - sigmas (
Optional[List[int]]
, defaults toNone
) — Custom sigmas to use for the denoising process with schedulers which support asigmas
argument in theirset_timesteps
method. If not defined, the default behavior whennum_inference_steps
is passed will be used. - guidance_scale (
float
, defaults to 7.5) — A higher guidance scale value encourages the model to generate images closely linked to the textprompt
at the expense of lower image quality. Guidance scale is enabled whenguidance_scale > 1
. - negative_prompt (
Optional[Union[str, List[str]]]
, defaults toNone
) — The prompt or prompts to guide what to not include in image generation. If not defined, you need to passnegative_prompt_embeds
instead. Ignored when not using guidance (guidance_scale < 1
). - num_images_per_prompt (
int
, defaults to 1) — The number of images to generate per prompt. If it is different from the batch size used for the compilation, it will be overridden by the static batch size of Neuron (except for dynamic batching). - eta (
float
, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to thediffusers.schedulers.DDIMScheduler
, and is ignored in other schedulers. - generator (
Optional[Union[torch.Generator, List[torch.Generator]]]
, defaults toNone
) — Atorch.Generator
to make generation deterministic. - latents (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied randomgenerator
. - prompt_embeds (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from theprompt
input argument. - negative_prompt_embeds (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided,negative_prompt_embeds
are generated from thenegative_prompt
input argument. - ip_adapter_image — (
Optional[PipelineImageInput]
, defaults toNone
): Optional image input to work with IP Adapters. - ip_adapter_image_embeds (
Optional[List[torch.Tensor]]
, defaults toNone
) — Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number of IP-Adapters. Each element should be a tensor of shape (batch_size, num_images, emb_dim)
. It should contain the negative image embedding ifdo_classifier_free_guidance
is set toTrue
. If not provided, embeddings are computed from theip_adapter_image
input argument. - output_type (
str
, defaults to"pil"
) — The output format of the generated image. Choose betweenPIL.Image
ornp.array
. - return_dict (
bool
, defaults toTrue
) — Whether or not to return adiffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput
instead of a plain tuple. - cross_attention_kwargs (
Optional[Dict[str, Any]]
, defaults toNone
) — A kwargs dictionary that if specified is passed along to theAttentionProcessor
as defined inself.processor
. - controlnet_conditioning_scale (
Union[float, List[float]]
, defaults to 1.0) — The outputs of the ControlNet are multiplied bycontrolnet_conditioning_scale
before they are added to the residual in the originalunet
. If multiple ControlNets are specified ininit
, you can set the corresponding scale as a list. - guess_mode (
bool
, defaults toFalse
) — The ControlNet encoder tries to recognize the content of the input image even if you remove all prompts. Aguidance_scale
value between 3.0 and 5.0 is recommended. - control_guidance_start (
Union[float, List[float]]
, defaults to 0.0) — The percentage of total steps at which the ControlNet starts applying. - control_guidance_end (
Union[float, List[float]]
, optional, defaults to 1.0) — The percentage of total steps at which the ControlNet stops applying. - clip_skip (
Optional[int]
, defaults toNone
) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings. - callback_on_step_end (
Optional[Union[Callable[[int, int, Dict], None], PipelineCallback, MultiPipelineCallbacks]]
, defaults toNone
) — A function or a subclass ofPipelineCallback
orMultiPipelineCallbacks
that is called at the end of each denoising step during inference, with the following arguments: callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict)
.callback_kwargs
will include a list of all tensors as specified bycallback_on_step_end_tensor_inputs
. - callback_on_step_end_tensor_inputs (
List[str]
, defaults to["latents"]
) — The list of tensor inputs for thecallback_on_step_end
function. The tensors specified in the list will be passed ascallback_kwargs
argument. You will only be able to include variables listed in the._callback_tensor_inputs
attribute of your pipeline class.
Returns
diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput
or tuple
If return_dict
is True
, diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput
is returned,
otherwise a tuple
is returned where the first element is a list with the generated images and the
second element is a list of bool
s indicating whether the corresponding generated image contains
“not-safe-for-work” (nsfw) content.
The call function to the pipeline for generation.
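Example (a minimal ControlNet sketch; the compiled pipeline directory and the conditioning-image path are illustrative, and the conditioning image is assumed to already match the control type, e.g. a canny edge map produced beforehand):
>>> from diffusers.utils import load_image
>>> from optimum.neuron import NeuronStableDiffusionControlNetPipeline

>>> # Load a previously compiled pipeline directory and condition generation on an edge map (paths are placeholders).
>>> pipe = NeuronStableDiffusionControlNetPipeline.from_pretrained("sd15_canny_controlnet_neuron/")
>>> canny_image = load_image("path/or/url/to/canny_edge_map.png")
>>> image = pipe("futuristic-looking building, photorealistic", image=canny_image).images[0]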
NeuronStableDiffusionXLPipeline
class optimum.neuron.NeuronStableDiffusionXLPipeline
< source >( config: typing.Dict[str, typing.Any] configs: typing.Dict[str, ForwardRef('PretrainedConfig')] neuron_configs: typing.Dict[str, ForwardRef('NeuronDefaultConfig')] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: typing.Optional[diffusers.schedulers.scheduling_utils.SchedulerMixin] vae_decoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeDecoder')] text_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None text_encoder_2: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None unet: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelUnet'), NoneType] = None transformer: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTransformer'), NoneType] = None vae_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeEncoder'), NoneType] = None image_encoder: typing.Optional[torch.jit._script.ScriptModule] = None safety_checker: typing.Optional[torch.jit._script.ScriptModule] = None tokenizer: typing.Union[transformers.models.clip.tokenization_clip.CLIPTokenizer, transformers.models.t5.tokenization_t5.T5Tokenizer, NoneType] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None controlnet: typing.Union[torch.jit._script.ScriptModule, typing.List[torch.jit._script.ScriptModule], ForwardRef('NeuronControlNetModel'), ForwardRef('NeuronMultiControlNetModel'), NoneType] = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_and_config_save_paths: typing.Optional[typing.Dict[str, typing.Tuple[str, pathlib.Path]]] = None )
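Example (a minimal SDXL text-to-image sketch; the model id and the 1024x1024 static shapes are illustrative):
>>> from optimum.neuron import NeuronStableDiffusionXLPipeline

>>> # Compile the SDXL base checkpoint with static shapes, then generate (id and shapes are placeholders).
>>> pipe = NeuronStableDiffusionXLPipeline.from_pretrained(
...     "stabilityai/stable-diffusion-xl-base-1.0", export=True, batch_size=1, height=1024, width=1024
... )
>>> image = pipe("Astronaut in a jungle, cold color palette, detailed, 8k").images[0]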
NeuronStableDiffusionXLImg2ImgPipeline
class optimum.neuron.NeuronStableDiffusionXLImg2ImgPipeline
< source >( config: typing.Dict[str, typing.Any] configs: typing.Dict[str, ForwardRef('PretrainedConfig')] neuron_configs: typing.Dict[str, ForwardRef('NeuronDefaultConfig')] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: typing.Optional[diffusers.schedulers.scheduling_utils.SchedulerMixin] vae_decoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeDecoder')] text_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None text_encoder_2: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None unet: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelUnet'), NoneType] = None transformer: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTransformer'), NoneType] = None vae_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeEncoder'), NoneType] = None image_encoder: typing.Optional[torch.jit._script.ScriptModule] = None safety_checker: typing.Optional[torch.jit._script.ScriptModule] = None tokenizer: typing.Union[transformers.models.clip.tokenization_clip.CLIPTokenizer, transformers.models.t5.tokenization_t5.T5Tokenizer, NoneType] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None controlnet: typing.Union[torch.jit._script.ScriptModule, typing.List[torch.jit._script.ScriptModule], ForwardRef('NeuronControlNetModel'), ForwardRef('NeuronMultiControlNetModel'), NoneType] = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_and_config_save_paths: typing.Optional[typing.Dict[str, typing.Tuple[str, pathlib.Path]]] = None )
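Example (a minimal SDXL image-to-image sketch; the refiner model id, shapes, and image path are illustrative):
>>> from diffusers.utils import load_image
>>> from optimum.neuron import NeuronStableDiffusionXLImg2ImgPipeline

>>> # Refine an existing render with the SDXL refiner checkpoint (ids and paths are placeholders).
>>> pipe = NeuronStableDiffusionXLImg2ImgPipeline.from_pretrained(
...     "stabilityai/stable-diffusion-xl-refiner-1.0", export=True, batch_size=1, height=1024, width=1024
... )
>>> init_image = load_image("path/or/url/to/base_render.png")
>>> image = pipe(prompt="a majestic castle in the clouds", image=init_image).images[0]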
NeuronStableDiffusionXLInpaintPipeline
class optimum.neuron.NeuronStableDiffusionXLInpaintPipeline
< source >( config: typing.Dict[str, typing.Any] configs: typing.Dict[str, ForwardRef('PretrainedConfig')] neuron_configs: typing.Dict[str, ForwardRef('NeuronDefaultConfig')] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: typing.Optional[diffusers.schedulers.scheduling_utils.SchedulerMixin] vae_decoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeDecoder')] text_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None text_encoder_2: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None unet: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelUnet'), NoneType] = None transformer: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTransformer'), NoneType] = None vae_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeEncoder'), NoneType] = None image_encoder: typing.Optional[torch.jit._script.ScriptModule] = None safety_checker: typing.Optional[torch.jit._script.ScriptModule] = None tokenizer: typing.Union[transformers.models.clip.tokenization_clip.CLIPTokenizer, transformers.models.t5.tokenization_t5.T5Tokenizer, NoneType] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None controlnet: typing.Union[torch.jit._script.ScriptModule, typing.List[torch.jit._script.ScriptModule], ForwardRef('NeuronControlNetModel'), ForwardRef('NeuronMultiControlNetModel'), NoneType] = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_and_config_save_paths: typing.Optional[typing.Dict[str, typing.Tuple[str, pathlib.Path]]] = None )
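Example (a minimal SDXL inpainting sketch; the model id, shapes, and image/mask paths are illustrative):
>>> from diffusers.utils import load_image
>>> from optimum.neuron import NeuronStableDiffusionXLInpaintPipeline

>>> # Repaint only the masked region of the image (ids and paths are placeholders).
>>> pipe = NeuronStableDiffusionXLInpaintPipeline.from_pretrained(
...     "stabilityai/stable-diffusion-xl-base-1.0", export=True, batch_size=1, height=1024, width=1024
... )
>>> init_image = load_image("path/or/url/to/image.png")
>>> mask_image = load_image("path/or/url/to/mask.png")
>>> image = pipe(prompt="A deep sea diver floating", image=init_image, mask_image=mask_image).images[0]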
NeuronStableDiffusionXLControlNetPipeline
class optimum.neuron.NeuronStableDiffusionXLControlNetPipeline
< source >( config: typing.Dict[str, typing.Any] configs: typing.Dict[str, ForwardRef('PretrainedConfig')] neuron_configs: typing.Dict[str, ForwardRef('NeuronDefaultConfig')] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: typing.Optional[diffusers.schedulers.scheduling_utils.SchedulerMixin] vae_decoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeDecoder')] text_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None text_encoder_2: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None unet: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelUnet'), NoneType] = None transformer: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTransformer'), NoneType] = None vae_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeEncoder'), NoneType] = None image_encoder: typing.Optional[torch.jit._script.ScriptModule] = None safety_checker: typing.Optional[torch.jit._script.ScriptModule] = None tokenizer: typing.Union[transformers.models.clip.tokenization_clip.CLIPTokenizer, transformers.models.t5.tokenization_t5.T5Tokenizer, NoneType] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None controlnet: typing.Union[torch.jit._script.ScriptModule, typing.List[torch.jit._script.ScriptModule], ForwardRef('NeuronControlNetModel'), ForwardRef('NeuronMultiControlNetModel'), NoneType] = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_and_config_save_paths: typing.Optional[typing.Dict[str, typing.Tuple[str, pathlib.Path]]] = None )
__call__
< source >( prompt: typing.Union[str, typing.List[str], NoneType] = None prompt_2: typing.Union[str, typing.List[str], NoneType] = None image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] = None num_inference_steps: int = 50 timesteps: typing.List[int] = None sigmas: typing.List[float] = None denoising_end: typing.Optional[float] = None guidance_scale: float = 5.0 negative_prompt: typing.Union[str, typing.List[str], NoneType] = None negative_prompt_2: typing.Union[str, typing.List[str], NoneType] = None num_images_per_prompt: typing.Optional[int] = 1 eta: float = 0.0 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None prompt_embeds: typing.Optional[torch.Tensor] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None pooled_prompt_embeds: typing.Optional[torch.Tensor] = None negative_pooled_prompt_embeds: typing.Optional[torch.Tensor] = None ip_adapter_image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor], NoneType] = None ip_adapter_image_embeds: typing.Optional[typing.List[torch.Tensor]] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True cross_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None controlnet_conditioning_scale: typing.Union[float, typing.List[float]] = 1.0 guess_mode: bool = False control_guidance_start: typing.Union[float, typing.List[float]] = 0.0 control_guidance_end: typing.Union[float, typing.List[float]] = 1.0 original_size: typing.Optional[typing.Tuple[int, int]] = None crops_coords_top_left: typing.Tuple[int, int] = (0, 0) target_size: typing.Optional[typing.Tuple[int, int]] = None negative_original_size: typing.Optional[typing.Tuple[int, int]] = None negative_crops_coords_top_left: typing.Tuple[int, int] = (0, 0) negative_target_size: typing.Optional[typing.Tuple[int, int]] = None clip_skip: typing.Optional[int] = None callback_on_step_end: typing.Union[typing.Callable[[int, int, typing.Dict], NoneType], diffusers.callbacks.PipelineCallback, diffusers.callbacks.MultiPipelineCallbacks, NoneType] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] **kwargs ) → diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput
or tuple
Parameters
- prompt (
Optional[Union[str, List[str]]]
, defaults toNone
) — The prompt or prompts to guide image generation. If not defined, you need to passprompt_embeds
. - prompt_2 (
Optional[Union[str, List[str]]]
, defaults toNone
) — The prompt or prompts to be sent totokenizer_2
andtext_encoder_2
. If not defined,prompt
is used in both text-encoders. - image (
Optional["PipelineImageInput"]
, defaults toNone
) — The ControlNet input condition to provide guidance to theunet
for generation. If the type is specified astorch.Tensor
, it is passed to ControlNet as is.PIL.Image.Image
can also be accepted as an image. The dimensions of the output image default to image’s dimensions. If height and/or width are passed, image
is resized accordingly. If multiple ControlNets are specified ininit
, images must be passed as a list such that each element of the list can be correctly batched for input to a single ControlNet. - num_inference_steps (
int
, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. - timesteps (
Optional[List[int]]
, defaults toNone
) — Custom timesteps to use for the denoising process with schedulers which support atimesteps
argument in theirset_timesteps
method. If not defined, the default behavior whennum_inference_steps
is passed will be used. Must be in descending order. - sigmas (
Optional[List[int]]
, defaults toNone
) — Custom sigmas to use for the denoising process with schedulers which support asigmas
argument in theirset_timesteps
method. If not defined, the default behavior whennum_inference_steps
is passed will be used. - denoising_end (
Optional[float]
, defaults toNone
) — When specified, determines the fraction (between 0.0 and 1.0) of the total denoising process to be completed before it is intentionally prematurely terminated. As a result, the returned sample will still retain a substantial amount of noise as determined by the discrete timesteps selected by the scheduler. The denoising_end parameter should ideally be utilized when this pipeline forms a part of a “Mixture of Denoisers” multi-pipeline setup, as elaborated in Refining the Image Output - guidance_scale (
float
, defaults to 5.0) — A higher guidance scale value encourages the model to generate images closely linked to the textprompt
at the expense of lower image quality. Guidance scale is enabled whenguidance_scale > 1
. - negative_prompt (
Optional[Union[str, List[str]]]
, defaults toNone
) — The prompt or prompts to guide what to not include in image generation. If not defined, you need to passnegative_prompt_embeds
instead. Ignored when not using guidance (guidance_scale < 1
). - negative_prompt_2 (
Optional[Union[str, List[str]]]
, defaults toNone
) — The prompt or prompts to guide what to not include in image generation. This is sent totokenizer_2
andtext_encoder_2
. If not defined,negative_prompt
is used in both text-encoders. - num_images_per_prompt (
int
, defaults to 1) — The number of images to generate per prompt. - eta (
float
, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to thediffusers.schedulers.DDIMScheduler
, and is ignored in other schedulers. - generator (
Optional[Union[torch.Generator, List[torch.Generator]]]
, defaults toNone
) — Atorch.Generator
to make generation deterministic. - latents (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied randomgenerator
. - prompt_embeds (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from theprompt
input argument. - negative_prompt_embeds (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided,negative_prompt_embeds
are generated from thenegative_prompt
input argument. - pooled_prompt_embeds (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated pooled text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, pooled text embeddings are generated fromprompt
input argument. - negative_pooled_prompt_embeds (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, poolednegative_prompt_embeds
are generated fromnegative_prompt
input argument. - ip_adapter_image — (
Optional[PipelineImageInput]
, defaults toNone
): Optional image input to work with IP Adapters. - ip_adapter_image_embeds (
Optional[List[torch.Tensor]]
, defaults toNone
) — Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number of IP-Adapters. Each element should be a tensor of shape (batch_size, num_images, emb_dim)
. It should contain the negative image embedding ifdo_classifier_free_guidance
is set toTrue
. If not provided, embeddings are computed from theip_adapter_image
input argument. - output_type (
Optional[str]
, defaults to"pil"
) — The output format of the generated image. Choose betweenPIL.Image
ornp.array
. - return_dict (
bool
, defaults toTrue
) — Whether or not to return a~pipelines.stable_diffusion.StableDiffusionPipelineOutput
instead of a plain tuple. - cross_attention_kwargs (
Optional[Dict[str, Any]]
, defaults toNone
) — A kwargs dictionary that if specified is passed along to theAttentionProcessor
as defined inself.processor
. - controlnet_conditioning_scale (
Union[float, List[float]]
, defaults to 1.0) — The outputs of the ControlNet are multiplied bycontrolnet_conditioning_scale
before they are added to the residual in the originalunet
. If multiple ControlNets are specified ininit
, you can set the corresponding scale as a list. - guess_mode (
bool
, defaults toFalse
) — The ControlNet encoder tries to recognize the content of the input image even if you remove all prompts. Aguidance_scale
value between 3.0 and 5.0 is recommended. - control_guidance_start (
Union[float, List[float]]
, defaults to 0.0) — The percentage of total steps at which the ControlNet starts applying. - control_guidance_end (
Union[float, List[float]]
, defaults to 1.0) — The percentage of total steps at which the ControlNet stops applying. - original_size (
Optional[Tuple[int, int]]
, defaults to (1024, 1024)) — Iforiginal_size
is not the same astarget_size
the image will appear to be down- or upsampled.original_size
defaults to(height, width)
if not specified. Part of SDXL’s micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. - crops_coords_top_left (
Tuple[int, int]
, defaults to (0, 0)) —crops_coords_top_left
can be used to generate an image that appears to be “cropped” from the positioncrops_coords_top_left
downwards. Favorable, well-centered images are usually achieved by settingcrops_coords_top_left
to (0, 0). Part of SDXL’s micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. - target_size (
Optional[Tuple[int, int]]
, defaults toNone
) — For most cases,target_size
should be set to the desired height and width of the generated image. If not specified it will default to(height, width)
. Part of SDXL’s micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. - negative_original_size (
Optional[Tuple[int, int]]
, defaults toNone
) — To negatively condition the generation process based on a specific image resolution. Part of SDXL’s micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. For more information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208. - negative_crops_coords_top_left (
Tuple[int, int]
, defaults to (0, 0)) — To negatively condition the generation process based on specific crop coordinates. Part of SDXL’s micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. For more information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208. - negative_target_size (
Optional[Tuple[int, int]]
, defaults toNone
) — To negatively condition the generation process based on a target image resolution. It should be the same as the target_size
for most cases. Part of SDXL’s micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. For more information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208. - clip_skip (
Optional[int]
, defaults toNone
) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings. - callback_on_step_end (
Optional[Union[Callable[[int, int, Dict], None], PipelineCallback, MultiPipelineCallbacks]]
, defaults toNone
) — A function or a subclass ofPipelineCallback
orMultiPipelineCallbacks
that is called at the end of each denoising step during inference, with the following arguments: callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict)
.callback_kwargs
will include a list of all tensors as specified bycallback_on_step_end_tensor_inputs
. - callback_on_step_end_tensor_inputs (
List[str]
, defaults to["latents"]
) — The list of tensor inputs for thecallback_on_step_end
function. The tensors specified in the list will be passed ascallback_kwargs
argument. You will only be able to include variables listed in the._callback_tensor_inputs
attribute of your pipeline class.
Returns
diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput
or tuple
If return_dict
is True
, diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput
is returned,
otherwise a tuple
is returned containing the output images.
The call function to the pipeline for generation.
Examples: