Quickstart with Python
AutoTrain is a library that allows you to train state-of-the-art models on Hugging Face Spaces, or locally. It provides a simple and easy-to-use interface to train models for various tasks such as LLM finetuning, text classification, image classification, object detection, and more.
In this quickstart guide, we will show you how to train a model using AutoTrain in Python.
Getting Started
AutoTrain can be installed using pip:
$ pip install autotrain-advanced
The example code below shows how to finetune an LLM using AutoTrain in Python:
import os

from autotrain.params import LLMTrainingParams
from autotrain.project import AutoTrainProject

# Training configuration: SFT with LoRA (PEFT) adapters and int4 quantization.
params = LLMTrainingParams(
    model="meta-llama/Llama-3.2-1B-Instruct",
    data_path="HuggingFaceH4/no_robots",
    chat_template="tokenizer",  # use the chat template stored in the tokenizer
    text_column="messages",
    train_split="train",
    trainer="sft",
    epochs=3,
    batch_size=1,
    lr=1e-5,
    peft=True,
    quantization="int4",
    target_modules="all-linear",
    padding="right",
    optimizer="paged_adamw_8bit",
    scheduler="cosine",
    gradient_accumulation=8,
    mixed_precision="bf16",
    merge_adapter=True,  # merge the LoRA adapter into the base model after training
    project_name="autotrain-llama32-1b-finetune",
    log="tensorboard",
    push_to_hub=True,
    username=os.environ.get("HF_USERNAME"),
    token=os.environ.get("HF_TOKEN"),
)

# Run the training job locally.
backend = "local"
project = AutoTrainProject(params=params, backend=backend, process=True)
project.create()
In this example, we are finetuning the meta-llama/Llama-3.2-1B-Instruct model on the HuggingFaceH4/no_robots dataset. We train for 3 epochs with a batch size of 1 and a learning rate of 1e-5, using the paged_adamw_8bit optimizer and the cosine scheduler. We also use bf16 mixed precision with a gradient accumulation of 8 steps. The final model is pushed to the Hugging Face Hub after training.
To train the model, run the following command:
$ export HF_USERNAME=<your-hf-username>
$ export HF_TOKEN=<your-hf-write-token>
$ python train.py
This will create a new project directory named autotrain-llama32-1b-finetune and start the training process. Once training is complete, the model will be pushed to the Hugging Face Hub.
Your HF_TOKEN and HF_USERNAME are only required if you want to push the model to the Hub or if you are accessing a gated model or dataset.
AutoTrainProject Class
class autotrain.project.AutoTrainProject
( params: typing.Union[autotrain.trainers.clm.params.LLMTrainingParams, autotrain.trainers.text_classification.params.TextClassificationParams, autotrain.trainers.tabular.params.TabularParams, autotrain.trainers.dreambooth.params.DreamBoothTrainingParams, autotrain.trainers.seq2seq.params.Seq2SeqParams, autotrain.trainers.image_classification.params.ImageClassificationParams, autotrain.trainers.text_regression.params.TextRegressionParams, autotrain.trainers.object_detection.params.ObjectDetectionParams, autotrain.trainers.token_classification.params.TokenClassificationParams, autotrain.trainers.sent_transformers.params.SentenceTransformersParams, autotrain.trainers.image_regression.params.ImageRegressionParams, autotrain.trainers.extractive_question_answering.params.ExtractiveQuestionAnsweringParams, autotrain.trainers.vlm.params.VLMTrainingParams] backend: str process: bool = False )
A class to train an AutoTrain project
Attributes
- params (Union[LLMTrainingParams, TextClassificationParams, TabularParams, DreamBoothTrainingParams, Seq2SeqParams, ImageClassificationParams, TextRegressionParams, ObjectDetectionParams, TokenClassificationParams, SentenceTransformersParams, ImageRegressionParams, ExtractiveQuestionAnsweringParams, VLMTrainingParams]) — The parameters for the AutoTrain project.
- backend (str) — The backend to be used for the AutoTrain project. It should be one of the following:
- local
- spaces-a10g-large
- spaces-a10g-small
- spaces-a100-large
- spaces-t4-medium
- spaces-t4-small
- spaces-cpu-upgrade
- spaces-cpu-basic
- spaces-l4x1
- spaces-l4x4
- spaces-l40sx1
- spaces-l40sx4
- spaces-l40sx8
- spaces-a10g-largex2
- spaces-a10g-largex4
- process (bool) — Flag to indicate if the params and dataset should be processed. If your data format is not AutoTrain-readable, set it to True. Set it to True when in doubt. Defaults to False.
Methods
- post_init(): Validates the backend attribute.
- create(): Creates a runner based on the backend and initializes the AutoTrain project.
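For example, the quickstart job above can be launched on a Hugging Face Space instead of the local machine by changing only the backend string. The minimal sketch below reuses the params object from the quickstart and assumes valid Hub credentials (HF_USERNAME, HF_TOKEN) are available:

from autotrain.project import AutoTrainProject

# Reusing `params` from the quickstart example above.
# "local" runs on this machine; a Spaces backend launches the job remotely.
project = AutoTrainProject(params=params, backend="spaces-a10g-large", process=True)
project.create()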
Parameters
Text Tasks
class autotrain.trainers.clm.params.LLMTrainingParams
( model: str = 'gpt2' project_name: str = 'project-name' data_path: str = 'data' train_split: str = 'train' valid_split: typing.Optional[str] = None add_eos_token: bool = True block_size: typing.Union[int, typing.List[int]] = -1 model_max_length: int = 2048 padding: typing.Optional[str] = 'right' trainer: str = 'default' use_flash_attention_2: bool = False log: str = 'none' disable_gradient_checkpointing: bool = False logging_steps: int = -1 eval_strategy: str = 'epoch' save_total_limit: int = 1 auto_find_batch_size: bool = False mixed_precision: typing.Optional[str] = None lr: float = 3e-05 epochs: int = 1 batch_size: int = 2 warmup_ratio: float = 0.1 gradient_accumulation: int = 4 optimizer: str = 'adamw_torch' scheduler: str = 'linear' weight_decay: float = 0.0 max_grad_norm: float = 1.0 seed: int = 42 chat_template: typing.Optional[str] = None quantization: typing.Optional[str] = 'int4' target_modules: typing.Optional[str] = 'all-linear' merge_adapter: bool = False peft: bool = False lora_r: int = 16 lora_alpha: int = 32 lora_dropout: float = 0.05 model_ref: typing.Optional[str] = None dpo_beta: float = 0.1 max_prompt_length: int = 128 max_completion_length: typing.Optional[int] = None prompt_text_column: typing.Optional[str] = None text_column: str = 'text' rejected_text_column: typing.Optional[str] = None push_to_hub: bool = False username: typing.Optional[str] = None token: typing.Optional[str] = None unsloth: bool = False distributed_backend: typing.Optional[str] = None )
Parameters
- model (str) — Model name to be used for training. Default is “gpt2”.
- project_name (str) — Name of the project and output directory. Default is “project-name”.
- data_path (str) — Path to the dataset. Default is “data”.
- train_split (str) — Configuration for the training data split. Default is “train”.
- valid_split (Optional[str]) — Configuration for the validation data split. Default is None.
- add_eos_token (bool) — Whether to add an EOS token at the end of sequences. Default is True.
- block_size (Union[int, List[int]]) — Size of the blocks for training, can be a single integer or a list of integers. Default is -1.
- model_max_length (int) — Maximum length of the model input. Default is 2048.
- padding (Optional[str]) — Side on which to pad sequences (left or right). Default is “right”.
- trainer (str) — Type of trainer to use. Default is “default”.
- use_flash_attention_2 (bool) — Whether to use flash attention version 2. Default is False.
- log (str) — Logging method for experiment tracking. Default is “none”.
- disable_gradient_checkpointing (bool) — Whether to disable gradient checkpointing. Default is False.
- logging_steps (int) — Number of steps between logging events. Default is -1.
- eval_strategy (str) — Strategy for evaluation (e.g., ‘epoch’). Default is “epoch”.
- save_total_limit (int) — Maximum number of checkpoints to keep. Default is 1.
- auto_find_batch_size (bool) — Whether to automatically find the optimal batch size. Default is False.
- mixed_precision (Optional[str]) — Type of mixed precision to use (e.g., ‘fp16’, ‘bf16’, or None). Default is None.
- lr (float) — Learning rate for training. Default is 3e-5.
- epochs (int) — Number of training epochs. Default is 1.
- batch_size (int) — Batch size for training. Default is 2.
- warmup_ratio (float) — Proportion of training to perform learning rate warmup. Default is 0.1.
- gradient_accumulation (int) — Number of steps to accumulate gradients before updating. Default is 4.
- optimizer (str) — Optimizer to use for training. Default is “adamw_torch”.
- scheduler (str) — Learning rate scheduler to use. Default is “linear”.
- weight_decay (float) — Weight decay to apply to the optimizer. Default is 0.0.
- max_grad_norm (float) — Maximum norm for gradient clipping. Default is 1.0.
- seed (int) — Random seed for reproducibility. Default is 42.
- chat_template (Optional[str]) — Template for chat-based models, options include: None, zephyr, chatml, or tokenizer. Default is None.
- quantization (Optional[str]) — Quantization method to use (e.g., ‘int4’, ‘int8’, or None). Default is “int4”.
- target_modules (Optional[str]) — Target modules for quantization or fine-tuning. Default is “all-linear”.
- merge_adapter (bool) — Whether to merge the adapter layers. Default is False.
- peft (bool) — Whether to use Parameter-Efficient Fine-Tuning (PEFT). Default is False.
- lora_r (int) — Rank of the LoRA matrices. Default is 16.
- lora_alpha (int) — Alpha parameter for LoRA. Default is 32.
- lora_dropout (float) — Dropout rate for LoRA. Default is 0.05.
- model_ref (Optional[str]) — Reference model for DPO trainer. Default is None.
- dpo_beta (float) — Beta parameter for DPO trainer. Default is 0.1.
- max_prompt_length (int) — Maximum length of the prompt. Default is 128.
- max_completion_length (Optional[int]) — Maximum length of the completion. Default is None.
- prompt_text_column (Optional[str]) — Column name for the prompt text. Default is None.
- text_column (str) — Column name for the text data. Default is “text”.
- rejected_text_column (Optional[str]) — Column name for the rejected text data. Default is None.
- push_to_hub (bool) — Whether to push the model to the Hugging Face Hub. Default is False.
- username (Optional[str]) — Hugging Face username for authentication. Default is None.
- token (Optional[str]) — Hugging Face token for authentication. Default is None.
- unsloth (bool) — Whether to use the unsloth library. Default is False.
- distributed_backend (Optional[str]) — Backend to use for distributed training. Default is None.
LLMTrainingParams: Parameters for training a language model using the autotrain library.
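Besides SFT, the same class configures preference-tuning runs. The sketch below is a minimal, illustrative DPO configuration; the dataset path and the prompt/chosen/rejected column names are hypothetical and assume a preference dataset with those columns. The resulting params object is passed to AutoTrainProject exactly as in the quickstart.

from autotrain.trainers.clm.params import LLMTrainingParams

# Hypothetical preference dataset with prompt / chosen / rejected columns.
dpo_params = LLMTrainingParams(
    model="meta-llama/Llama-3.2-1B-Instruct",
    data_path="my-org/my-preference-data",  # hypothetical dataset
    trainer="dpo",
    prompt_text_column="prompt",
    text_column="chosen",
    rejected_text_column="rejected",
    dpo_beta=0.1,
    max_prompt_length=128,
    max_completion_length=256,
    peft=True,
    quantization="int4",
    epochs=1,
    batch_size=1,
    lr=1e-5,
    project_name="autotrain-llama32-1b-dpo",
)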
class autotrain.trainers.sent_transformers.params.SentenceTransformersParams
( data_path: str = None model: str = 'microsoft/mpnet-base' lr: float = 3e-05 epochs: int = 3 max_seq_length: int = 128 batch_size: int = 8 warmup_ratio: float = 0.1 gradient_accumulation: int = 1 optimizer: str = 'adamw_torch' scheduler: str = 'linear' weight_decay: float = 0.0 max_grad_norm: float = 1.0 seed: int = 42 train_split: str = 'train' valid_split: typing.Optional[str] = None logging_steps: int = -1 project_name: str = 'project-name' auto_find_batch_size: bool = False mixed_precision: typing.Optional[str] = None save_total_limit: int = 1 token: typing.Optional[str] = None push_to_hub: bool = False eval_strategy: str = 'epoch' username: typing.Optional[str] = None log: str = 'none' early_stopping_patience: int = 5 early_stopping_threshold: float = 0.01 trainer: str = 'pair_score' sentence1_column: str = 'sentence1' sentence2_column: str = 'sentence2' sentence3_column: typing.Optional[str] = None target_column: typing.Optional[str] = None )
Parameters
- data_path (str) — Path to the dataset.
- model (str) — Name of the pre-trained model to use. Default is “microsoft/mpnet-base”.
- lr (float) — Learning rate for training. Default is 3e-5.
- epochs (int) — Number of training epochs. Default is 3.
- max_seq_length (int) — Maximum sequence length for the input. Default is 128.
- batch_size (int) — Batch size for training. Default is 8.
- warmup_ratio (float) — Proportion of training to perform learning rate warmup. Default is 0.1.
- gradient_accumulation (int) — Number of steps to accumulate gradients before updating. Default is 1.
- optimizer (str) — Optimizer to use. Default is “adamw_torch”.
- scheduler (str) — Learning rate scheduler to use. Default is “linear”.
- weight_decay (float) — Weight decay to apply. Default is 0.0.
- max_grad_norm (float) — Maximum gradient norm for clipping. Default is 1.0.
- seed (int) — Random seed for reproducibility. Default is 42.
- train_split (str) — Name of the training data split. Default is “train”.
- valid_split (Optional[str]) — Name of the validation data split. Default is None.
- logging_steps (int) — Number of steps between logging. Default is -1.
- project_name (str) — Name of the project for output directory. Default is “project-name”.
- auto_find_batch_size (bool) — Whether to automatically find the optimal batch size. Default is False.
- mixed_precision (Optional[str]) — Mixed precision training mode (fp16, bf16, or None). Default is None.
- save_total_limit (int) — Maximum number of checkpoints to save. Default is 1.
- token (Optional[str]) — Token for accessing Hugging Face Hub. Default is None.
- push_to_hub (bool) — Whether to push the model to Hugging Face Hub. Default is False.
- eval_strategy (str) — Evaluation strategy to use. Default is “epoch”.
- username (Optional[str]) — Hugging Face username. Default is None.
- log (str) — Logging method for experiment tracking. Default is “none”.
- early_stopping_patience (int) — Number of epochs with no improvement after which training will be stopped. Default is 5.
- early_stopping_threshold (float) — Threshold for measuring the new optimum, to qualify as an improvement. Default is 0.01.
- trainer (str) — Name of the trainer to use. Default is “pair_score”.
- sentence1_column (str) — Name of the column containing the first sentence. Default is “sentence1”.
- sentence2_column (str) — Name of the column containing the second sentence. Default is “sentence2”.
- sentence3_column (Optional[str]) — Name of the column containing the third sentence (if applicable). Default is None.
- target_column (Optional[str]) — Name of the column containing the target variable. Default is None.
SentenceTransformersParams is a configuration class for setting up parameters for training sentence transformers.
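As an illustration, the sketch below configures the pair_score trainer; the dataset path is hypothetical and assumes two sentence columns plus a float similarity score.

from autotrain.trainers.sent_transformers.params import SentenceTransformersParams

params = SentenceTransformersParams(
    model="microsoft/mpnet-base",
    data_path="my-org/sentence-pairs",  # hypothetical dataset
    trainer="pair_score",
    sentence1_column="sentence1",
    sentence2_column="sentence2",
    target_column="score",  # similarity score for each pair
    max_seq_length=128,
    epochs=3,
    batch_size=8,
    project_name="autotrain-st-pair-score",
)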
class autotrain.trainers.seq2seq.params.Seq2SeqParams
( data_path: str = None model: str = 'google/flan-t5-base' username: typing.Optional[str] = None seed: int = 42 train_split: str = 'train' valid_split: typing.Optional[str] = None project_name: str = 'project-name' token: typing.Optional[str] = None push_to_hub: bool = False text_column: str = 'text' target_column: str = 'target' lr: float = 5e-05 epochs: int = 3 max_seq_length: int = 128 max_target_length: int = 128 batch_size: int = 2 warmup_ratio: float = 0.1 gradient_accumulation: int = 1 optimizer: str = 'adamw_torch' scheduler: str = 'linear' weight_decay: float = 0.0 max_grad_norm: float = 1.0 logging_steps: int = -1 eval_strategy: str = 'epoch' auto_find_batch_size: bool = False mixed_precision: typing.Optional[str] = None save_total_limit: int = 1 peft: bool = False quantization: typing.Optional[str] = 'int8' lora_r: int = 16 lora_alpha: int = 32 lora_dropout: float = 0.05 target_modules: str = 'all-linear' log: str = 'none' early_stopping_patience: int = 5 early_stopping_threshold: float = 0.01 )
Parameters
- data_path (str) — Path to the dataset.
- model (str) — Name of the model to be used. Default is “google/flan-t5-base”.
- username (Optional[str]) — Hugging Face Username.
- seed (int) — Random seed for reproducibility. Default is 42.
- train_split (str) — Name of the training data split. Default is “train”.
- valid_split (Optional[str]) — Name of the validation data split.
- project_name (str) — Name of the project or output directory. Default is “project-name”.
- token (Optional[str]) — Hub Token for authentication.
- push_to_hub (bool) — Whether to push the model to the Hugging Face Hub. Default is False.
- text_column (str) — Name of the text column in the dataset. Default is “text”.
- target_column (str) — Name of the target text column in the dataset. Default is “target”.
- lr (float) — Learning rate for training. Default is 5e-5.
- epochs (int) — Number of training epochs. Default is 3.
- max_seq_length (int) — Maximum sequence length for input text. Default is 128.
- max_target_length (int) — Maximum sequence length for target text. Default is 128.
- batch_size (int) — Training batch size. Default is 2.
- warmup_ratio (float) — Proportion of warmup steps. Default is 0.1.
- gradient_accumulation (int) — Number of gradient accumulation steps. Default is 1.
- optimizer (str) — Optimizer to be used. Default is “adamw_torch”.
- scheduler (str) — Learning rate scheduler to be used. Default is “linear”.
- weight_decay (float) — Weight decay for the optimizer. Default is 0.0.
- max_grad_norm (float) — Maximum gradient norm for clipping. Default is 1.0.
- logging_steps (int) — Number of steps between logging. Default is -1 (disabled).
- eval_strategy (str) — Evaluation strategy. Default is “epoch”.
- auto_find_batch_size (bool) — Whether to automatically find the batch size. Default is False.
- mixed_precision (Optional[str]) — Mixed precision training mode (fp16, bf16, or None).
- save_total_limit (int) — Maximum number of checkpoints to save. Default is 1.
- peft (bool) — Whether to use Parameter-Efficient Fine-Tuning (PEFT). Default is False.
- quantization (Optional[str]) — Quantization mode (int4, int8, or None). Default is “int8”.
- lora_r (int) — LoRA-R parameter for PEFT. Default is 16.
- lora_alpha (int) — LoRA-Alpha parameter for PEFT. Default is 32.
- lora_dropout (float) — LoRA-Dropout parameter for PEFT. Default is 0.05.
- target_modules (str) — Target modules for PEFT. Default is “all-linear”.
- log (str) — Logging method for experiment tracking. Default is “none”.
- early_stopping_patience (int) — Patience for early stopping. Default is 5.
- early_stopping_threshold (float) — Threshold for early stopping. Default is 0.01.
Seq2SeqParams is a configuration class for sequence-to-sequence training parameters.
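For example, a summarization-style run with PEFT and int8 quantization could be configured as in the sketch below; the dataset path and column names are placeholders.

from autotrain.trainers.seq2seq.params import Seq2SeqParams

params = Seq2SeqParams(
    model="google/flan-t5-base",
    data_path="my-org/summarization-data",  # hypothetical dataset
    text_column="document",
    target_column="summary",
    max_seq_length=512,
    max_target_length=128,
    peft=True,
    quantization="int8",
    epochs=3,
    batch_size=2,
    project_name="autotrain-flan-t5-summarization",
)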
class autotrain.trainers.token_classification.params.TokenClassificationParams
( data_path: str = None model: str = 'bert-base-uncased' lr: float = 5e-05 epochs: int = 3 max_seq_length: int = 128 batch_size: int = 8 warmup_ratio: float = 0.1 gradient_accumulation: int = 1 optimizer: str = 'adamw_torch' scheduler: str = 'linear' weight_decay: float = 0.0 max_grad_norm: float = 1.0 seed: int = 42 train_split: str = 'train' valid_split: typing.Optional[str] = None tokens_column: str = 'tokens' tags_column: str = 'tags' logging_steps: int = -1 project_name: str = 'project-name' auto_find_batch_size: bool = False mixed_precision: typing.Optional[str] = None save_total_limit: int = 1 token: typing.Optional[str] = None push_to_hub: bool = False eval_strategy: str = 'epoch' username: typing.Optional[str] = None log: str = 'none' early_stopping_patience: int = 5 early_stopping_threshold: float = 0.01 )
Parameters
- data_path (str) — Path to the dataset.
- model (str) — Name of the model to use. Default is “bert-base-uncased”.
- lr (float) — Learning rate. Default is 5e-5.
- epochs (int) — Number of training epochs. Default is 3.
- max_seq_length (int) — Maximum sequence length. Default is 128.
- batch_size (int) — Training batch size. Default is 8.
- warmup_ratio (float) — Warmup proportion. Default is 0.1.
- gradient_accumulation (int) — Gradient accumulation steps. Default is 1.
- optimizer (str) — Optimizer to use. Default is “adamw_torch”.
- scheduler (str) — Scheduler to use. Default is “linear”.
- weight_decay (float) — Weight decay. Default is 0.0.
- max_grad_norm (float) — Maximum gradient norm. Default is 1.0.
- seed (int) — Random seed. Default is 42.
- train_split (str) — Name of the training split. Default is “train”.
- valid_split (Optional[str]) — Name of the validation split. Default is None.
- tokens_column (str) — Name of the tokens column. Default is “tokens”.
- tags_column (str) — Name of the tags column. Default is “tags”.
- logging_steps (int) — Number of steps between logging. Default is -1.
- project_name (str) — Name of the project. Default is “project-name”.
- auto_find_batch_size (bool) — Whether to automatically find the batch size. Default is False.
- mixed_precision (Optional[str]) — Mixed precision setting (fp16, bf16, or None). Default is None.
- save_total_limit (int) — Total number of checkpoints to save. Default is 1.
- token (Optional[str]) — Hub token for authentication. Default is None.
- push_to_hub (bool) — Whether to push the model to the Hugging Face hub. Default is False.
- eval_strategy (str) — Evaluation strategy. Default is “epoch”.
- username (Optional[str]) — Hugging Face username. Default is None.
- log (str) — Logging method for experiment tracking. Default is “none”.
- early_stopping_patience (int) — Patience for early stopping. Default is 5.
- early_stopping_threshold (float) — Threshold for early stopping. Default is 0.01.
TokenClassificationParams is a configuration class for token classification training parameters.
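A NER-style configuration might look like the sketch below; the dataset path is a placeholder and assumes each row holds a list of tokens and a parallel list of tag ids.

from autotrain.trainers.token_classification.params import TokenClassificationParams

params = TokenClassificationParams(
    model="bert-base-uncased",
    data_path="my-org/ner-data",  # hypothetical dataset
    tokens_column="tokens",
    tags_column="tags",
    max_seq_length=128,
    epochs=3,
    batch_size=8,
    project_name="autotrain-ner",
)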
class autotrain.trainers.extractive_question_answering.params.ExtractiveQuestionAnsweringParams
( data_path: str = None model: str = 'bert-base-uncased' lr: float = 5e-05 epochs: int = 3 max_seq_length: int = 128 max_doc_stride: int = 128 batch_size: int = 8 warmup_ratio: float = 0.1 gradient_accumulation: int = 1 optimizer: str = 'adamw_torch' scheduler: str = 'linear' weight_decay: float = 0.0 max_grad_norm: float = 1.0 seed: int = 42 train_split: str = 'train' valid_split: typing.Optional[str] = None text_column: str = 'context' question_column: str = 'question' answer_column: str = 'answers' logging_steps: int = -1 project_name: str = 'project-name' auto_find_batch_size: bool = False mixed_precision: typing.Optional[str] = None save_total_limit: int = 1 token: typing.Optional[str] = None push_to_hub: bool = False eval_strategy: str = 'epoch' username: typing.Optional[str] = None log: str = 'none' early_stopping_patience: int = 5 early_stopping_threshold: float = 0.01 )
Parameters
- data_path (str) — Path to the dataset.
- model (str) — Pre-trained model name. Default is “bert-base-uncased”.
- lr (float) — Learning rate for the optimizer. Default is 5e-5.
- epochs (int) — Number of training epochs. Default is 3.
- max_seq_length (int) — Maximum sequence length for inputs. Default is 128.
- max_doc_stride (int) — Maximum document stride for splitting context. Default is 128.
- batch_size (int) — Batch size for training. Default is 8.
- warmup_ratio (float) — Warmup proportion for learning rate scheduler. Default is 0.1.
- gradient_accumulation (int) — Number of gradient accumulation steps. Default is 1.
- optimizer (str) — Optimizer type. Default is “adamw_torch”.
- scheduler (str) — Learning rate scheduler type. Default is “linear”.
- weight_decay (float) — Weight decay for the optimizer. Default is 0.0.
- max_grad_norm (float) — Maximum gradient norm for clipping. Default is 1.0.
- seed (int) — Random seed for reproducibility. Default is 42.
- train_split (str) — Name of the training data split. Default is “train”.
- valid_split (Optional[str]) — Name of the validation data split. Default is None.
- text_column (str) — Column name for context/text. Default is “context”.
- question_column (str) — Column name for questions. Default is “question”.
- answer_column (str) — Column name for answers. Default is “answers”.
- logging_steps (int) — Number of steps between logging. Default is -1.
- project_name (str) — Name of the project for output directory. Default is “project-name”.
- auto_find_batch_size (bool) — Automatically find optimal batch size. Default is False.
- mixed_precision (Optional[str]) — Mixed precision training mode (fp16, bf16, or None). Default is None.
- save_total_limit (int) — Maximum number of checkpoints to save. Default is 1.
- token (Optional[str]) — Authentication token for Hugging Face Hub. Default is None.
- push_to_hub (bool) — Whether to push the model to Hugging Face Hub. Default is False.
- eval_strategy (str) — Evaluation strategy during training. Default is “epoch”.
- username (Optional[str]) — Hugging Face username for authentication. Default is None.
- log (str) — Logging method for experiment tracking. Default is “none”.
- early_stopping_patience (int) — Number of epochs with no improvement for early stopping. Default is 5.
- early_stopping_threshold (float) — Threshold for early stopping improvement. Default is 0.01.
ExtractiveQuestionAnsweringParams is a configuration class for extractive question answering training parameters.
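A SQuAD-style configuration could look like the sketch below; the dataset path is a placeholder and assumes context, question, and answers columns in the usual SQuAD format.

from autotrain.trainers.extractive_question_answering.params import ExtractiveQuestionAnsweringParams

params = ExtractiveQuestionAnsweringParams(
    model="bert-base-uncased",
    data_path="my-org/qa-data",  # hypothetical SQuAD-format dataset
    text_column="context",
    question_column="question",
    answer_column="answers",
    max_seq_length=384,
    max_doc_stride=128,
    epochs=3,
    batch_size=8,
    project_name="autotrain-extractive-qa",
)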
class autotrain.trainers.text_classification.params.TextClassificationParams
( data_path: str = None model: str = 'bert-base-uncased' lr: float = 5e-05 epochs: int = 3 max_seq_length: int = 128 batch_size: int = 8 warmup_ratio: float = 0.1 gradient_accumulation: int = 1 optimizer: str = 'adamw_torch' scheduler: str = 'linear' weight_decay: float = 0.0 max_grad_norm: float = 1.0 seed: int = 42 train_split: str = 'train' valid_split: typing.Optional[str] = None text_column: str = 'text' target_column: str = 'target' logging_steps: int = -1 project_name: str = 'project-name' auto_find_batch_size: bool = False mixed_precision: typing.Optional[str] = None save_total_limit: int = 1 token: typing.Optional[str] = None push_to_hub: bool = False eval_strategy: str = 'epoch' username: typing.Optional[str] = None log: str = 'none' early_stopping_patience: int = 5 early_stopping_threshold: float = 0.01 )
Parameters
- data_path (str) — Path to the dataset.
- model (str) — Name of the model to use. Default is “bert-base-uncased”.
- lr (float) — Learning rate. Default is 5e-5.
- epochs (int) — Number of training epochs. Default is 3.
- max_seq_length (int) — Maximum sequence length. Default is 128.
- batch_size (int) — Training batch size. Default is 8.
- warmup_ratio (float) — Warmup proportion. Default is 0.1.
- gradient_accumulation (int) — Number of gradient accumulation steps. Default is 1.
- optimizer (str) — Optimizer to use. Default is “adamw_torch”.
- scheduler (str) — Scheduler to use. Default is “linear”.
- weight_decay (float) — Weight decay. Default is 0.0.
- max_grad_norm (float) — Maximum gradient norm. Default is 1.0.
- seed (int) — Random seed. Default is 42.
- train_split (str) — Name of the training split. Default is “train”.
- valid_split (Optional[str]) — Name of the validation split. Default is None.
- text_column (str) — Name of the text column in the dataset. Default is “text”.
- target_column (str) — Name of the target column in the dataset. Default is “target”.
- logging_steps (int) — Number of steps between logging. Default is -1.
- project_name (str) — Name of the project. Default is “project-name”.
- auto_find_batch_size (bool) — Whether to automatically find the batch size. Default is False.
- mixed_precision (Optional[str]) — Mixed precision setting (fp16, bf16, or None). Default is None.
- save_total_limit (int) — Total number of checkpoints to save. Default is 1.
- token (Optional[str]) — Hub token for authentication. Default is None.
- push_to_hub (bool) — Whether to push the model to the hub. Default is False.
- eval_strategy (str) — Evaluation strategy. Default is “epoch”.
- username (Optional[str]) — Hugging Face username. Default is None.
- log (str) — Logging method for experiment tracking. Default is “none”.
- early_stopping_patience (int) — Number of epochs with no improvement after which training will be stopped. Default is 5.
- early_stopping_threshold (float) — Threshold for measuring the new optimum to continue training. Default is 0.01.
TextClassificationParams is a configuration class for text classification training parameters.
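A minimal sketch, assuming a labeled dataset with text and target columns (the dataset path is a placeholder):

from autotrain.trainers.text_classification.params import TextClassificationParams

params = TextClassificationParams(
    model="bert-base-uncased",
    data_path="my-org/sentiment-data",  # hypothetical dataset
    text_column="text",
    target_column="target",
    max_seq_length=128,
    epochs=3,
    batch_size=8,
    project_name="autotrain-text-classification",
)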
class autotrain.trainers.text_regression.params.TextRegressionParams
( data_path: str = None model: str = 'bert-base-uncased' lr: float = 5e-05 epochs: int = 3 max_seq_length: int = 128 batch_size: int = 8 warmup_ratio: float = 0.1 gradient_accumulation: int = 1 optimizer: str = 'adamw_torch' scheduler: str = 'linear' weight_decay: float = 0.0 max_grad_norm: float = 1.0 seed: int = 42 train_split: str = 'train' valid_split: typing.Optional[str] = None text_column: str = 'text' target_column: str = 'target' logging_steps: int = -1 project_name: str = 'project-name' auto_find_batch_size: bool = False mixed_precision: typing.Optional[str] = None save_total_limit: int = 1 token: typing.Optional[str] = None push_to_hub: bool = False eval_strategy: str = 'epoch' username: typing.Optional[str] = None log: str = 'none' early_stopping_patience: int = 5 early_stopping_threshold: float = 0.01 )
Parameters
- data_path (str) — Path to the dataset.
- model (str) — Name of the pre-trained model to use. Default is “bert-base-uncased”.
- lr (float) — Learning rate for the optimizer. Default is 5e-5.
- epochs (int) — Number of training epochs. Default is 3.
- max_seq_length (int) — Maximum sequence length for the inputs. Default is 128.
- batch_size (int) — Batch size for training. Default is 8.
- warmup_ratio (float) — Proportion of training to perform learning rate warmup. Default is 0.1.
- gradient_accumulation (int) — Number of steps to accumulate gradients before updating. Default is 1.
- optimizer (str) — Optimizer to use. Default is “adamw_torch”.
- scheduler (str) — Learning rate scheduler to use. Default is “linear”.
- weight_decay (float) — Weight decay to apply. Default is 0.0.
- max_grad_norm (float) — Maximum norm for the gradients. Default is 1.0.
- seed (int) — Random seed for reproducibility. Default is 42.
- train_split (str) — Name of the training data split. Default is “train”.
- valid_split (Optional[str]) — Name of the validation data split. Default is None.
- text_column (str) — Name of the column containing text data. Default is “text”.
- target_column (str) — Name of the column containing target data. Default is “target”.
- logging_steps (int) — Number of steps between logging. Default is -1 (no logging).
- project_name (str) — Name of the project for output directory. Default is “project-name”.
- auto_find_batch_size (bool) — Whether to automatically find the batch size. Default is False.
- mixed_precision (Optional[str]) — Mixed precision training mode (fp16, bf16, or None). Default is None.
- save_total_limit (int) — Maximum number of checkpoints to save. Default is 1.
- token (Optional[str]) — Token for accessing Hugging Face Hub. Default is None.
- push_to_hub (bool) — Whether to push the model to Hugging Face Hub. Default is False.
- eval_strategy (str) — Evaluation strategy to use. Default is “epoch”.
- username (Optional[str]) — Hugging Face username. Default is None.
- log (str) — Logging method for experiment tracking. Default is “none”.
- early_stopping_patience (int) — Number of epochs with no improvement after which training will be stopped. Default is 5.
- early_stopping_threshold (float) — Threshold for measuring the new optimum, to qualify as an improvement. Default is 0.01.
TextRegressionParams is a configuration class for setting up text regression training parameters.
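A minimal sketch, assuming a dataset whose target column holds continuous values such as ratings (the dataset path is a placeholder):

from autotrain.trainers.text_regression.params import TextRegressionParams

params = TextRegressionParams(
    model="bert-base-uncased",
    data_path="my-org/review-scores",  # hypothetical dataset
    text_column="text",
    target_column="target",  # continuous value, e.g. a rating
    epochs=3,
    batch_size=8,
    project_name="autotrain-text-regression",
)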
Image Tasks
class autotrain.trainers.image_classification.params.ImageClassificationParams
( data_path: str = None model: str = 'google/vit-base-patch16-224' username: typing.Optional[str] = None lr: float = 5e-05 epochs: int = 3 batch_size: int = 8 warmup_ratio: float = 0.1 gradient_accumulation: int = 1 optimizer: str = 'adamw_torch' scheduler: str = 'linear' weight_decay: float = 0.0 max_grad_norm: float = 1.0 seed: int = 42 train_split: str = 'train' valid_split: typing.Optional[str] = None logging_steps: int = -1 project_name: str = 'project-name' auto_find_batch_size: bool = False mixed_precision: typing.Optional[str] = None save_total_limit: int = 1 token: typing.Optional[str] = None push_to_hub: bool = False eval_strategy: str = 'epoch' image_column: str = 'image' target_column: str = 'target' log: str = 'none' early_stopping_patience: int = 5 early_stopping_threshold: float = 0.01 )
Parameters
- data_path (str) — Path to the dataset.
- model (str) — Pre-trained model name or path. Default is “google/vit-base-patch16-224”.
- username (Optional[str]) — Hugging Face account username.
- lr (float) — Learning rate for the optimizer. Default is 5e-5.
- epochs (int) — Number of epochs for training. Default is 3.
- batch_size (int) — Batch size for training. Default is 8.
- warmup_ratio (float) — Warmup ratio for learning rate scheduler. Default is 0.1.
- gradient_accumulation (int) — Number of gradient accumulation steps. Default is 1.
- optimizer (str) — Optimizer type. Default is “adamw_torch”.
- scheduler (str) — Learning rate scheduler type. Default is “linear”.
- weight_decay (float) — Weight decay for the optimizer. Default is 0.0.
- max_grad_norm (float) — Maximum gradient norm for clipping. Default is 1.0.
- seed (int) — Random seed for reproducibility. Default is 42.
- train_split (str) — Name of the training data split. Default is “train”.
- valid_split (Optional[str]) — Name of the validation data split.
- logging_steps (int) — Number of steps between logging. Default is -1.
- project_name (str) — Name of the project for output directory. Default is “project-name”.
- auto_find_batch_size (bool) — Automatically find optimal batch size. Default is False.
- mixed_precision (Optional[str]) — Mixed precision training mode (fp16, bf16, or None).
- save_total_limit (int) — Maximum number of checkpoints to keep. Default is 1.
- token (Optional[str]) — Hugging Face Hub token for authentication.
- push_to_hub (bool) — Whether to push the model to Hugging Face Hub. Default is False.
- eval_strategy (str) — Evaluation strategy during training. Default is “epoch”.
- image_column (str) — Column name for images in the dataset. Default is “image”.
- target_column (str) — Column name for target labels in the dataset. Default is “target”.
- log (str) — Logging method for experiment tracking. Default is “none”.
- early_stopping_patience (int) — Number of epochs with no improvement for early stopping. Default is 5.
- early_stopping_threshold (float) — Threshold for early stopping. Default is 0.01.
ImageClassificationParams is a configuration class for image classification training parameters.
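A minimal sketch, assuming an image dataset with image and target columns (the dataset path is a placeholder):

from autotrain.trainers.image_classification.params import ImageClassificationParams

params = ImageClassificationParams(
    model="google/vit-base-patch16-224",
    data_path="my-org/image-labels",  # hypothetical dataset
    image_column="image",
    target_column="target",
    epochs=3,
    batch_size=8,
    mixed_precision="fp16",
    project_name="autotrain-image-classification",
)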
class autotrain.trainers.image_regression.params.ImageRegressionParams
( data_path: str = None model: str = 'google/vit-base-patch16-224' username: typing.Optional[str] = None lr: float = 5e-05 epochs: int = 3 batch_size: int = 8 warmup_ratio: float = 0.1 gradient_accumulation: int = 1 optimizer: str = 'adamw_torch' scheduler: str = 'linear' weight_decay: float = 0.0 max_grad_norm: float = 1.0 seed: int = 42 train_split: str = 'train' valid_split: typing.Optional[str] = None logging_steps: int = -1 project_name: str = 'project-name' auto_find_batch_size: bool = False mixed_precision: typing.Optional[str] = None save_total_limit: int = 1 token: typing.Optional[str] = None push_to_hub: bool = False eval_strategy: str = 'epoch' image_column: str = 'image' target_column: str = 'target' log: str = 'none' early_stopping_patience: int = 5 early_stopping_threshold: float = 0.01 )
Parameters
- data_path (str) — Path to the dataset.
- model (str) — Name of the model to use. Default is “google/vit-base-patch16-224”.
- username (Optional[str]) — Hugging Face Username.
- lr (float) — Learning rate. Default is 5e-5.
- epochs (int) — Number of training epochs. Default is 3.
- batch_size (int) — Training batch size. Default is 8.
- warmup_ratio (float) — Warmup proportion. Default is 0.1.
- gradient_accumulation (int) — Gradient accumulation steps. Default is 1.
- optimizer (str) — Optimizer to use. Default is “adamw_torch”.
- scheduler (str) — Scheduler to use. Default is “linear”.
- weight_decay (float) — Weight decay. Default is 0.0.
- max_grad_norm (float) — Max gradient norm. Default is 1.0.
- seed (int) — Random seed. Default is 42.
- train_split (str) — Train split name. Default is “train”.
- valid_split (Optional[str]) — Validation split name.
- logging_steps (int) — Logging steps. Default is -1.
- project_name (str) — Output directory name. Default is “project-name”.
- auto_find_batch_size (bool) — Whether to auto find batch size. Default is False.
- mixed_precision (Optional[str]) — Mixed precision type (fp16, bf16, or None).
- save_total_limit (int) — Save total limit. Default is 1.
- token (Optional[str]) — Hub Token.
- push_to_hub (bool) — Whether to push to hub. Default is False.
- eval_strategy (str) — Evaluation strategy. Default is “epoch”.
- image_column (str) — Image column name. Default is “image”.
- target_column (str) — Target column name. Default is “target”.
- log (str) — Logging method for experiment tracking. Default is “none”.
- early_stopping_patience (int) — Early stopping patience. Default is 5.
- early_stopping_threshold (float) — Early stopping threshold. Default is 0.01.
ImageRegressionParams is a configuration class for image regression training parameters.
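The setup mirrors image classification, except the target column holds numeric values; a minimal sketch (placeholder dataset path):

from autotrain.trainers.image_regression.params import ImageRegressionParams

params = ImageRegressionParams(
    model="google/vit-base-patch16-224",
    data_path="my-org/image-scores",  # hypothetical dataset with numeric targets
    image_column="image",
    target_column="target",
    epochs=3,
    batch_size=8,
    project_name="autotrain-image-regression",
)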
class autotrain.trainers.object_detection.params.ObjectDetectionParams
( data_path: str = None model: str = 'google/vit-base-patch16-224' username: typing.Optional[str] = None lr: float = 5e-05 epochs: int = 3 batch_size: int = 8 warmup_ratio: float = 0.1 gradient_accumulation: int = 1 optimizer: str = 'adamw_torch' scheduler: str = 'linear' weight_decay: float = 0.0 max_grad_norm: float = 1.0 seed: int = 42 train_split: str = 'train' valid_split: typing.Optional[str] = None logging_steps: int = -1 project_name: str = 'project-name' auto_find_batch_size: bool = False mixed_precision: typing.Optional[str] = None save_total_limit: int = 1 token: typing.Optional[str] = None push_to_hub: bool = False eval_strategy: str = 'epoch' image_column: str = 'image' objects_column: str = 'objects' log: str = 'none' image_square_size: typing.Optional[int] = 600 early_stopping_patience: int = 5 early_stopping_threshold: float = 0.01 )
Parameters
- data_path (str) — Path to the dataset.
- model (str) — Name of the model to be used. Default is “google/vit-base-patch16-224”.
- username (Optional[str]) — Hugging Face Username.
- lr (float) — Learning rate. Default is 5e-5.
- epochs (int) — Number of training epochs. Default is 3.
- batch_size (int) — Training batch size. Default is 8.
- warmup_ratio (float) — Warmup proportion. Default is 0.1.
- gradient_accumulation (int) — Gradient accumulation steps. Default is 1.
- optimizer (str) — Optimizer to be used. Default is “adamw_torch”.
- scheduler (str) — Scheduler to be used. Default is “linear”.
- weight_decay (float) — Weight decay. Default is 0.0.
- max_grad_norm (float) — Max gradient norm. Default is 1.0.
- seed (int) — Random seed. Default is 42.
- train_split (str) — Name of the training data split. Default is “train”.
- valid_split (Optional[str]) — Name of the validation data split.
- logging_steps (int) — Number of steps between logging. Default is -1.
- project_name (str) — Name of the project for output directory. Default is “project-name”.
- auto_find_batch_size (bool) — Whether to automatically find batch size. Default is False.
- mixed_precision (Optional[str]) — Mixed precision type (fp16, bf16, or None).
- save_total_limit (int) — Total number of checkpoints to save. Default is 1.
- token (Optional[str]) — Hub Token for authentication.
- push_to_hub (bool) — Whether to push the model to the Hugging Face Hub. Default is False.
- eval_strategy (str) — Evaluation strategy. Default is “epoch”.
- image_column (str) — Name of the image column in the dataset. Default is “image”.
- objects_column (str) — Name of the target column in the dataset. Default is “objects”.
- log (str) — Logging method for experiment tracking. Default is “none”.
- image_square_size (Optional[int]) — Longest size to which the image will be resized, then padded to square. Default is 600.
- early_stopping_patience (int) — Number of epochs with no improvement after which training will be stopped. Default is 5.
- early_stopping_threshold (float) — Minimum change to qualify as an improvement. Default is 0.01.
ObjectDetectionParams is a configuration class for object detection training parameters.
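A minimal sketch is shown below; the checkpoint and dataset path are placeholders, and the objects column is assumed to hold bounding boxes and category labels per image.

from autotrain.trainers.object_detection.params import ObjectDetectionParams

params = ObjectDetectionParams(
    model="facebook/detr-resnet-50",    # assumed object-detection checkpoint
    data_path="my-org/detection-data",  # hypothetical dataset
    image_column="image",
    objects_column="objects",
    image_square_size=600,
    epochs=3,
    batch_size=8,
    project_name="autotrain-object-detection",
)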
class autotrain.trainers.dreambooth.params.DreamBoothTrainingParams
( model: str = None vae_model: typing.Optional[str] = None revision: typing.Optional[str] = None tokenizer: typing.Optional[str] = None image_path: str = None class_image_path: typing.Optional[str] = None prompt: str = None class_prompt: typing.Optional[str] = None num_class_images: int = 100 class_labels_conditioning: typing.Optional[str] = None prior_preservation: bool = False prior_loss_weight: float = 1.0 project_name: str = 'dreambooth-model' seed: int = 42 resolution: int = 512 center_crop: bool = False train_text_encoder: bool = False batch_size: int = 4 sample_batch_size: int = 4 epochs: int = 1 num_steps: int = None checkpointing_steps: int = 500 resume_from_checkpoint: typing.Optional[str] = None gradient_accumulation: int = 1 disable_gradient_checkpointing: bool = False lr: float = 0.0001 scale_lr: bool = False scheduler: str = 'constant' warmup_steps: int = 0 num_cycles: int = 1 lr_power: float = 1.0 dataloader_num_workers: int = 0 use_8bit_adam: bool = False adam_beta1: float = 0.9 adam_beta2: float = 0.999 adam_weight_decay: float = 0.01 adam_epsilon: float = 1e-08 max_grad_norm: float = 1.0 allow_tf32: bool = False prior_generation_precision: typing.Optional[str] = None local_rank: int = -1 xformers: bool = False pre_compute_text_embeddings: bool = False tokenizer_max_length: typing.Optional[int] = None text_encoder_use_attention_mask: bool = False rank: int = 4 xl: bool = False mixed_precision: typing.Optional[str] = None token: typing.Optional[str] = None push_to_hub: bool = False username: typing.Optional[str] = None validation_prompt: typing.Optional[str] = None num_validation_images: int = 4 validation_epochs: int = 50 checkpoints_total_limit: typing.Optional[int] = None validation_images: typing.Optional[str] = None logging: bool = False )
Parameters
- model (str) — Name of the model to be used for training.
- vae_model (Optional[str]) — Name of the VAE model to be used, if any.
- revision (Optional[str]) — Specific model version to use.
- tokenizer (Optional[str]) — Tokenizer to be used, if different from the model.
- image_path (str) — Path to the training images.
- class_image_path (Optional[str]) — Path to the class images.
- prompt (str) — Prompt for the instance images.
- class_prompt (Optional[str]) — Prompt for the class images.
- num_class_images (int) — Number of class images to generate.
- class_labels_conditioning (Optional[str]) — Conditioning labels for class images.
- prior_preservation (bool) — Enable prior preservation during training.
- prior_loss_weight (float) — Weight of the prior preservation loss.
- project_name (str) — Name of the project for output directory.
- seed (int) — Random seed for reproducibility.
- resolution (int) — Resolution of the training images.
- center_crop (bool) — Enable center cropping of images.
- train_text_encoder (bool) — Enable training of the text encoder.
- batch_size (int) — Batch size for training.
- sample_batch_size (int) — Batch size for sampling.
- epochs (int) — Number of training epochs.
- num_steps (int) — Maximum number of training steps.
- checkpointing_steps (int) — Steps interval for checkpointing.
- resume_from_checkpoint (Optional[str]) — Path to resume training from a checkpoint.
- gradient_accumulation (int) — Number of gradient accumulation steps.
- disable_gradient_checkpointing (bool) — Disable gradient checkpointing.
- lr (float) — Learning rate for training.
- scale_lr (bool) — Enable scaling of the learning rate.
- scheduler (str) — Type of learning rate scheduler.
- warmup_steps (int) — Number of warmup steps for learning rate scheduler.
- num_cycles (int) — Number of cycles for learning rate scheduler.
- lr_power (float) — Power factor for learning rate scheduler.
- dataloader_num_workers (int) — Number of workers for data loading.
- use_8bit_adam (bool) — Enable use of 8-bit Adam optimizer.
- adam_beta1 (float) — Beta1 parameter for Adam optimizer.
- adam_beta2 (float) — Beta2 parameter for Adam optimizer.
- adam_weight_decay (float) — Weight decay for Adam optimizer.
- adam_epsilon (float) — Epsilon parameter for Adam optimizer.
- max_grad_norm (float) — Maximum gradient norm for clipping.
- allow_tf32 (bool) — Allow use of TF32 for training.
- prior_generation_precision (Optional[str]) — Precision for prior generation.
- local_rank (int) — Local rank for distributed training.
- xformers (bool) — Enable xformers memory efficient attention.
- pre_compute_text_embeddings (bool) — Pre-compute text embeddings before training.
- tokenizer_max_length (Optional[int]) — Maximum length for tokenizer.
- text_encoder_use_attention_mask (bool) — Use attention mask for text encoder.
- rank (int) — Rank of the LoRA update matrices.
- xl (bool) — Enable XL model training.
- mixed_precision (Optional[str]) — Enable mixed precision training.
- token (Optional[str]) — Token for accessing the model hub.
- push_to_hub (bool) — Enable pushing the model to the hub.
- username (Optional[str]) — Username for the model hub.
- validation_prompt (Optional[str]) — Prompt for validation images.
- num_validation_images (int) — Number of validation images to generate.
- validation_epochs (int) — Epoch interval for validation.
- checkpoints_total_limit (Optional[int]) — Total limit for checkpoints.
- validation_images (Optional[str]) — Path to validation images.
- logging (bool) — Enable logging using TensorBoard.
DreamBoothTrainingParams is a configuration class for DreamBooth training parameters.
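A minimal sketch is shown below; the base model, image folder, and prompt are placeholders, and the SDXL settings (xl=True, resolution=1024) are only illustrative.

from autotrain.trainers.dreambooth.params import DreamBoothTrainingParams

params = DreamBoothTrainingParams(
    model="stabilityai/stable-diffusion-xl-base-1.0",  # assumed base model
    image_path="images/",          # hypothetical folder of instance images
    prompt="photo of a sks dog",   # instance prompt
    xl=True,
    resolution=1024,
    batch_size=1,
    num_steps=500,
    lr=1e-4,
    gradient_accumulation=4,
    mixed_precision="fp16",
    project_name="autotrain-dreambooth-sdxl",
)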
Tabular Tasks
class autotrain.trainers.tabular.params.TabularParams
( data_path: str = None model: str = 'xgboost' username: typing.Optional[str] = None seed: int = 42 train_split: str = 'train' valid_split: typing.Optional[str] = None project_name: str = 'project-name' token: typing.Optional[str] = None push_to_hub: bool = False id_column: str = 'id' target_columns: typing.Union[typing.List[str], str] = ['target'] categorical_columns: typing.Optional[typing.List[str]] = None numerical_columns: typing.Optional[typing.List[str]] = None task: str = 'classification' num_trials: int = 10 time_limit: int = 600 categorical_imputer: typing.Optional[str] = None numerical_imputer: typing.Optional[str] = None numeric_scaler: typing.Optional[str] = None )
Parameters
- data_path (str) — Path to the dataset.
- model (str) — Name of the model to use. Default is “xgboost”.
- username (Optional[str]) — Hugging Face Username.
- seed (int) — Random seed for reproducibility. Default is 42.
- train_split (str) — Name of the training data split. Default is “train”.
- valid_split (Optional[str]) — Name of the validation data split.
- project_name (str) — Name of the output directory. Default is “project-name”.
- token (Optional[str]) — Hub Token for authentication.
- push_to_hub (bool) — Whether to push the model to the hub. Default is False.
- id_column (str) — Name of the ID column. Default is “id”.
- target_columns (Union[List[str], str]) — Target column(s) in the dataset. Default is [“target”].
- categorical_columns (Optional[List[str]]) — List of categorical columns.
- numerical_columns (Optional[List[str]]) — List of numerical columns.
- task (str) — Type of task (e.g., “classification”). Default is “classification”.
- num_trials (int) — Number of trials for hyperparameter optimization. Default is 10.
- time_limit (int) — Time limit for training in seconds. Default is 600.
- categorical_imputer (Optional[str]) — Imputer strategy for categorical columns.
- numerical_imputer (Optional[str]) — Imputer strategy for numerical columns.
- numeric_scaler (Optional[str]) — Scaler strategy for numerical columns.
TabularParams is a configuration class for tabular data training parameters.
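A minimal sketch for a classification task on a CSV dataset; the data path and column names are placeholders:

from autotrain.trainers.tabular.params import TabularParams

params = TabularParams(
    model="xgboost",
    data_path="data/",          # hypothetical folder containing train.csv
    id_column="id",
    target_columns=["target"],
    task="classification",
    num_trials=10,              # hyperparameter-search trials
    time_limit=600,             # seconds
    project_name="autotrain-tabular-classification",
)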