# Configuration
## Introduction
Configuration is divided into fine-grained reusable modules:
- `base`: basic configuration
- `logger`: logging settings
- `model_manager`: loading and saving model parameters
- `accelerator`: whether to enable multi-GPU training
- `dataset`: dataset management
- `evaluator`: evaluation and metric settings
- `tokenizer`: tokenizer initialization and tokenization settings
- `optimizer`: optimizer initialization settings
- `scheduler`: scheduler initialization settings
- `model`: model construction settings
The sections below describe each configuration module in detail. Alternatively, see [Examples](examples/README.md) for a quick start.
NOTE: configuration items of the form `_*_` are reserved fields in OpenSLU.
## Configuration Item Script
In OpenSLU configuration, every configuration item supports a simple calculation script. For example, we can reference `dataset_name` with `{dataset.dataset_name}`: its value is filled into the Python script `'LightChen2333/agif-slu-' + '*'` before the script is evaluated. (Without the surrounding quotes, the `{dataset.dataset_name}` value would be treated as a variable name rather than a string.)
NOTE: every item containing `{}` is treated as a Python script.
```yaml
tokenizer:
  _from_pretrained_: "'LightChen2333/agif-slu-' + '{dataset.dataset_name}'" # supports a simple calculation script
```
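To make the mechanism concrete, here is a minimal sketch (not OpenSLU's actual implementation) of how such `{...}` references could be resolved and evaluated:

```python
import re

# Hypothetical helper for illustration only: resolve `{a.b}` references
# against the config, then evaluate the result as a Python expression.
_REF = re.compile(r"\{([\w.]+)\}")

def resolve_item(value: str, config: dict):
    def lookup(match):
        node = config
        for key in match.group(1).split("."):  # walk the dotted path, e.g. dataset.dataset_name
            node = node[key]
        return str(node)

    if not _REF.search(value):
        return value  # plain items are left untouched
    # items containing `{}` are treated as Python scripts
    return eval(_REF.sub(lookup, value))

config = {"dataset": {"dataset_name": "atis"}}
print(resolve_item("'LightChen2333/agif-slu-' + '{dataset.dataset_name}'", config))
# -> LightChen2333/agif-slu-atis
```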
## `base` Config
```yaml
# `start_time` will be generated automatically when any config script starts; there is no need to assign it.
# start_time: xxxxxxxx
base:
  name: "OpenSLU" # project/logger name
  multi_intent: false # whether to enable the multi-intent setting
  train: true # enable training; otherwise run in zero-shot mode
  test: true # enable testing during training
  device: cuda # device: cuda or cpu
  seed: 42 # random seed
  best_key: EMA # metric used to select the best model [intent_acc/slot_f1/EMA]
  tokenizer_name: word_tokenizer # use word_tokenizer when there is no pretrained model; otherwise give an [AutoTokenizer] tokenizer name
  add_special_tokens: false # whether to add the [CLS] and [SEP] special tokens
  epoch_num: 300 # number of training epochs
  # eval_step: 280 # if eval_by_epoch == false and eval_step > 0, evaluate the model every eval_step steps
  eval_by_epoch: true # evaluate the model after each epoch
  batch_size: 16 # batch size
```
## `logger` Config
```yaml
logger:
  # `wandb` is supported in both single- and multi-GPU settings,
  # `tensorboard` is supported only in multi-GPU settings,
  # and `fitlog` is supported only in single-GPU settings.
  logger_type: wandb
```
## `model_manager` Config
```yaml
model_manager:
  # if load_dir != `null`, OpenSLU will try to load the checkpoint and continue training;
  # if load_dir == `null`, OpenSLU will start training from scratch.
  load_dir: null
  # The directory to save the model and training state to.
  # if save_dir == `null`, the model will be saved to `save/{start_time}`.
  save_dir: save/stack
  # save_mode can be selected from [save-by-step, save-by-eval]:
  # `save-by-step` saves the model every {save_step} steps without evaluation;
  # `save-by-eval` saves the model with the best validation performance.
  save_mode: save-by-eval
  # save_step: 100 # only used when save_mode == `save-by-step`
  max_save_num: 1 # the maximum number of best models to keep
```
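As a rough illustration of the two save modes, the sketch below mirrors the config fields above; `save_checkpoint` and `evaluate` are hypothetical helpers, not OpenSLU APIs:

```python
def maybe_save(step, best_score, model, cfg):
    """Illustrative save logic for the two save modes (hypothetical helpers)."""
    if cfg["save_mode"] == "save-by-step":
        # save every `save_step` steps, without any evaluation
        if step % cfg["save_step"] == 0:
            save_checkpoint(model, cfg["save_dir"])
    else:  # "save-by-eval"
        # save only when validation performance improves
        score = evaluate(model)
        if score > best_score:
            save_checkpoint(model, cfg["save_dir"])
            best_score = score
    return best_score
```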
## `accelerator` Config
```yaml
accelerator:
  use_accelerator: false # multi-GPU training via `accelerator` is enabled when `true`
```
## `dataset` Config
```yaml
dataset:
  # supports loading datasets from Hugging Face.
  # dataset_name can be selected from [atis, snips, mix-atis, mix-snips]
  dataset_name: atis
  # you can assign a local path for any single split; the remaining splits fall back to the splits of `dataset_name`
  # train: atis # load from Hugging Face or from an assigned local data path
  # validation: {root}/ATIS/dev.jsonl
  # test: {root}/ATIS/test.jsonl
```
## `evaluator` Config
```yaml
evaluator:
  best_key: EMA # the metric used to select the best model
  eval_by_epoch: true # evaluate after each epoch if `true`;
  # evaluate after every {eval_step} steps if eval_by_epoch == `false`.
  # eval_step: 1800
  # the supported metrics are listed below:
  # - intent_acc
  # - slot_f1
  # - EMA
  # - intent_f1
  # - macro_intent_f1
  # - micro_intent_f1
  # NOTE: [intent_f1, macro_intent_f1, micro_intent_f1] are only supported in the multi-intent setting; intent_f1 and macro_intent_f1 are the same metric.
  metric:
    - intent_acc
    - slot_f1
    - EMA
```
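Here `EMA` stands for exact-match accuracy: an utterance counts as correct only when the intent and all slot labels are right. A small illustrative sketch (not OpenSLU's metric code) of how it differs from `intent_acc`:

```python
def intent_acc(pred_intents, gold_intents):
    # fraction of utterances whose intent is predicted correctly
    correct = sum(p == g for p, g in zip(pred_intents, gold_intents))
    return correct / len(gold_intents)

def exact_match_accuracy(pred_intents, gold_intents, pred_slots, gold_slots):
    # EMA: an utterance counts only if the intent AND every slot label match
    correct = sum(
        pi == gi and ps == gs
        for pi, gi, ps, gs in zip(pred_intents, gold_intents, pred_slots, gold_slots)
    )
    return correct / len(gold_intents)

# 2 of 3 intents are correct, but only 1 utterance is fully correct:
print(intent_acc(["flight", "meal", "city"], ["flight", "meal", "time"]))  # 0.67
print(exact_match_accuracy(
    ["flight", "meal", "city"], ["flight", "meal", "time"],
    [["O"], ["B-meal"], ["O"]], [["O"], ["B-dish"], ["O"]],
))  # 0.33
```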
## `tokenizer` Config
```yaml
tokenizer:
  # Initialize the tokenizer. Supports `word_tokenizer` and the other tokenizers on Hugging Face.
  _tokenizer_name_: word_tokenizer
  # if `_tokenizer_name_` is not assigned, you can instead load a pretrained tokenizer from Hugging Face:
  # _from_pretrained_: LightChen2333/stack-propagation-slu-atis
  _padding_side_: right # the padding side of the tokenizer, support [left/right]
  # Alignment mode between text and slots, support [fast/general];
  # `general` works with most tokenizers, while `fast` works with only a small portion of tokenizers.
  _align_mode_: fast
  _to_lower_case_: true
  add_special_tokens: false # any other args in non-`_*_` format are passed through to the tokenizer initialization
  max_length: 512
```
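When `_from_pretrained_` is used, loading maps onto the standard Hugging Face call. A hedged sketch (assuming the Hub checkpoint ships a tokenizer), with the non-`_*_` args shown applied at encode time:

```python
from transformers import AutoTokenizer

# Load the pretrained tokenizer named in `_from_pretrained_`
tokenizer = AutoTokenizer.from_pretrained("LightChen2333/stack-propagation-slu-atis")

# args such as add_special_tokens / max_length are ordinary tokenizer
# arguments and can also be applied when encoding:
encoded = tokenizer(
    "list flights from denver",
    add_special_tokens=False,
    max_length=512,
    truncation=True,
)
```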
## `optimizer` Config
```yaml
optimizer:
  _model_target_: torch.optim.Adam # optimizer class, or a function that returns an Optimizer object
  _model_partial_: true # partial-load configuration: model.parameters() is added later to complete the optimizer arguments
  lr: 0.001 # learning rate
  weight_decay: 1e-6 # weight decay
```
## `scheduler` Config
```yaml
scheduler:
  _model_target_: transformers.get_scheduler
  _model_partial_: true # partial-load configuration: optimizer and num_training_steps are added later to complete the scheduler arguments
  name: "linear"
  num_warmup_steps: 0
```
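Conceptually, `_model_partial_: true` means the target is first built as a partial function from the YAML kwargs and completed later with runtime objects. A sketch of the idea (not the exact OpenSLU code):

```python
from functools import partial

import torch
from transformers import get_scheduler

# Partially bind the YAML kwargs first...
optimizer_fn = partial(torch.optim.Adam, lr=0.001, weight_decay=1e-6)
scheduler_fn = partial(get_scheduler, name="linear", num_warmup_steps=0)

# ...then complete them with runtime objects: model.parameters() for the
# optimizer, and the optimizer plus the training step count for the scheduler.
model = torch.nn.Linear(4, 2)  # stand-in model
optimizer = optimizer_fn(model.parameters())
scheduler = scheduler_fn(optimizer=optimizer, num_training_steps=1000)
```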
## `model` Config
```yaml
model:
  # _from_pretrained_: LightChen2333/stack-propagation-slu-atis # load the model from Hugging Face; none of the parameters below need to be assigned.
  _model_target_: model.OpenSLUModel # the general model class, which builds the model automatically from the configuration.
  encoder:
    _model_target_: model.encoder.AutoEncoder # auto-encoder that automatically loads the specified encoder model
    encoder_name: self-attention-lstm # support [lstm/self-attention-lstm] and the pretrained models that Hugging Face supports
    embedding: # word embedding layer
      # load_embedding_name: glove.6B.300d.txt # supports automatically loading GloVe embeddings.
      embedding_dim: 256 # embedding dim
      dropout_rate: 0.5 # dropout rate after embedding
    lstm:
      layer_num: 1 # lstm configuration
      bidirectional: true
      output_dim: 256 # a module should set output_dim so that input_dim of the next module can be autoloaded. You can also set input_dim manually.
      dropout_rate: 0.5
    attention: # self-attention configuration
      hidden_dim: 1024
      output_dim: 128
      dropout_rate: 0.5
    return_with_input: true # pass input information, such as attention_mask, to the decoder module.
    return_sentence_level_hidden: false # whether to return the sentence-level representation to the decoder module
  decoder:
    _model_target_: model.decoder.StackPropagationDecoder # decoder name
    interaction:
      _model_target_: model.decoder.interaction.StackInteraction # interaction module name
      differentiable: false # interaction module config
    intent_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier # intent classifier module name
      layer_num: 1
      bidirectional: false
      hidden_dim: 64
      force_ratio: 0.9 # teacher-forcing ratio
      embedding_dim: 8 # intent embedding dim
      ignore_index: -100 # index ignored when computing loss and metrics
      dropout_rate: 0.5
      mode: "token-level-intent" # decoding mode, support [token-level-intent, intent, slot]
      use_multi: "{base.multi_intent}"
      return_sentence_level: true # whether to return the sentence-level prediction as decoder input
    slot_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier
      layer_num: 1
      bidirectional: false
      force_ratio: 0.9
      hidden_dim: 64
      embedding_dim: 32
      ignore_index: -100
      dropout_rate: 0.5
      mode: "slot"
      use_multi: false
      return_sentence_level: false
```
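Each `_model_target_` is a dotted import path. A simplified sketch of how such a target can be resolved and instantiated with its sibling keys as keyword arguments (the real OpenSLU builder also recurses into nested modules such as `encoder` and `decoder`):

```python
import importlib

def instantiate(config: dict):
    """Illustrative only: build the object named by `_model_target_`,
    passing the non-reserved (non-`_*_`) keys as keyword arguments."""
    module_path, _, class_name = config["_model_target_"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    kwargs = {
        k: v for k, v in config.items()
        if not (k.startswith("_") and k.endswith("_"))
    }
    return cls(**kwargs)
```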
## Implementing a New Model
### 1. Interaction Re-Implementation
In most cases, you only need to rewrite the `Interaction` module. Here we take `DCA-Net` as an example:
```python
from common.utils import HiddenData
from model.decoder.interaction import BaseInteraction


class DCANetInteraction(BaseInteraction):
    def __init__(self, **config):
        super().__init__(**config)
        # I_S_Block: DCA-Net's intent-slot interaction block (implementation omitted here)
        self.T_block1 = I_S_Block(self.config["output_dim"], self.config["attention_dropout"], self.config["num_attention_heads"])
        ...

    def forward(self, encode_hidden: HiddenData, **kwargs):
        ...
```
Then, configure your module:
```yaml
base:
  ...
optimizer:
  ...
scheduler:
  ...
model:
  _model_target_: model.OpenSLUModel
  encoder:
    _model_target_: model.encoder.AutoEncoder
    encoder_name: lstm
    embedding:
      load_embedding_name: glove.6B.300d.txt
      embedding_dim: 300
      dropout_rate: 0.5
    lstm:
      dropout_rate: 0.5
      output_dim: 128
      layer_num: 2
      bidirectional: true
    output_dim: "{model.encoder.lstm.output_dim}"
    return_with_input: true
    return_sentence_level_hidden: false
  decoder:
    _model_target_: model.decoder.DCANetDecoder
    interaction:
      _model_target_: model.decoder.interaction.DCANetInteraction
      output_dim: "{model.encoder.output_dim}"
      attention_dropout: 0.5
      num_attention_heads: 8
    intent_classifier:
      _model_target_: model.decoder.classifier.LinearClassifier
      mode: "intent"
      input_dim: "{model.decoder.interaction.output_dim}"
      ignore_index: -100
    slot_classifier:
      _model_target_: model.decoder.classifier.LinearClassifier
      mode: "slot"
      input_dim: "{model.decoder.interaction.output_dim}"
      ignore_index: -100
```
And that's all! You have finished the model construction. You can run the following script to train the model:
```shell
python run.py -cp config/dca_net.yaml [-ds atis]
```
### 2. Decoder Re-Implementation
Sometimes, the `interaction then classification` order cannot meet your needs. In that case, simply rewrite the decoder for a flexible interaction order.
Here, we take `stack-propagation` as an example:
1. First, we rewrite the interaction module for `stack-propagation`:
```python
from common.utils import ClassifierOutputData, HiddenData
from model.decoder.interaction.base_interaction import BaseInteraction


class StackInteraction(BaseInteraction):
    def __init__(self, **config):
        super().__init__(**config)
        ...

    def forward(self, intent_output: ClassifierOutputData, encode_hidden: HiddenData):
        ...
```
2. Then, we rewrite `StackPropagationDecoder` for the stack-propagation interaction order:
```python
from common.utils import HiddenData, OutputData
from model.decoder.base_decoder import BaseDecoder


class StackPropagationDecoder(BaseDecoder):
    def forward(self, hidden: HiddenData):
        # 1. predict the intent first
        pred_intent = self.intent_classifier(hidden)
        # 2. feed the intent prediction back into the hidden states
        hidden = self.interaction(pred_intent, hidden)
        # 3. predict slots conditioned on the intent
        pred_slot = self.slot_classifier(hidden)
        return OutputData(pred_intent, pred_slot)
```
3. Then, we can easily assemble the general model via the `config/stack-propagation.yaml` configuration file:
```yaml
base:
  ...
...
model:
  _model_target_: model.OpenSLUModel
  encoder:
    ...
  decoder:
    _model_target_: model.decoder.StackPropagationDecoder
    interaction:
      _model_target_: model.decoder.interaction.StackInteraction
      differentiable: false
    intent_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier
      ... # parameters needed by __init__(*)
      mode: "token-level-intent"
      use_multi: false
      return_sentence_level: true
    slot_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier
      ... # parameters needed by __init__(*)
      mode: "slot"
      use_multi: false
      return_sentence_level: false
```
4. Run the following script to train the model:
```shell
python run.py -cp config/stack-propagation.yaml
```