# Configuration

## 1. Introduction

Configuration is divided into fine-grained reusable modules:

- `base`: basic configuration
- `logger`: logger setting
- `model_manager`: loading and saving model parameters
- `accelerator`: whether to enable multi-GPU
- `dataset`: dataset management
- `evaluator`: evaluation and metrics setting.
- `tokenizer`: Tokenizer initiation and tokenizing setting.
- `optimizer`: Optimizer initiation setting.
- `scheduler`: scheduler initiation setting.
- `model`: model construction setting.

The sections below describe each configuration module in detail. Alternatively, see [Examples](examples/README.md) for a quick start.

NOTE: configuration keys in the `_*_` format are reserved fields in OpenSLU.

## Configuration Item Script
In the OpenSLU configuration, a simple calculation script is supported for each configuration item. For example, we can get `dataset_name` via `{dataset.dataset_name}` and fill its value into the Python expression `'LightChen2333/agif-slu-' + '*'`. (Without the surrounding quotes, the value of `{dataset.dataset_name}` is treated as a variable.)

NOTE: each item containing `{}` is treated as a Python script.
```yaml
tokenizer:
  _from_pretrained_: "'LightChen2333/agif-slu-' + '{dataset.dataset_name}'"  # Support simple calculation script

```
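
For intuition, an item like the one above can be resolved by substituting each `{a.b}` reference with its value from the configuration and then evaluating the result as a Python expression. The sketch below is only an illustration and does not reflect OpenSLU's actual resolver:

```python
# Illustrative sketch only; OpenSLU's real resolver may differ.
import re

config = {"dataset": {"dataset_name": "atis"}}

def resolve(item: str, config: dict) -> str:
    """Replace each `{a.b}` reference with its config value, then eval the result."""
    def lookup(match):
        node = config
        for key in match.group(1).split("."):
            node = node[key]
        return str(node)

    expression = re.sub(r"\{([\w.]+)\}", lookup, item)
    return eval(expression)  # e.g. "'LightChen2333/agif-slu-' + 'atis'"

print(resolve("'LightChen2333/agif-slu-' + '{dataset.dataset_name}'", config))
# -> LightChen2333/agif-slu-atis
```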

## `base` Config
```yaml
# `start_time` is generated automatically when any config script starts and does not need to be assigned.
# start_time: xxxxxxxx
base:
  name: "OpenSLU"                  # project/logger name
  multi_intent: false              # whether to enable the multi-intent setting
  train: True                      # enable training, otherwise run zero-shot only
  test: True                       # enable testing during training
  device: cuda                     # device: cuda/cpu
  seed: 42                         # random seed
  best_key: EMA                    # metric used to save the best model [intent_acc/slot_f1/EMA]
  tokenizer_name: word_tokenizer   # word_tokenizer when no pretrained model is used, otherwise an [AutoTokenizer] tokenizer name
  add_special_tokens: false        # whether to add [CLS] and [SEP] special tokens
  epoch_num: 300                   # number of training epochs
#  eval_step: 280                  # if eval_by_epoch == false and eval_step > 0, the model is evaluated every eval_step steps
  eval_by_epoch: true              # evaluate the model every epoch
  batch_size: 16                   # batch size
```
## `logger` Config
```yaml
logger:
  # `wandb` is supported in both single- and multi-GPU settings,
  # `tensorboard` is only supported in the multi-GPU setting,
  # and `fitlog` is only supported in the single-GPU setting
  logger_type: wandb 
```
## `model_manager` Config
```yaml
model_manager:
  # if load_dir != `null`, OpenSLU will try to load the checkpoint and continue training;
  # if load_dir == `null`, OpenSLU will start training from scratch.
  load_dir: null
  # The directory path used to save the model and training state.
  # if save_dir == `null`, the model will be saved to `save/{start_time}`
  save_dir: save/stack
  # save_mode can be selected from [save-by-step, save-by-eval]:
  # `save-by-step` saves the model every {save_step} steps without evaluation,
  # `save-by-eval` saves the model with the best validation performance.
  save_mode: save-by-eval
  # save_step: 100         # only enabled when save_mode == `save-by-step`
  max_save_num: 1          # the number of best models to keep
```
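
The comments above describe `save-by-eval` together with `max_save_num`. The sketch below shows one way keeping only the N best checkpoints could work; the helper is hypothetical and OpenSLU's actual ModelManager may differ:

```python
# Hypothetical helper illustrating `save-by-eval` + `max_save_num`; not OpenSLU's ModelManager.
import os
import shutil
import torch

def save_best(model, metric_value, save_dir, saved, max_save_num=1):
    """Save a checkpoint and keep only the `max_save_num` best ones by metric."""
    ckpt_dir = os.path.join(save_dir, f"ckpt-{metric_value:.4f}")
    os.makedirs(ckpt_dir, exist_ok=True)
    torch.save(model.state_dict(), os.path.join(ckpt_dir, "model.pt"))
    saved.append((metric_value, ckpt_dir))
    saved.sort(key=lambda item: item[0], reverse=True)  # best metric first
    for _, stale_dir in saved[max_save_num:]:           # prune everything beyond the best N
        shutil.rmtree(stale_dir, ignore_errors=True)
    del saved[max_save_num:]
```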
## `accelerator` Config
```yaml
accelerator:
  use_accelerator: false   # `accelerator` (multi-GPU training) is enabled if use_accelerator is `true`
```
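
When `use_accelerator` is `true`, multi-GPU training is typically delegated to a wrapper library such as Hugging Face `accelerate`. The generic sketch below shows the usual pattern and is not OpenSLU's own integration:

```python
# Generic multi-GPU pattern with Hugging Face `accelerate`; not OpenSLU's own integration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()                 # handles single- vs multi-GPU placement
model = nn.Linear(16, 2)                    # toy model for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))), batch_size=16)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for inputs, labels in loader:
    loss = nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)              # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```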
## `dataset` Config
```yaml
dataset:
  # supports loading datasets from Hugging Face.
  # dataset_name can be selected from [atis, snips, mix-atis, mix-snips]
  dataset_name: atis
  # each split can also be assigned its own path; any split that is not assigned falls back to the corresponding split of `dataset_name`
  # train: atis # supports loading from Hugging Face or an assigned local data path.
  # validation: {root}/ATIS/dev.jsonl 
  # test: {root}/ATIS/test.jsonl
```
## `evaluator` Config
```yaml
evaluator:
  best_key: EMA        # the metric used to select the best model
  eval_by_epoch: true   # Evaluate after an epoch if `true`.
  # Evaluate after {eval_step} steps if eval_by_epoch == `false`.
  # eval_step: 1800
  # the supported metrics are listed below:
  # - intent_acc
  # - slot_f1
  # - EMA
  # - intent_f1
  # - macro_intent_f1
  # - micro_intent_f1
  # NOTE: [intent_f1, macro_intent_f1, micro_intent_f1] are only supported in the multi-intent setting; intent_f1 and macro_intent_f1 are the same metric.
  metric:
    - intent_acc
    - slot_f1
    - EMA
```
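
For reference, `intent_acc` is sentence-level intent accuracy, `slot_f1` is the usual span-level F1, and `EMA` commonly denotes exact-match accuracy, i.e. an utterance counts as correct only when the intent and every slot label are correct. A minimal sketch of the first and last metric (not OpenSLU's evaluator):

```python
# Minimal metric sketch; not OpenSLU's evaluator (slot_f1, a span-level F1, is omitted).
def intent_acc(pred_intents, gold_intents):
    return sum(p == g for p, g in zip(pred_intents, gold_intents)) / len(gold_intents)

def exact_match_accuracy(pred_intents, gold_intents, pred_slots, gold_slots):
    # an utterance counts only if the intent AND every slot label are correct
    correct = sum(
        pi == gi and ps == gs
        for pi, gi, ps, gs in zip(pred_intents, gold_intents, pred_slots, gold_slots)
    )
    return correct / len(gold_intents)

print(exact_match_accuracy(
    ["atis_flight"], ["atis_flight"],
    [["O", "B-toloc.city_name"]], [["O", "B-toloc.city_name"]],
))  # -> 1.0
```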
## `tokenizer` Config
```yaml
tokenizer:
  # Initialize the tokenizer. Supports `word_tokenizer` and other tokenizers from Hugging Face.
  _tokenizer_name_: word_tokenizer
  # if `_tokenizer_name_` is not assigned, you can load a pretrained tokenizer from Hugging Face instead:
  # _from_pretrained_: LightChen2333/stack-propagation-slu-atis
  _padding_side_: right            # the padding side of the tokenizer, supports [left/ right]
  # Align mode between text and slots, supports [fast/ general];
  # `general` is supported by most tokenizers, `fast` only by a small portion of tokenizers.
  _align_mode_: fast
  _to_lower_case_: true
  add_special_tokens: false        # other tokenizer args; any args except those in `_*_` format are passed to tokenizer initialization
  max_length: 512

```
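
To see why `_align_mode_` matters, the sketch below uses a Hugging Face fast tokenizer (`bert-base-uncased` is only an illustration) to map sub-tokens back to word-level slot labels; OpenSLU's own alignment code may differ:

```python
# Sketch of sub-token-to-word alignment with a fast tokenizer; OpenSLU's code may differ.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", padding_side="right")
words = ["list", "flights", "to", "boston"]
encoding = tokenizer(words, is_split_into_words=True, add_special_tokens=False)

if tokenizer.is_fast:
    # a `fast` align mode can rely on word_ids() to project word-level slot labels onto sub-tokens
    print(encoding.word_ids())  # e.g. [0, 1, 2, 3]; a split word repeats its word id
```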
## `optimizer` Config
```yaml
optimizer:
  _model_target_: torch.optim.Adam # Optimizer class (or a function returning an Optimizer object)
  _model_partial_: true            # partial-load configuration: model.parameters() will be added later to complete the Optimizer arguments
  lr: 0.001                        # learning rate
  weight_decay: 1e-6               # weight decay
```
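
`_model_partial_: true` can be read as building a partially applied constructor that the trainer later completes with `model.parameters()`. A sketch of the idea (not OpenSLU's actual loader):

```python
# Sketch of the partial-load idea; not OpenSLU's actual loader.
from functools import partial

import torch
import torch.nn as nn

optimizer_fn = partial(torch.optim.Adam, lr=0.001, weight_decay=1e-6)

model = nn.Linear(16, 2)                      # any model
optimizer = optimizer_fn(model.parameters())  # the missing argument is filled in here
```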
## `scheduler` Config
```yaml
scheduler:
  _model_target_: transformers.get_scheduler
  _model_partial_: true     # partial-load configuration: optimizer and num_training_steps will be added later to complete the scheduler arguments
  name: "linear"
  num_warmup_steps: 0
```
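
The same partial pattern applies to the scheduler: `optimizer` and `num_training_steps` are supplied at runtime. The sketch below calls `transformers.get_scheduler` directly; the step count is only an example:

```python
# Sketch of completing the partially configured scheduler; the step count is an example value.
import torch
import torch.nn as nn
from transformers import get_scheduler

model = nn.Linear(16, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-6)

scheduler = get_scheduler(
    name="linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=1000,  # e.g. epoch_num * steps_per_epoch
)
```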
## `model` Config
```yaml
model:
  # _from_pretrained_: LightChen2333/stack-propagation-slu-atis # load the model from Hugging Face; none of the parameters below need to be assigned.
  _model_target_: model.OpenSLUModel # the general model class, which automatically builds the model from the configuration.

  encoder:
    _model_target_: model.encoder.AutoEncoder # AutoEncoder automatically loads the specified encoder model
    encoder_name: self-attention-lstm         # supports [lstm/ self-attention-lstm] and other pretrained models supported by Hugging Face

    embedding:                                # word embedding layer
#      load_embedding_name: glove.6B.300d.txt  # supports autoloading GloVe embeddings.
      embedding_dim: 256                      # embedding dim
      dropout_rate: 0.5                       # dropout ratio after embedding

    lstm:
      layer_num: 1                           # lstm configuration
      bidirectional: true
      output_dim: 256                        # a module should set output_dim so the next module can autoload its input_dim; you can also set input_dim manually.
      dropout_rate: 0.5

    attention:                              # self-attention configuration
      hidden_dim: 1024
      output_dim: 128
      dropout_rate: 0.5

    return_with_input: true                # pass input information, such as attention_mask, to the decoder module.
    return_sentence_level_hidden: false    # whether to return the sentence-level representation to the decoder module

  decoder:
    _model_target_: model.decoder.StackPropagationDecoder  # decoder name
    interaction:
      _model_target_: model.decoder.interaction.StackInteraction # interaction module name
      differentiable: false                                      # interaction module config

    intent_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier # intent classifier module name
      layer_num: 1
      bidirectional: false
      hidden_dim: 64
      force_ratio: 0.9                                        # teacher-forcing ratio
      embedding_dim: 8                                        # intent embedding dim
      ignore_index: -100                                      # index ignored when computing loss and metrics
      dropout_rate: 0.5
      mode: "token-level-intent"                              # decoding mode, supports [token-level-intent, intent, slot]
      use_multi: "{base.multi_intent}"
      return_sentence_level: true                             # whether to return the sentence-level prediction as the decoded input

    slot_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier
      layer_num: 1
      bidirectional: false
      force_ratio: 0.9
      hidden_dim: 64
      embedding_dim: 32
      ignore_index: -100
      dropout_rate: 0.5
      mode: "slot"
      use_multi: false
      return_sentence_level: false
```
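
The `output_dim`-to-`input_dim` autoload behaviour mentioned in the encoder comments can be pictured as each module inheriting the previous module's `output_dim` whenever its own `input_dim` is missing. The builder below is purely hypothetical and only illustrates the idea:

```python
# Hypothetical builder illustrating the output_dim -> input_dim autoload idea.
def autoload_input_dims(module_configs):
    previous_output_dim = None
    resolved = []
    for cfg in module_configs:
        cfg = dict(cfg)
        if previous_output_dim is not None:
            # a module without an explicit input_dim inherits the previous output_dim
            cfg.setdefault("input_dim", previous_output_dim)
        resolved.append(cfg)  # a real builder would instantiate the module here
        previous_output_dim = cfg.get("output_dim", previous_output_dim)
    return resolved

print(autoload_input_dims([
    {"name": "embedding", "output_dim": 256},
    {"name": "lstm", "output_dim": 256},
    {"name": "attention", "hidden_dim": 1024, "output_dim": 128},
]))
```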

## Implementing a New Model

### 1. Re-Implementing the Interaction Module
Here we take `DCA-Net` as an example:

In most cases, you only need to rewrite the `Interaction` module:

```python
from common.utils import HiddenData
from model.decoder.interaction import BaseInteraction
class DCANetInteraction(BaseInteraction):
    def __init__(self, **config):
        super().__init__(**config)
        self.T_block1 = I_S_Block(self.config["output_dim"], self.config["attention_dropout"], self.config["num_attention_heads"])
        ...

    def forward(self, encode_hidden: HiddenData, **kwargs):
        ...
```

Then configure your module:
```yaml
base:
  ...

optimizer:
  ...

scheduler:
  ...

model:
  _model_target_: model.OpenSLUModel
  encoder:
    _model_target_: model.encoder.AutoEncoder
    encoder_name: lstm

    embedding:
      load_embedding_name: glove.6B.300d.txt
      embedding_dim: 300
      dropout_rate: 0.5

    lstm:
      dropout_rate: 0.5
      output_dim: 128
      layer_num: 2
      bidirectional: true
    output_dim: "{model.encoder.lstm.output_dim}"
    return_with_input: true
    return_sentence_level_hidden: false

  decoder:
    _model_target_: model.decoder.DCANetDecoder
    interaction:
      _model_target_: model.decoder.interaction.DCANetInteraction
      output_dim: "{model.encoder.output_dim}"
      attention_dropout: 0.5
      num_attention_heads: 8

    intent_classifier:
      _model_target_: model.decoder.classifier.LinearClassifier
      mode: "intent"
      input_dim: "{model.decoder.output_dim.output_dim}"
      ignore_index: -100

    slot_classifier:
      _model_target_: model.decoder.classifier.LinearClassifier
      mode: "slot"
      input_dim: "{model.decoder.output_dim.output_dim}"
      ignore_index: -100
```

That's it, the model construction is complete. You can run the following script to train the model:
```shell
python run.py -cp config/dca_net.yaml [-ds atis]
```
### 2. Re-Implementing the Decoder
Sometimes the `interaction then classification` order cannot meet your needs. In that case, simply rewrite the decoder to get a flexible interaction order:

Here, we take `stack-propagation` as an example:
1. Rewrite the interaction module for `stack-propagation`:
```python
from common.utils import ClassifierOutputData, HiddenData
from model.decoder.interaction.base_interaction import BaseInteraction
class StackInteraction(BaseInteraction):
    def __init__(self, **config):
        super().__init__(**config)
        ...

    def forward(self, intent_output: ClassifierOutputData, encode_hidden: HiddenData):
        ...
```
2. Rewrite `StackPropagationDecoder` for the stack-propagation interaction order:
```python
from common.utils import HiddenData, OutputData
# NOTE: BaseDecoder is assumed to be importable from the decoder package, e.g.:
from model.decoder.base_decoder import BaseDecoder

class StackPropagationDecoder(BaseDecoder):

    def forward(self, hidden: HiddenData):
        # predict intents first,
        pred_intent = self.intent_classifier(hidden)
        # then fuse the intent prediction back into the hidden states,
        hidden = self.interaction(pred_intent, hidden)
        # and finally predict slots conditioned on the interaction output.
        pred_slot = self.slot_classifier(hidden)
        return OutputData(pred_intent, pred_slot)
```

3. Then we can easily assemble the general model via the `config/stack-propagation.yaml` configuration file:
```yaml
base:
  ...

...

model:
  _model_target_: model.OpenSLUModel

  encoder:
    ...

  decoder:
    _model_target_: model.decoder.StackPropagationDecoder
    interaction:
      _model_target_: model.decoder.interaction.StackInteraction
      differentiable: false

    intent_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier
      ... # parameters needed by __init__(*)
      mode: "token-level-intent"
      use_multi: false
      return_sentence_level: true

    slot_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier
      ... # parameters needed by __init__(*)
      mode: "slot"
      use_multi: false
      return_sentence_level: false
```
4. Run the following script to train the model:
```shell
python run.py -cp config/stack-propagation.yaml
```