bol20162021 committed on
Commit
75fdc0d
1 Parent(s): 80a5e39

Update README.md

Files changed (1): README.md +26 -0
README.md CHANGED
@@ -25,6 +25,32 @@ We evaluate our models on three text-to-SQL benchmarks: Spider, Bird, and text2s
 NSQL was trained using cross-entropy loss to maximize the likelihood of sequential inputs. For fine-tuning on text-to-SQL pairs, we compute the loss only over the SQL portion of each pair. The model was trained on SambaNova's in-house Reconfigurable Dataflow Unit (RDU), leveraging data and model parallelism. We pre-trained for 2 epochs and fine-tuned for 10 epochs.
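The SQL-only loss masking described above can be sketched in a few lines. This is a simplified, hypothetical illustration, not SambaNova's actual training code; `mask_labels` and the token ids are made up for the example:

```python
# Hypothetical sketch: mask prompt tokens so cross-entropy is computed
# only over the SQL portion of each (prompt, SQL) training pair.
# Frameworks such as PyTorch conventionally skip label -100 in the loss.

def mask_labels(prompt_len, token_ids, ignore_index=-100):
    """Return training labels where every prompt token is replaced by
    ignore_index, leaving only the SQL tokens to contribute to the loss."""
    return [ignore_index if i < prompt_len else tok
            for i, tok in enumerate(token_ids)]

# Toy example: a 4-token schema+question prompt followed by 3 SQL tokens.
ids = [101, 17, 42, 9, 55, 60, 73]
print(mask_labels(4, ids))  # [-100, -100, -100, -100, 55, 60, 73]
```

With this masking, gradients flow only from the model's predictions of the SQL tokens, so the prompt (schema plus question) conditions the model without being a prediction target.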
+ ### Hyperparameters
+
+ **Continuous pretraining on Stack-SQL dataset**
+
+ - Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
+ - Optimizer: AdamW
+ - Epochs: 2
+ - Global batch size: 256
+ - Batch tokens: 256 * 4096 = 1,048,576 tokens
+ - Learning rate: 1e-5
+ - Learning rate scheduler: Fixed
+ - Warmup steps: 0
+ - Weight decay: 0.1
+
+ **Fine-tuning on NSText2SQL dataset**
+
+ - Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
+ - Optimizer: AdamW
+ - Epochs: 10
+ - Global batch size: 64
+ - Batch tokens: 64 * 4096 = 262,144 tokens
+ - Learning rate: 1e-5
+ - Learning rate scheduler: Cosine schedule with warmup
+ - Warmup steps: 0
+ - End learning ratio: 0.1
+ - Weight decay: 0.1
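The fine-tuning schedule above (cosine decay with warmup and an end learning ratio of 0.1, i.e. the learning rate decays to 0.1 * base) can be sketched as follows. This is a generic illustration of the standard schedule with a hypothetical function name, not SambaNova's training code:

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-5, warmup_steps=0, end_ratio=0.1):
    """Generic cosine schedule: linear warmup to base_lr, then cosine decay
    down to end_ratio * base_lr (the run above uses warmup_steps = 0)."""
    if warmup_steps and step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes 1 -> 0 over training
    return base_lr * (end_ratio + (1.0 - end_ratio) * cosine)

print(cosine_lr(0, 1000))     # starts at base_lr (1e-5)
print(cosine_lr(1000, 1000))  # ends at end_ratio * base_lr (1e-6)
```

The pretraining run instead uses a fixed schedule, which corresponds to returning `base_lr` at every step.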
 ## Intended Use and Limitations

 The model is designed for text-to-SQL generation from a given table schema and a natural-language prompt. It works best with the prompt format defined below and is intended to output `SELECT` queries.