We evaluate our models on three text-to-SQL benchmarks: Spider, Bird, and text2sql.
NSQL was trained using cross-entropy loss to maximize the likelihood of sequential inputs. For finetuning on text-to-SQL pairs, we compute the loss only over the SQL portion of each pair. The model was trained using SambaNova's in-house Reconfigurable Dataflow Unit (RDU), leveraging data and model parallelism. We pre-trained for 2 epochs and fine-tuned for 10 epochs.
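
As a rough illustration of the loss masking described above (a minimal sketch assuming a PyTorch-style setup, not SambaNova's actual training code): the labels for the prompt/schema tokens are set to the ignore index, so cross-entropy is computed only over the SQL tokens.

```python
import torch

def mask_prompt_labels(prompt_ids, sql_ids, ignore_index=-100):
    """Concatenate prompt and SQL token ids, masking the prompt portion
    so that cross-entropy loss is computed only over the SQL tokens."""
    input_ids = torch.tensor(prompt_ids + sql_ids)
    labels = input_ids.clone()
    labels[: len(prompt_ids)] = ignore_index  # -100 is ignored by cross_entropy
    return input_ids, labels

# Toy example: 4 prompt tokens followed by 3 SQL tokens.
input_ids, labels = mask_prompt_labels([11, 12, 13, 14], [21, 22, 23])
# During training, model logits would be scored against `labels`, e.g.:
# torch.nn.functional.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))
```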
### Hyperparameters

**Continuous pretraining on Stack-SQL dataset**

- Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
- Optimizer: AdamW
- Epochs: 2
- Global batch size: 256
- Batch tokens: 256 * 4096 = 1,048,576 tokens
- Learning rate: 1e-5
- Learning rate scheduler: Fixed
- Warmup steps: 0
- Weight decay: 0.1

**Finetuning on NSText2SQL dataset**

- Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
- Optimizer: AdamW
- Epochs: 10
- Global batch size: 64
- Batch tokens: 64 * 4096 = 262,144 tokens
- Learning rate: 1e-5
- Learning rate scheduler: Cosine schedule with warmup
- Warmup steps: 0
- End learning ratio: 0.1
- Weight decay: 0.1
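
For readers who want to approximate the finetuning settings outside the RDU stack, here is a hedged PyTorch sketch of the optimizer and schedule above, assuming "end learning ratio" means the learning rate decays to 10% of its peak; the model and step count are placeholders, and this is not the actual training code.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(8, 8)  # placeholder for the actual language model
num_training_steps = 1_000     # placeholder; depends on dataset and batch size

# Finetuning settings from the list above.
optimizer = AdamW(model.parameters(), lr=1e-5, weight_decay=0.1)

def cosine_with_floor(step, total_steps=num_training_steps,
                      warmup_steps=0, end_ratio=0.1):
    """Cosine decay from the peak LR down to end_ratio * peak,
    with (here zero-step) linear warmup."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return end_ratio + (1.0 - end_ratio) * 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, cosine_with_floor)
```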
## Intended Use and Limitations

The model was designed for text-to-SQL generation from a given table schema and a natural language prompt. It works best with the prompt format defined below and when generating `SELECT` queries.
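
As an illustrative inference sketch using the Hugging Face `transformers` API (the model id and prompt string are placeholders; the exact prompt format the model expects is defined below):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "path/to/this-model"  # placeholder; use this repository's model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder prompt: a table schema plus a natural language question,
# written in the prompt format documented below.
prompt = "..."

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:]))
```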