bol20162021 committed on
Commit
75fdc0d
1 Parent(s): 80a5e39

Update README.md

Files changed (1): README.md +26 -0
README.md CHANGED
@@ -25,6 +25,32 @@ We evaluate our models on three text-to-SQL benchmarks: Spider, Bird, and text2s
 NSQL was trained using cross-entropy loss to maximize the likelihood of sequential inputs. For fine-tuning on text-to-SQL pairs, we compute the loss only over the SQL portion of each pair. The model was trained on SambaNova's in-house Reconfigurable Dataflow Unit (RDU), leveraging data and model parallelism. We pre-trained for 2 epochs and fine-tuned for 10 epochs.
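The SQL-only loss masking described above can be sketched in a few lines. This is a simplified, hypothetical illustration, not SambaNova's actual training code; `mask_labels` and the token ids are made up for the example:

```python
# Hypothetical sketch: mask prompt tokens so cross-entropy is computed
# only over the SQL portion of each (prompt, SQL) training pair.
# Frameworks such as PyTorch conventionally skip label -100 in the loss.

def mask_labels(prompt_len, token_ids, ignore_index=-100):
    """Return training labels where every prompt token is replaced by
    ignore_index, leaving only the SQL tokens to contribute to the loss."""
    return [ignore_index if i < prompt_len else tok
            for i, tok in enumerate(token_ids)]

# Toy example: a 4-token schema+question prompt followed by 3 SQL tokens.
ids = [101, 17, 42, 9, 55, 60, 73]
print(mask_labels(4, ids))  # [-100, -100, -100, -100, 55, 60, 73]
```

With this masking, gradients flow only from the model's predictions of the SQL tokens, so the prompt (schema plus question) conditions the model without being a prediction target.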
+ ### Hyperparameters
+
+ **Continuous pretraining on Stack-SQL dataset**
+
+ - Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
+ - Optimizer: AdamW
+ - Epochs: 2
+ - Global batch size: 256
+ - Batch tokens: 256 * 4096 = 1,048,576 tokens
+ - Learning rate: 1e-5
+ - Learning rate scheduler: Fixed
+ - Warmup steps: 0
+ - Weight decay: 0.1
+
+ **Fine-tuning on NSText2SQL dataset**
+
+ - Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
+ - Optimizer: AdamW
+ - Epochs: 10
+ - Global batch size: 64
+ - Batch tokens: 64 * 4096 = 262,144 tokens
+ - Learning rate: 1e-5
+ - Learning rate scheduler: Cosine schedule with warmup
+ - Warmup steps: 0
+ - End learning ratio: 0.1
+ - Weight decay: 0.1
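The fine-tuning schedule above (cosine decay with warmup and an end learning ratio of 0.1, i.e. the learning rate decays to 0.1 * base) can be sketched as follows. This is a generic illustration of the standard schedule with a hypothetical function name, not SambaNova's training code:

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-5, warmup_steps=0, end_ratio=0.1):
    """Generic cosine schedule: linear warmup to base_lr, then cosine decay
    down to end_ratio * base_lr (the run above uses warmup_steps = 0)."""
    if warmup_steps and step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes 1 -> 0 over training
    return base_lr * (end_ratio + (1.0 - end_ratio) * cosine)

print(cosine_lr(0, 1000))     # starts at base_lr (1e-5)
print(cosine_lr(1000, 1000))  # ends at end_ratio * base_lr (1e-6)
```

The pretraining run instead uses a fixed schedule, which corresponds to returning `base_lr` at every step.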
 ## Intended Use and Limitations

 The model is designed for text-to-SQL generation from a given table schema and a natural-language prompt. It works best with the prompt format defined below and is intended to output `SELECT` queries.