File size: 2,701 Bytes
8293131
 
 
3ad64ff
 
 
 
 
 
 
 
2037cbf
 
8293131
 
 
 
 
 
3ad64ff
8293131
3ad64ff
 
 
8293131
3ad64ff
8293131
 
 
4196be5
 
3ad64ff
8293131
3ad64ff
 
 
 
 
 
8293131
3ad64ff
 
 
 
 
 
4196be5
3ad64ff
8293131
3ad64ff
 
8293131
3ad64ff
 
 
 
8293131
3ad64ff
 
 
 
8293131
3ad64ff
 
8293131
3ad64ff
8293131
3ad64ff
 
 
8293131
3ad64ff
8293131
3ad64ff
8293131
3ad64ff
 
8293131
3ad64ff
8293131
3ad64ff
8293131
3ad64ff
8293131
3ad64ff
 
8293131
 
 
 
3ad64ff
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
---
library_name: peft
base_model: declare-lab/flan-alpaca-base
license: mit
language:
- en
pipeline_tag: text2text-generation
tags:
- sql
- query
- database
datasets:
- wikisql
---

## Model Details

### Model Description

This model is based on the declare-lab/flan-alpaca-base model finetuned with wikisql dataset.

- **Developed by:** Jonathan Jordan
- **Model type:** FLAN Alpaca
- **Language(s) (NLP):** English
- **License:** [More Information Needed]
- **Finetuned from model:** declare-lab/flan-alpaca-base

## Uses

The model generates a string of SQL query based on a question and table columns. **The generated query always uses "table" as the table name**. 
Feel free to change the table name in the generated query to match your actual SQL table. 
The generated SQL query can be run perfectly on the python SQL connection (e.g. psycopg2, mysql_connector, etc).

#### Limitations
1. The question MUST be in english
2. Keep in mind about the difference in data type naming between MySQL and the other SQL databases
3. Simple SQL Aggregation functions (SUM, AVG, COUNT, MIN, MAX) are supported
4. Advanced SQL Aggregation which involves GROUP BY, ORDER BY, HAVING, etc are highly not recommended
5. Table JOIN is not supported

### Input Example
```python
"""Question: what is What was the result of the election in the Florida 18 district?\nTable: table_1341598_10 (result VARCHAR, district VARCHAR)\nSQL: """
```
### Output Example
```python
"""SELECT * FROM table WHERE district = "Florida 18""""
```

### How to use
Load model

```python
from peft import get_peft_config, get_peft_model, TaskType
from peft import PeftConfig, PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "jonathanjordan21/flan-alpaca-base-finetuned-lora-wikisql"
config = PeftConfig.from_pretrained(model_id)
model_ = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, return_dict=True)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

model = PeftModel.from_pretrained(model_, model_id)
```

Model inference

```python
question = "server of user id 11 with status active and server id 10"
table = "table_name_77 ( user id INTEGER, status VARCHAR, server id INTEGER )"

test = f"""Question: {question}\nTable: {table}\nSQL: """

p = tokenizer(test, return_tensors='pt')

device = "cuda" if torch.cuda.is_available() else "cpu"
out = model.to(device).generate(**p.to(device),max_new_tokens=50)

print("SQL Query :", tokenizer.batch_decode(out,skip_special_tokens=True)[0])

```

## Performance

### Speed Performance
The model inference takes about 2-3 seconds to run in Google Colab Free Tier CPU

### Framework versions


- PEFT 0.6.2