Getting incomplete query responses in most cases
Hi NumbersStation team,
I have deployed this model on SageMaker on an ml.g5.2xlarge instance. In most cases I'm getting an incomplete query in the response. I have tried increasing the response token limit with "max_tokens": 999999, but the issue still persists.
I have also tried the nsql-350M version and that works fine, with the complete query being generated. I'm seeing this issue only with the nsql-2B model.
This is the input prompt:
CREATE TABLE git_worklog_gpt (
org_id text,
type text,
repository text,
value integer,
date date,
author_email text
)
CREATE TABLE burnout_worklog_gpt (
org_id text,
person text,
date date
)
CREATE TABLE work_cycletime_gpt (
org_id text,
type text,
project text,
time_in_minutes integer,
date date
)
CREATE TABLE issue_worklog_gpt (
org_id text,
type text,
project text,
value integer,
date date,
status text
)
CREATE TABLE investment_report_gpt (
org_id text,
investment_name text,
value integer,
date date,
story_point integer,
estimate integer
)
CREATE TABLE issue_lead_time_gpt (
org_id text,
project text,
type text,
status text,
priority text,
lead_time_in_minutes double precision,
date date
)
-- Using valid SQLite, answer the following questions for the tables provided above.
-- list all issues closed in last 30 days
SELECT
This is the response I'm getting:
CREATE TABLE git_worklog_gpt (
org_id text,
type text,
repository text,
value integer,
date date,
author_email text
)
CREATE TABLE burnout_worklog_gpt (
org_id text,
person text,
date date
)
CREATE TABLE work_cycletime_gpt (
org_id text,
type text,
project text,
time_in_minutes integer,
date date
)
CREATE TABLE issue_worklog_gpt (
org_id text,
type text,
project text,
value integer,
date date,
status text
)
CREATE TABLE investment_report_gpt (
org_id text,
investment_name text,
value integer,
date date,
story_point integer,
estimate integer
)
CREATE TABLE issue_lead_time_gpt (
org_id text,
project text,
type text,
status text,
priority text,
lead_time_in_minutes double precision,
date date
)
-- Using valid SQLite, answer the following questions for the tables provided above.
-- list all issues closed in last 30 days
SELECT * FROM issue_worklog_gpt WHERE status = "Closed" AND DATE(
I would really appreciate some help getting around this.
Thanks
Hi @smooth-operator94 ,
I am not very familiar with the SageMaker setup. Can you provide some more details about your settings and the config you sent to SageMaker?
FYI: Here is the response I got from a local deployment:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("NumbersStation/nsql-2B")
model = AutoModelForCausalLM.from_pretrained("NumbersStation/nsql-2B", torch_dtype=torch.float16).to(0)
text = """CREATE TABLE git_worklog_gpt (
org_id text,
type text,
repository text,
value integer,
date date,
author_email text
)
CREATE TABLE burnout_worklog_gpt (
org_id text,
person text,
date date
)
CREATE TABLE work_cycletime_gpt (
org_id text,
type text,
project text,
time_in_minutes integer,
date date
)
CREATE TABLE issue_worklog_gpt (
org_id text,
type text,
project text,
value integer,
date date,
status text
)
CREATE TABLE investment_report_gpt (
org_id text,
investment_name text,
value integer,
date date,
story_point integer,
estimate integer
)
CREATE TABLE issue_lead_time_gpt (
org_id text,
project text,
type text,
status text,
priority text,
lead_time_in_minutes double precision,
date date
)
-- Using valid SQLite, answer the following questions for the tables provided above.
-- list all issues closed in last 30 days
SELECT"""
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(0)
generated_ids = model.generate(input_ids, max_length=500)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
And the output is:
SELECT * FROM issue_worklog_gpt WHERE status = "Closed" AND date >= DATEADD(DAY, -30, GETDATE());
Hi senwu,
I'm using the following script for SageMaker deployment:
import json

import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='AmazonSageMaker-ExecutionRole-20230723T133694')['Role']['Arn']

# Hub model configuration: https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'NumbersStation/nsql-2B',
    'SM_NUM_GPUS': json.dumps(1)
}

# Create the Hugging Face Model class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.9.3"),
    env=hub,
    role=role,
)

# Deploy the model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300
)
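For completeness, this is roughly how I invoke the deployed endpoint (a sketch, not my exact code; prompt stands for the schema-plus-question text from my first post):

# Sketch of the request shape I'm sending; generation parameters go in the
# "parameters" field of the payload for the Hugging Face LLM (TGI) container.
prompt = "..."  # the CREATE TABLE schema + question text shown above
response = predictor.predict({
    "inputs": prompt,
    "parameters": {"max_tokens": 999999},  # my attempt to raise the response length
})
print(response)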
After going through the AWS CloudWatch logs, I found these configs being used during model deployment:
Args { model_id: "NumbersStation/nsql-2B", revision: None, validation_workers: 2, sharded: None, num_shard: Some(1), quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: 16000, max_waiting_tokens: 20, hostname: "container-0.local", port: 8080, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/tmp"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_domain: None, ngrok_username: None, ngrok_password: None, env: false }
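If the max_input_length / max_total_tokens limits above are what's cutting the generation short, I assume they could be raised through the env dict passed to HuggingFaceModel, something like the untested sketch below (the MAX_INPUT_LENGTH / MAX_TOTAL_TOKENS environment variables are the ones the TGI container maps to the launcher arguments shown in the log; the values are only illustrative):

# Untested sketch: overriding the TGI limits seen in the CloudWatch Args
# via the hub env dict used when creating the HuggingFaceModel.
hub = {
    'HF_MODEL_ID': 'NumbersStation/nsql-2B',
    'SM_NUM_GPUS': json.dumps(1),
    'MAX_INPUT_LENGTH': json.dumps(2048),   # log shows max_input_length: 1024
    'MAX_TOTAL_TOKENS': json.dumps(4096),   # log shows max_total_tokens: 2048
}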