No lm_head weight?
I might be missing something, but there doesn't seem to be an lm_head weight in the model's safetensors index (model.safetensors.index.json)?
Is this intentional?
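For reference, this is roughly how I checked (a quick sketch; it assumes the repo ships the standard sharded-safetensors index file name):

import json
from huggingface_hub import hf_hub_download

# Download just the index file and list which tensors the checkpoint records.
index_path = hf_hub_download(
    repo_id="OpenAssistant/falcon-40b-megacode2-oasst",
    filename="model.safetensors.index.json",
)
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

print("lm_head.weight" in weight_map)                      # False, which prompted this question
print("transformer.word_embeddings.weight" in weight_map)  # True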
This is because safe serialization (safetensors) was used, which doesn't allow aliasing, and the lm_head weights are tied to transformer.word_embeddings.weight. Did you have any problems loading the model? I checked with transformers and TGI and both could load the model properly.
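You can see the same behaviour with any model whose output head is tied to its embeddings. A small illustrative sketch (uses a tiny GPT-2 test checkpoint rather than Falcon, so the tensor names differ):

import json, os, tempfile
from safetensors import safe_open
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2")  # tiny test model with tied lm_head
with tempfile.TemporaryDirectory() as d:
    model.save_pretrained(d, safe_serialization=True)
    with safe_open(os.path.join(d, "model.safetensors"), framework="pt") as f:
        keys = set(f.keys())

print("lm_head.weight" in keys)          # expected: False, the tied copy is dropped on save
print("transformer.wte.weight" in keys)  # expected: True, only the embedding is stored

When the checkpoint is loaded back, transformers re-ties lm_head to the embedding, which is why loading works even though the tensor is not stored on disk.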
export model=OpenAssistant/falcon-40b-megacode2-oasst
export num_shard=8
export volume=/home/ubuntu/prannay-efs/checkpoints/tgi-FalconOA-megacode2 # share a volume with the Docker container to avoid downloading weights every run
export max_input_length=2048
export max_total_tokens=2049
export docker_image_id=ghcr.io/huggingface/text-generation-inference:0.9.4 # ghcr.io/huggingface/text-generation-inference:0.9.2
export falconlite_tgi_port=448
docker run -d --gpus all \
--shm-size 1g -p $falconlite_tgi_port:80 -v /home:/home $docker_image_id \
--model-id $model --num-shard ${num_shard} \
--trust-remote-code --dtype bfloat16
This is the script I am using to launch the model.
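Once the shards come up I query the endpoint roughly like this (sketch; assumes the requests package on the host and the falconlite_tgi_port mapping from the script above):

import requests

resp = requests.post(
    "http://localhost:448/generate",  # falconlite_tgi_port from the launch script
    json={"inputs": "def hello():", "parameters": {"max_new_tokens": 32}},
    timeout=120,
)
print(resp.json())

Instead, the launcher fails while initializing the model; the launcher log follows.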
2023-08-18T19:54:41.846461Z  INFO text_generation_launcher: Args { model_id: "/home/ubuntu/prannay-efs/checkpoints/tgi-FalconOA-megacode2", revision: None, validation_workers: 2, sharded: None, num_shard: Some(8), quantize: None, dtype: Some(BFloat16), trust_remote_code: true, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "97f4f178d973", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2023-08-18T19:54:41.846495Z  WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `/home/ubuntu/prannay-efs/checkpoints/tgi-FalconOA-megacode2` do not contain malicious code.
2023-08-18T19:54:41.846499Z  INFO text_generation_launcher: Sharding model on 8 processes
2023-08-18T19:54:41.846634Z  INFO download: text_generation_launcher: Starting download process.
2023-08-18T19:54:43.393983Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.
2023-08-18T19:54:43.748718Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2023-08-18T19:54:43.749128Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2023-08-18T19:54:43.749863Z  INFO shard-manager: text_generation_launcher: Starting shard rank=1
2023-08-18T19:54:43.750865Z  INFO shard-manager: text_generation_launcher: Starting shard rank=2
2023-08-18T19:54:43.750865Z  INFO shard-manager: text_generation_launcher: Starting shard rank=4
2023-08-18T19:54:43.750865Z  INFO shard-manager: text_generation_launcher: Starting shard rank=3
2023-08-18T19:54:43.752334Z  INFO shard-manager: text_generation_launcher: Starting shard rank=5
2023-08-18T19:54:43.753078Z  INFO shard-manager: text_generation_launcher: Starting shard rank=6
2023-08-18T19:54:43.753054Z  INFO shard-manager: text_generation_launcher: Starting shard rank=7
2023-08-18T19:54:53.762085Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=4
2023-08-18T19:54:53.763929Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2023-08-18T19:54:53.765914Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2023-08-18T19:54:53.767026Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=7
2023-08-18T19:54:53.774333Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=5
2023-08-18T19:54:53.780372Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=6
2023-08-18T19:54:53.786063Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2023-08-18T19:54:53.786520Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2023-08-18T19:55:00.297334Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
server.serve(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 184, in serve
asyncio.run(
File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
self._run_once()
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
handle._run()
File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 136, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 208, in get_model
return FlashRWSharded(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_rw.py", line 64, in __init__
model = FlashRWForCausalLM(config, weights)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_rw_modeling.py", line 624, in __init__
self.lm_head = TensorParallelHead.load(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 199, in load
weight = weights.get_sharded(f"{prefix}.weight", dim=0)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 101, in get_sharded
filename, tensor_name = self.get_filename(tensor_name)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 52, in get_filename
raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight lm_head.weight does not exist
(The same "RuntimeError: weight lm_head.weight does not exist" traceback is reported seven more times, once per remaining shard.)
2023-08-18T19:55:03.790575Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2023-08-18T19:55:03.790920Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=5
2023-08-18T19:55:03.796426Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=4
2023-08-18T19:55:03.799873Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2023-08-18T19:55:03.802729Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=7
2023-08-18T19:55:03.804963Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=6
2023-08-18T19:55:03.807944Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2023-08-18T19:55:03.834266Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2023-08-18T19:55:13.209919Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
[W socket.cpp:601] [c10d] The client socket has failed to connect to [localhost]:29500 (errno: 99 - Cannot assign requested address).
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type falcon to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
server.serve(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 184, in serve
asyncio.run(
File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 136, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 208, in get_model
return FlashRWSharded(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_rw.py", line 64, in __init__
model = FlashRWForCausalLM(config, weights)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_rw_modeling.py", line 624, in __init__
self.lm_head = TensorParallelHead.load(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 199, in load
weight = weights.get_sharded(f"{prefix}.weight", dim=0)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 101, in get_sharded
filename, tensor_name = self.get_filename(tensor_name)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 52, in get_filename
raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight lm_head.weight does not exist
rank=7
2023-08-18T19:55:13.291609Z ERROR text_generation_launcher: Shard 7 failed to start
2023-08-18T19:55:13.291631Z  INFO text_generation_launcher: Shutting down shards
2023-08-18T19:55:13.342262Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type falcon to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
server.serve(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 184, in serve
asyncio.run(
File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 136, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 208, in get_model
return FlashRWSharded(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_rw.py", line 64, in __init__
model = FlashRWForCausalLM(config, weights)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_rw_modeling.py", line 624, in __init__
self.lm_head = TensorParallelHead.load(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 199, in load
weight = weights.get_sharded(f"{prefix}.weight", dim=0)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 101, in get_sharded
filename, tensor_name = self.get_filename(tensor_name)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 52, in get_filename
raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight lm_head.weight does not exist
rank=2
2023-08-18T19:55:13.381545Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=1
2023-08-18T19:55:13.498104Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=0
2023-08-18T19:55:13.615350Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=4
2023-08-18T19:55:13.714285Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=3
2023-08-18T19:55:13.783450Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=5
2023-08-18T19:55:13.852918Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=6
Here we can see a runtime error: the launcher cannot find the "lm_head" weight.
There should be an alias defined for lm_head.weight in flash_rw.py; see https://github.com/huggingface/text-generation-inference/blob/c4422e567811051e40469551c58a8078158a6656/server/text_generation_server/models/flash_rw.py#L57. It was added to TGI two weeks ago, so it is probably not part of the officially released Docker container yet.
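Conceptually, the alias tells the weight loader which stored tensor to fall back to when a requested name is missing from the safetensors files. A simplified illustration of that lookup (not TGI's actual code; the shard filename is made up):

# Minimal sketch of an alias fallback when resolving tensor names to files.
routing = {
    # tensor name -> safetensors shard that actually stores it (hypothetical filename)
    "transformer.word_embeddings.weight": "model-00001-of-00009.safetensors",
}
aliases = {
    # missing tensor -> tied tensors that hold the same data
    "lm_head.weight": ["transformer.word_embeddings.weight"],
}

def get_filename(tensor_name):
    filename = routing.get(tensor_name)
    if filename is None:
        for alias in aliases.get(tensor_name, []):
            if alias in routing:
                return routing[alias], alias
        raise RuntimeError(f"weight {tensor_name} does not exist")
    return filename, tensor_name

print(get_filename("lm_head.weight"))  # resolves to the tied embedding weight instead of raising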
I saved some notes on how to build TGI manually in a gist.
Thanks for your help. I found the edit and built a new Docker image. Thank you for the TGI build gist as well; I will take a look.