Space: zetavg/LLaMA-LoRA-Tuner-UI-Demo

zetavg committed on Apr 24, 2023

Commit 0e73bed (1 parent: 45df9da)

update instructions for SkyPilot

Files changed (1)

README.md +44 -14

README.md CHANGED

@@ -42,10 +42,10 @@ After approximately 5 minutes of running, you will see the public URL in the out
 After following the [installation guide of SkyPilot](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html), create a `.yaml` to define a task for running the app:
 ```yaml
-# llama-lora-tuner.yaml
 resources:
-  accelerators: A10:1  # 1x NVIDIA A10 GPU, about US$ 0.6 / hr on Lambda Cloud.
   cloud: lambda  # Optional; if left out, SkyPilot will automatically pick the cheapest cloud.
 file_mounts:
@@ -53,30 +53,46 @@ file_mounts:
   # (to store training datasets and trained models)
   # See https://skypilot.readthedocs.io/en/latest/reference/storage.html for details.
   /data:
-    name: llama-lora-tuner-data  # Make sure this name is unique or you own this bucket. If it does not exists, SkyPilot will try to create a bucket with this name.
     store: s3  # Could be either of [s3, gcs]
     mode: MOUNT
 # Clone the LLaMA-LoRA Tuner repo and install its dependencies.
 setup: |
-  git clone https://github.com/zetavg/LLaMA-LoRA-Tuner.git llama_lora_tuner
-  cd llama_lora_tuner && pip install -r requirements.lock.txt
   pip install wandb
-  cd ..
   echo 'Dependencies installed.'
-  echo 'Pre-downloading base models so that you won't have to wait for long once the app is ready...'
-  python llama_lora_tuner/download_base_model.py --base_model_names='decapoda-research/llama-7b-hf,nomic-ai/gpt4all-j,databricks/dolly-v2-7b'
-# Start the app.
 run: |
-  echo 'Starting...'
-  python llama_lora_tuner/app.py --data_dir='/data' --wandb_api_key="$([ -f /data/secrets/wandb_api_key ] && cat /data/secrets/wandb_api_key | tr -d '\n')" --timezone='Atlantic/Reykjavik' --base_model=decapoda-research/llama-7b-hf --base_model_choices='decapoda-research/llama-7b-hf,nomic-ai/gpt4all-j,databricks/dolly-v2-7b --share
 ```
 Then launch a cluster to run the task:
 ```
-sky launch -c llama-lora-tuner llama-lora-tuner.yaml
 ```
 `-c ...` is an optional flag to specify a cluster name. If not specified, SkyPilot will automatically generate one.
@@ -87,14 +103,28 @@ Note that exiting `sky launch` will only exit log streaming and will not stop th
 When you are done, run `sky stop <cluster_name>` to stop the cluster. To terminate a cluster instead, run `sky down <cluster_name>`.
 ### Run locally
 <details>
   <summary>Prepare environment with conda</summary>
   ```bash
-  conda create -y python=3.8 -n llama-lora-tuner
-  conda activate llama-lora-tuner
   ```
 </details>

 After following the [installation guide of SkyPilot](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html), create a `.yaml` to define a task for running the app:
 ```yaml
+# llm-tuner.yaml
 resources:
+  accelerators: A10:1  # 1x NVIDIA A10 GPU, about US$ 0.6 / hr on Lambda Cloud. Run `sky show-gpus` for supported GPU types, and `sky show-gpus [GPU_NAME]` for detailed information about a GPU type.
   cloud: lambda  # Optional; if left out, SkyPilot will automatically pick the cheapest cloud.
 file_mounts:
   # (to store training datasets and trained models)
   # See https://skypilot.readthedocs.io/en/latest/reference/storage.html for details.
   /data:
+    name: llm-tuner-data  # Make sure this name is unique or you own this bucket. If it does not exist, SkyPilot will try to create a bucket with this name.
     store: s3  # Could be either of [s3, gcs]
     mode: MOUNT
 # Clone the LLaMA-LoRA Tuner repo and install its dependencies.
 setup: |
+  conda create -q python=3.8 -n llm-tuner -y
+  conda activate llm-tuner
+  # Clone the LLaMA-LoRA Tuner repo and install its dependencies
+  [ ! -d llm_tuner ] && git clone https://github.com/zetavg/LLaMA-LoRA-Tuner.git llm_tuner
+  echo 'Installing dependencies...'
+  pip install -r llm_tuner/requirements.lock.txt
+  # Optional: install wandb to enable logging to Weights & Biases
   pip install wandb
+  # Optional: patch bitsandbytes to work around the error "libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats"
+  BITSANDBYTES_LOCATION="$(pip show bitsandbytes | grep 'Location' | awk '{print $2}')/bitsandbytes"
+  [ -f "$BITSANDBYTES_LOCATION/libbitsandbytes_cpu.so" ] && [ ! -f "$BITSANDBYTES_LOCATION/libbitsandbytes_cpu.so.bak" ] && [ -f "$BITSANDBYTES_LOCATION/libbitsandbytes_cuda121.so" ] && echo 'Patching bitsandbytes for GPU support...' && mv "$BITSANDBYTES_LOCATION/libbitsandbytes_cpu.so" "$BITSANDBYTES_LOCATION/libbitsandbytes_cpu.so.bak" && cp "$BITSANDBYTES_LOCATION/libbitsandbytes_cuda121.so" "$BITSANDBYTES_LOCATION/libbitsandbytes_cpu.so"
+  conda install -n llm-tuner cudatoolkit -y
   echo 'Dependencies installed.'
+  # Optional: pre-download models
+  echo "Pre-downloading base models so that you won't have to wait for long once the app is ready..."
+  python llm_tuner/download_base_model.py --base_model_names='decapoda-research/llama-7b-hf,nomic-ai/gpt4all-j'
+# Start the app. `wandb_api_key` and `wandb_project_name` are optional.
 run: |
+  conda activate llm-tuner
+  python llm_tuner/app.py \
+    --data_dir='/data' \
+    --wandb_api_key="$([ -f /data/secrets/wandb_api_key.txt ] && cat /data/secrets/wandb_api_key.txt | tr -d '\n')" \
+    --wandb_project_name='llm-tuner' \
+    --timezone='Atlantic/Reykjavik' \
+    --base_model='decapoda-research/llama-7b-hf' \
+    --base_model_choices='decapoda-research/llama-7b-hf,nomic-ai/gpt4all-j,databricks/dolly-v2-7b' \
+    --share
 ```
 Then launch a cluster to run the task:
 ```
+sky launch -c llm-tuner llm-tuner.yaml
 ```
 `-c ...` is an optional flag to specify a cluster name. If not specified, SkyPilot will automatically generate one.
 When you are done, run `sky stop <cluster_name>` to stop the cluster. To terminate a cluster instead, run `sky down <cluster_name>`.
+**Remember to stop or shut down the cluster when you are done to avoid incurring unexpected charges.** Run `sky cost-report` to see the cost of your clusters.
+<details>
+  <summary>Log into the cloud machine or mount the filesystem of the cloud machine on your local computer</summary>
+  To log into the cloud machine, run `ssh <cluster_name>`, such as `ssh llm-tuner`.
+  If you have `sshfs` installed on your local machine, you can mount the filesystem of the cloud machine on your local computer by running a command like the following:
+  ```bash
+  mkdir -p /tmp/llm_tuner_server && umount /tmp/llm_tuner_server || : && sshfs llm-tuner:/ /tmp/llm_tuner_server
+  ```
+</details>
 ### Run locally
 <details>
   <summary>Prepare environment with conda</summary>
   ```bash
+  conda create -y python=3.8 -n llm-tuner
+  conda activate llm-tuner
   ```
 </details>
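The bitsandbytes patch added to the new `setup:` section packs several checks into a single line. Unrolled, the same logic might read as the shell function below. This is a sketch, not part of the commit: `patch_bitsandbytes` is a hypothetical name, and it takes the bitsandbytes package directory as an argument instead of resolving it with `pip show`. Like the original one-liner, it assumes the CUDA 12.1 build (`libbitsandbytes_cuda121.so`) is the library to swap in.

```shell
#!/usr/bin/env bash
# Hypothetical, readable version of the one-line bitsandbytes patch.
# Argument: the bitsandbytes package directory (the original derives it
# from the `Location` field of `pip show bitsandbytes`).
patch_bitsandbytes() {
  local dir="$1"
  local cpu_lib="$dir/libbitsandbytes_cpu.so"
  local cuda_lib="$dir/libbitsandbytes_cuda121.so"

  # Patch only if the CPU library exists, no backup has been made yet,
  # and the CUDA 12.1 build is present (same three checks as the original).
  if [ -f "$cpu_lib" ] && [ ! -f "$cpu_lib.bak" ] && [ -f "$cuda_lib" ]; then
    echo 'Patching bitsandbytes for GPU support...'
    # Keep the original CPU library as a backup...
    mv "$cpu_lib" "$cpu_lib.bak"
    # ...then put a copy of the CUDA 12.1 build under the CPU library's name.
    cp "$cuda_lib" "$cpu_lib"
  fi
}

# Usage, locating the package the same way the setup script does:
#   patch_bitsandbytes "$(pip show bitsandbytes | grep 'Location' | awk '{print $2}')/bitsandbytes"
```

The `.bak` check is what makes the patch safe to rerun: if `setup` runs again on the same machine, the function skips the swap instead of overwriting the backup of the original CPU library with the already-patched file.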
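The `--wandb_api_key` argument in the new `run:` section reads the key from `/data/secrets/wandb_api_key.txt` when that file exists and otherwise expands to an empty string. Since `/data` is the mount point of the `llm-tuner-data` bucket in the task YAML, that file would correspond to `secrets/wandb_api_key.txt` inside the bucket. A minimal sketch of the same substitution as a reusable function follows; `read_optional_secret` is a hypothetical helper, not part of LLaMA-LoRA Tuner.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the inline `--wandb_api_key` substitution:
# print the file's contents with all newlines removed, or print nothing
# (and still exit 0) if the file does not exist.
read_optional_secret() {
  local path="$1"
  if [ -f "$path" ]; then
    # Same output as `cat "$path" | tr -d '\n'`, without the extra `cat`.
    tr -d '\n' < "$path"
  fi
}

# Usage inside the `run:` section, with the same path as the task YAML:
#   --wandb_api_key="$(read_optional_secret /data/secrets/wandb_api_key.txt)"
```

Stripping newlines matters because a key file saved by a text editor usually ends with a trailing newline, and an embedded newline would otherwise end up in the argument.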