Can't find zephyr-7b-beta cache using the optimum-cli list command
I am a beginner and am having trouble finding and loading the cache files I need for zephyr-7b-beta. I am using the commands given in the guides, but I get errors such as "repo not found". Could someone please give me the exact commands to find and load the model I mentioned?
Hi, please make sure you have the latest version of optimum-neuron installed:
$ pip install -U optimum-neuron
Then type:
$ optimum-cli neuron cache lookup HuggingFaceH4/zephyr-7b-beta
*** 0 entrie(s) found in cache for HuggingFaceH4/zephyr-7b-beta for training.***
*** 12 entrie(s) found in cache for HuggingFaceH4/zephyr-7b-beta for inference.***
...
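If you want to check the cache from a script rather than by eye, the two summary lines above can be parsed. This is a minimal sketch that assumes the exact line format shown in this reply (it may change between optimum-neuron releases) and that the summary may be printed to either stdout or stderr. `parse_lookup_summary` and `lookup` are hypothetical helper names, not part of optimum-neuron.

```python
import re
import subprocess

# Matches summary lines in the format shown above, e.g.
# "*** 12 entrie(s) found in cache for HuggingFaceH4/zephyr-7b-beta for inference.***"
# Assumption: the format is taken from this thread and may differ in other versions.
SUMMARY_RE = re.compile(
    r"\*\*\*\s*(\d+) entrie\(s\) found in cache for (\S+) for (training|inference)\.\s*\*\*\*"
)


def parse_lookup_summary(output: str) -> dict:
    """Return e.g. {"training": 0, "inference": 12} from the lookup console output."""
    counts = {}
    for match in SUMMARY_RE.finditer(output):
        count, _model_id, mode = match.groups()
        counts[mode] = int(count)
    return counts


def lookup(model_id: str) -> dict:
    """Run `optimum-cli neuron cache lookup` and parse its summary lines.

    Raises CalledProcessError if the command fails (e.g. an older optimum-neuron
    without the `lookup` subcommand). stdout and stderr are both searched because
    it is an assumption which stream the summary goes to.
    """
    result = subprocess.run(
        ["optimum-cli", "neuron", "cache", "lookup", model_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_lookup_summary(result.stdout + result.stderr)
```

On an instance with optimum-neuron installed, `lookup("HuggingFaceH4/zephyr-7b-beta")` would return the per-mode counts; an inference count of 0 means there is nothing cached to fetch.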
Hey, thank you for the response. When I try that, I get this:
$ optimum-cli neuron cache lookup HuggingFaceH4/zephyr-7b-beta
usage: optimum-cli neuron cache [-h] {create,set,add,list,synchronize} ...
optimum-cli neuron cache: error: argument {create,set,add,list,synchronize}: invalid choice: 'lookup' (choose from 'create', 'set', 'add', 'list', 'synchronize')
Does lookup not work on Inf2?
You don't seem to have the latest version of optimum-neuron (0.0.20):
$ pip show optimum-neuron
Name: optimum-neuron
Version: 0.0.20
...
$ optimum-cli neuron cache -h
usage: optimum-cli neuron cache [-h] {create,set,add,synchronize,lookup} ...

positional arguments:
  {create,set,add,synchronize,lookup}
    create              Create a model repo on the Hugging Face Hub to store Neuron X compilation files.
    set                 Set the name of the Neuron cache repo to use locally (trainium only).
    add                 Add a model to the cache of your choice (trainium only).
    synchronize         Synchronize the neuronx compiler cache with a hub cache repo.
    lookup              Lookup the neuronx compiler hub cache for the specified model id.

options:
  -h, --help            show this help message and exit
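To check the installed version from Python before relying on `lookup`, something like the sketch below works with only the standard library. It assumes 0.0.20, the version quoted in this reply, as the minimum; the release that first added `cache lookup` may be older. `parse_version`, `supports_lookup` and `installed_optimum_neuron` are hypothetical helper names.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Assumption: 0.0.20 (the version shown in this thread) is treated as the minimum.
MIN_VERSION = (0, 0, 20)


def parse_version(text: str) -> Tuple[int, int, int]:
    """Turn '0.0.20' into (0, 0, 20) and '0.0.21.dev0' into (0, 0, 21)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short versions such as '1.0'
        parts.append(0)
    return tuple(parts)


def supports_lookup(installed: str, minimum: Tuple[int, int, int] = MIN_VERSION) -> bool:
    """True if the installed version is at least `minimum`."""
    return parse_version(installed) >= minimum


def installed_optimum_neuron() -> Optional[str]:
    """Version string of the installed optimum-neuron package, or None if it is missing."""
    try:
        return version("optimum-neuron")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_optimum_neuron()
    if installed is None:
        print("optimum-neuron is not installed: pip install -U optimum-neuron")
    elif not supports_lookup(installed):
        print(f"optimum-neuron {installed} is older than 0.0.20: pip install -U optimum-neuron")
    else:
        print(f"optimum-neuron {installed}: `optimum-cli neuron cache lookup` should be available")
```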
Thank you, updating optimum-neuron worked.
Is there also a way to download or load the NEFF files into my local environment so that I don't have to export the model myself? Sorry if this is a stupid question; this is not really my domain.
If you export the model with one of the cached configurations (batch_size, sequence_length, auto_cast_type, num_cores), the cached NEFFs will be fetched automatically instead of being compiled again (you'll see messages on the console).
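As a rough sketch of "export with one of the cached configurations": compare the four parameters you want against the entries that `cache lookup` reports, and only export with a configuration that matches all four exactly. The `CACHED` list below is hypothetical (copy the real values from your own lookup output), `find_cached_config` and `export_from_cache` are illustrative names, and the `NeuronModelForCausalLM.from_pretrained(..., export=True, ...)` call is based on the optimum-neuron inference docs of this period, so check the docs for your installed version.

```python
from typing import Dict, List, Optional

# The four export parameters that must match a cached entry (as listed in the reply above).
CONFIG_KEYS = ("batch_size", "sequence_length", "auto_cast_type", "num_cores")

# Hypothetical cached entries: replace them with the values your lookup output shows.
CACHED: List[Dict] = [
    {"batch_size": 1, "sequence_length": 4096, "auto_cast_type": "bf16", "num_cores": 2},
    {"batch_size": 4, "sequence_length": 4096, "auto_cast_type": "bf16", "num_cores": 2},
]


def find_cached_config(wanted: Dict, cached: List[Dict]) -> Optional[Dict]:
    """Return the first cached entry whose four export parameters all equal `wanted`, else None."""
    for entry in cached:
        if all(entry.get(key) == wanted.get(key) for key in CONFIG_KEYS):
            return entry
    return None


def export_from_cache(model_id: str, config: Dict):
    """Export with a cached configuration so the NEFFs are fetched rather than compiled.

    Requires optimum-neuron on an Inf2/Trn1 instance; the import is done lazily so the
    matching helper above can run anywhere.
    """
    from optimum.neuron import NeuronModelForCausalLM  # assumption: API as in the docs of this period

    kwargs = {key: config[key] for key in CONFIG_KEYS}
    return NeuronModelForCausalLM.from_pretrained(model_id, export=True, **kwargs)


if __name__ == "__main__":
    wanted = {"batch_size": 1, "sequence_length": 4096, "auto_cast_type": "bf16", "num_cores": 2}
    match = find_cached_config(wanted, CACHED)
    if match is None:
        print("No cached configuration matches; exporting would compile from scratch.")
    else:
        print("Cache hit:", match)
        # On an Inf2 instance you would then run:
        # model = export_from_cache("HuggingFaceH4/zephyr-7b-beta", match)
```

By the description above, changing any one of the four values (for example sequence_length 2048 instead of 4096) means there is no cached NEFF to fetch, so the export falls back to a full compilation.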