Tags: GGUF · English · Mixture of Experts · olmo · olmoe · Inference Endpoints · conversational

error loading model

#1
by LaferriereJC - opened

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'olmoe'

If you're using an AI inference app with a llama.cpp backend, you need to wait until the app is updated to a recent release of llama.cpp that includes the merge enabling OLMoE support: https://github.com/ggerganov/llama.cpp/pull/9462. Alternatively, you can download the latest release of llama.cpp yourself and run this OLMoE model directly from the command line.

llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = olmoe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = OLMoE 1B 7B 0924 Instruct
llama_model_loader: - kv 3: general.version str = 0924
llama_model_loader: - kv 4: general.finetune str = Instruct
llama_model_loader: - kv 5: general.basename str = OLMoE
llama_model_loader: - kv 6: general.size_label str = 1B-7B
llama_model_loader: - kv 7: general.license str = apache-2.0
llama_model_loader: - kv 8: general.base_model.count u32 = 1
llama_model_loader: - kv 9: general.base_model.0.name str = OLMoE 1B 7B 0924 SFT
llama_model_loader: - kv 10: general.base_model.0.version str = 0924
llama_model_loader: - kv 11: general.base_model.0.organization str = Allenai
llama_model_loader: - kv 12: general.base_model.0.repo_url str = https://huggingface.co/allenai/OLMoE-...
llama_model_loader: - kv 13: general.tags arr[str,3] = ["moe", "olmo", "olmoe"]
llama_model_loader: - kv 14: general.languages arr[str,1] = ["en"]
llama_model_loader: - kv 15: general.datasets arr[str,1] = ["allenai/ultrafeedback_binarized_cle...
llama_model_loader: - kv 16: olmoe.block_count u32 = 16
llama_model_loader: - kv 17: olmoe.context_length u32 = 4096
llama_model_loader: - kv 18: olmoe.embedding_length u32 = 2048
llama_model_loader: - kv 19: olmoe.feed_forward_length u32 = 1024
llama_model_loader: - kv 20: olmoe.attention.head_count u32 = 16
llama_model_loader: - kv 21: olmoe.attention.head_count_kv u32 = 16
llama_model_loader: - kv 22: olmoe.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 23: olmoe.expert_used_count u32 = 8
llama_model_loader: - kv 24: general.file_type u32 = 18
llama_model_loader: - kv 25: olmoe.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 26: olmoe.expert_count u32 = 64
llama_model_loader: - kv 27: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 28: tokenizer.ggml.pre str = olmo
llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,50304] = ["|||IP_ADDRESS|||", "<|padding|>", "...
llama_model_loader: - kv 30: tokenizer.ggml.token_type arr[i32,50304] = [4, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 31: tokenizer.ggml.merges arr[str,50009] = ["Ġ Ġ", "Ġ t", "Ġ a", "h e", "i n...
llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 50279
llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 50279
llama_model_loader: - kv 34: tokenizer.ggml.padding_token_id u32 = 50280
llama_model_loader: - kv 35: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 36: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 37: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv 38: general.quantization_version u32 = 2
llama_model_loader: - kv 39: quantize.imatrix.file str = /models_out/OLMoE-1B-7B-0924-Instruct...
llama_model_loader: - kv 40: quantize.imatrix.dataset str = /training_dir/calibration_datav3.txt
llama_model_loader: - kv 41: quantize.imatrix.entries_count i32 = 128
llama_model_loader: - kv 42: quantize.imatrix.chunks_count i32 = 132
llama_model_loader: - type f32: 81 tensors
llama_model_loader: - type q6_K: 114 tensors
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'olmoe'
llama_load_model_from_file: failed to load model
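The error is raised as soon as the loader reads the `general.architecture` key from the GGUF header and finds a string it has no implementation for. A minimal sketch of how that key is laid out and read back, based on my reading of the GGUF spec (the toy header is built in memory rather than from a real model file, and the `KNOWN` set is illustrative, standing in for a llama.cpp build that predates OLMoE support):

```python
import struct
import io

GGUF_MAGIC = b"GGUF"
GGUF_TYPE_STRING = 8  # string entry in the GGUF value-type enum

def write_string(buf, s):
    # GGUF strings: uint64 length followed by UTF-8 bytes
    data = s.encode("utf-8")
    buf.write(struct.pack("<Q", len(data)))
    buf.write(data)

def write_toy_header(arch):
    """Build a minimal GGUF header with a single metadata key."""
    buf = io.BytesIO()
    buf.write(GGUF_MAGIC)
    buf.write(struct.pack("<I", 3))   # format version
    buf.write(struct.pack("<Q", 0))   # tensor count (none in this toy header)
    buf.write(struct.pack("<Q", 1))   # metadata KV count
    write_string(buf, "general.architecture")       # key
    buf.write(struct.pack("<I", GGUF_TYPE_STRING))  # value type
    write_string(buf, arch)                         # value
    return buf.getvalue()

def read_architecture(data):
    """Read general.architecture back, as a loader would on startup."""
    buf = io.BytesIO(data)
    assert buf.read(4) == GGUF_MAGIC, "not a GGUF file"
    (version,) = struct.unpack("<I", buf.read(4))
    n_tensors, n_kv = struct.unpack("<QQ", buf.read(16))
    for _ in range(n_kv):
        (klen,) = struct.unpack("<Q", buf.read(8))
        key = buf.read(klen).decode("utf-8")
        (vtype,) = struct.unpack("<I", buf.read(4))
        assert vtype == GGUF_TYPE_STRING  # toy header only holds strings
        (vlen,) = struct.unpack("<Q", buf.read(8))
        value = buf.read(vlen).decode("utf-8")
        if key == "general.architecture":
            return value
    return None

# Pretend this build only knows a few architectures (pre-PR-#9462 behaviour)
KNOWN = {"llama", "gemma2", "qwen2"}
arch = read_architecture(write_toy_header("olmoe"))
if arch not in KNOWN:
    print(f"error loading model architecture: unknown model architecture: '{arch}'")
```

The file itself is fine; the fix is simply a loader build whose architecture table includes `olmoe`.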

latest release of llama.cpp (as of 2024-09-27)

Which quant are you trying to load? I don't think llama.cpp supports the imatrix (IQ) quants for this model yet. I was able to load Q4_K_M successfully.
