Xenova/gemini-nano

Jul 4

Not again ;-)

Does this mean Gemini nano can run without MediaPipe, through Transformers.js only?

If so, does it run with CPU, GPU, or both?

And does Transformers.js allow for the loading of lora extensions? I was toying with it because I was interested in how this experiment enabled that: https://www.reddit.com/r/LocalLLaMA/comments/1dsfpb4/gemini_nano_running_locally_in_brave_using/

Xenova

Owner Jul 4

Does this mean Gemini nano can run without MediaPipe, through Transformers.js only?

That's a goal, but for now this repo will only "signal" to the browser to use the window.ai functionality, if present.

If so, does it run with CPU, GPU, or both?

It will run on GPU

And does Transformers.js allow for the loading of lora extensions?

Not currently - this is a limitation of ONNX (/ ONNX Runtime Web), so feel free to open feature requests there! :)

ethanc8

Jul 4

Would my script, which converts the MediaPipe format Gemini Nano to fp32 safetensors, be helpful? https://github.com/ethanc8/Gemini-Nano/blob/master/playground/converter.py

I haven't really tested it, since it takes more than 2 hours to finish dequantizing, and runs out of memory while it tries to save to safetensors. I'm trying various mitigations to get around this.

Xenova

Owner Jul 4

That is indeed very useful! If you can get a gemma model running with those weights, I can convert to ONNX and get it running with transformers.js!

BoscoTheDog

Jul 5

•

edited Jul 5

@ethanc8 Cool!

I tried running the script, but got an error:

python3 convert_gemini.py weights.bin gemini_nano.safetensors fp16

model: tflite.Model.Model = tflite.Model.Model.GetRootAs(buf)

I changed that to model: tflite.Model = tflite.Model.GetRootAs(buf) and got a bit further:

return packer_type.unpack_from(memoryview_type(buf), head)[0]
struct.error: unpack_from requires a buffer of at least 1802465126 bytes for unpacking 4 bytes at offset 1802465122 (actual buffer size is 824)

Which means I have ridiculously little memory available I take it? :-D

ethanc8

Jul 5

@BoscoTheDog You need to enter the conda environment and use converter.py. Also, tflite.Model is a module, not a class (it's located in playground/tflite/Model.py), so we need to use tflite.Model.Model. Finally, the fact that your buffer size is 824 means that you opened an 824-byte file instead of the Gemini Nano weights. Check what's actually inside weights.bin.

ethanc8

Jul 6

I am now running the dequantization at https://github.com/ethanc8/Gemini-Nano/actions. I kept running out of memory on my host machine, but hopefully GitHub Actions' 16GB RAM should allow the dequantization to finish successfully.

ethanc8

Jul 6

Do we have much of any knowledge about what it'd take to restore multimodal support to this model? I assume that they're using a ViT-VQGAN for their image decoder (the other ways I know about to use transformers for image generation use dVAE, VQVAE, or VQGAN, and the only image gen research they cited in the architecture paragraph was OpenAI DALL-E using dVAE and Google Parti using ViT-VQGAN), and I'd hope that the input tokens and output tokens are from the same vocabulary, so the image encoder should also be a ViT-VQGAN. They mentioned that they used a Google USM for the speech encoder. It might be useful if we could get the model to generate image tokens. I'm also thinking of trying to restore the image output on Meta Chameleon, which should be much easier because they released the VQGAN, so I think they must've just fine-tuned the model to avoid generating images, after giving it the ability to generate images. Maybe the LoRA adapter which ships with Gemini Nano does something similar, so maybe running the model without the LoRA adapter might cause it to generate image tokens if you prompt it to. I'm really not sure though.

QuietImpostor

Jul 9

This comment has been hidden

Benchmark	Gemma 2B	Sappha 2B	Gemini Nano
MMLU (5sh)	36.98	38.02	???
HellaSwag (0sh)	49.22	51.70	???
PIQA (1sh)	75.08	75.46	???
TruthfulQA (0sh)	37.51	31.65	???

Oh man