How much RAM does it need? My 24GB was not enough.
It looks like it is running on the CPU.
24GB should be more than enough for a 6B model... I run the Pygmalion 7B model in full BF16 precision on my 16GB 4080. If it's running on the CPU, it's more likely that you haven't installed one of the required libraries. I would suggest using Oobabooga installed via their installation scripts; here is a link to the Windows version: https://github.com/oobabooga/text-generation-webui/releases/download/installers/oobabooga_windows.zip
The benefit of using their setup script is that it installs everything you need for your hardware. Also, if you had tried using the GPU and the memory was not enough, it would likely just die and not work at all. I don't think it would magically switch to CPU mode without you telling it to, so it sounds more like something isn't set up for it to use the GPU...
Well, I tried to run it in PyCharm using:
from transformers import pipeline
text_generation = pipeline("text-generation", model="PygmalionAI/pygmalion-6b")
generated_text = text_generation("Hello, how are you?")
print(generated_text[0]['generated_text'])
The GPU option also does not work:
from transformers import pipeline
text_generation = pipeline("text-generation",
                           model="PygmalionAI/pygmalion-6b",
                           device=0)  # specify the GPU device number
generated_text = text_generation("Hello, how are you?")
print(generated_text[0]['generated_text'])
and the 24GB of RAM fills up.
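For reference: the pipeline loads the model in FP32 by default, which is roughly 24GB of weights for a 6B model, so both variants above exhaust memory. A minimal sketch of the same call in half precision (torch_dtype is a standard pipeline argument in recent transformers versions; device=0 assumes a single CUDA GPU):

import torch
from transformers import pipeline

# FP16 weights take roughly 12GB instead of the ~24GB needed in FP32
text_generation = pipeline("text-generation",
                           model="PygmalionAI/pygmalion-6b",
                           device=0,                   # first CUDA GPU
                           torch_dtype=torch.float16)  # load in half precision
generated_text = text_generation("Hello, how are you?")
print(generated_text[0]['generated_text'])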
Oh, you're trying to do this in code? I'll pass this over to someone else to support; my suggestion is to start with oobabooga, as it has both example code and installs all the libraries you need. As I said earlier, GPU support requires a whole bunch of extra libraries; it's not going to work if you don't have them installed. If your hardware doesn't support 16-bit, then you might have to load the model in 8-bit mode. Again, check ooba for code/requirement examples. Also, remember that oobabooga provides a Kobold-compatible API and a new streaming text API, so you can connect to it via the API and use it that way as well.
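For the 8-bit route mentioned above, a minimal sketch (this assumes the bitsandbytes and accelerate packages are installed; load_in_8bit was the from_pretrained flag for this at the time, and newer transformers versions spell it through BitsAndBytesConfig):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PygmalionAI/pygmalion-6b")
# int8 weights need roughly 6GB of VRAM for a 6B model;
# device_map="auto" lets accelerate place the layers on the GPU
model = AutoModelForCausalLM.from_pretrained("PygmalionAI/pygmalion-6b",
                                             load_in_8bit=True,
                                             device_map="auto")

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)  # illustrative length
print(tokenizer.decode(output[0], skip_special_tokens=True))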
Yes, I tried to load it in PyCharm, because I wanted to attach some logic.
I experienced the same thing. It takes 26-27GB, so it's just barely too big for the GPU. Lower precision would help, but I'm not sure where to set that.
Found you can set the precision by calling model.half().
You also need to call that on your input tensor.
Something to note: your CPU might not support all the FP16 operations, so if you use this it will likely only run on the GPU now. So just make sure to call model.cuda().
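A minimal sketch of that advice (assuming a CUDA GPU; the sketch passes torch_dtype=torch.float16 at load time, which is equivalent to calling model.half() afterwards but skips the FP32 intermediate. Note that for text generation the inputs are integer token IDs, so they only need to be moved to the GPU, not converted to FP16):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PygmalionAI/pygmalion-6b")
# Loading directly in FP16 avoids first materialising the FP32 weights,
# which alone would need ~24GB of system RAM
model = AutoModelForCausalLM.from_pretrained("PygmalionAI/pygmalion-6b",
                                             torch_dtype=torch.float16)
model = model.cuda()  # FP16 ops are much better supported on GPU than CPU

# Token IDs stay integers; they just need to be on the same device as the model
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=50)  # illustrative length
print(tokenizer.decode(output[0], skip_special_tokens=True))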
It's running fine on my 8GB 3060 Ti. Sure, the response time is somewhere between 15 and 25 seconds, but I can live with that.