- Trainable parameters: 30,142,848
- Embedding parameters: 19,298,688
- Non-embedding parameters: 10,844,160
- Tokenizer: GPT-2
- Vocabulary size: 50,257
- Compute: single T4 GPU
- Total train time: 2 hours 40 minutes
- Total train tokens: 136,000,000
- Epochs: 2
- Final train loss: 2.9811
- Final test loss: 2.7963
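
As a sanity check, these counts line up exactly with the GPT-2 configuration used in the inference script below. A minimal sketch (plain Python, no dependencies), assuming the card counts position embeddings and LayerNorms as non-embedding parameters and that the LM head is weight-tied to the token embeddings:

```python
# Reproduce the parameter counts from the config used below:
# n_embd=384, n_layer=6, n_positions=512, vocab_size=50257.
n_embd, n_layer, n_ctx, vocab = 384, 6, 512, 50257

wte = vocab * n_embd   # token embeddings -> 19,298,688
wpe = n_ctx * n_embd   # position embeddings -> 196,608

per_layer = (
    3 * n_embd * n_embd + 3 * n_embd    # attention QKV projection
    + n_embd * n_embd + n_embd          # attention output projection
    + n_embd * 4 * n_embd + 4 * n_embd  # MLP up-projection
    + 4 * n_embd * n_embd + n_embd      # MLP down-projection
    + 2 * 2 * n_embd                    # two LayerNorms (weight + bias)
)

non_embedding = n_layer * per_layer + 2 * n_embd + wpe  # + final LayerNorm
print(f"{wte:,}")                  # 19,298,688
print(f"{non_embedding:,}")        # 10,844,160
print(f"{wte + non_embedding:,}")  # 30,142,848
```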
Try the following script for inference (written for a Colab/Jupyter notebook, hence the `!pip` lines):
```python
!pip install huggingface_hub
!pip install transformers
!pip install torch

from transformers import GPT2Tokenizer, GPT2Config, GPT2LMHeadModel
from huggingface_hub import hf_hub_download
import torch

# Name
model_name = 'Mizule/Dense-30M'

# Authenticate
token = input("Enter your Hugging Face token: ")

# Download
model_file = hf_hub_download(repo_id=model_name, filename="Dense-30M.pth", token=token)

# Custom config
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
config = GPT2Config(
    vocab_size=tokenizer.vocab_size,
    n_positions=512,
    n_ctx=512,
    n_embd=384,
    n_layer=6,
    n_head=8
)

# Load model
model = GPT2LMHeadModel(config)
model.load_state_dict(torch.load(model_file, map_location=torch.device('cpu')))
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

# Inference settings
def generate_text(prompt, max_length=128, temperature=0.2, top_k=50, top_p=0.9):
    inputs = tokenizer(prompt, return_tensors="pt")
    inputs = {key: value.to(device) for key, value in inputs.items()}
    outputs = model.generate(
        **inputs,
        max_length=max_length,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id  # GPT-2 has no pad token
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Interactive loop (it's an undertrained base model, don't expect it to chat)
while True:
    prompt = input("Prompt: ")
    if prompt.lower() == 'exit':
        break
    output = generate_text(prompt)
    print(f"Generated text: {output}")
```
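
The defaults in `generate_text` are deliberately conservative for a model this size: `temperature=0.2` sharpens sampling toward high-probability tokens, `top_k`/`top_p` prune the tail of the distribution, and `no_repeat_ngram_size=2` blocks verbatim bigram repeats, which small models are prone to. Note that `max_length` counts the prompt tokens as well as the generated ones, so raise it (up to the model's 512-token context) if outputs cut off early.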