GPT2_model.py

#1
by MartialTerran - opened

Can you provide a functional PyTorch model.py and train.py that support at least inference for this model, using the hyperparameters in the config.json? There is no way to do model-architecture research or educational experimentation through the `transformers` AutoModel machinery, which conceals the actual Python scripts. Further, Hugging Face sometimes vandalizes that hidden model code, making it inoperable. For example, the SmolLM2 weights are already useless for research because of a change in `transformers` that produces a size mismatch in the k and v projections, apparently due to the 3x re-use of the k and v projection matrices (grouped-query attention). See e.g. https://huggingface.co/HuggingFaceTB/SmolLM2-360M/discussions

The published SmolLM2 weights have already become unusable and impossible to study, because no fixed, definitive model.py and train.py were provided to document how the unusual config hyperparameters are implemented. I believe you have published various Python scripts and C++ code for running GPT-2 models elsewhere, so I am surprised to see these weights published without an explicit model.py and train.py. Please help the independent research community by providing a working PyTorch model.py and train.py.
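For concreteness, here is a minimal sketch of what such a model.py could look like. It assumes the standard GPT-2 config.json keys (vocab_size, n_positions, n_embd, n_layer, n_head, layer_norm_epsilon), requires PyTorch >= 2.0 for the fused attention kernel, and deliberately omits the weight-name mapping and Conv1D-to-Linear transposes that loading the published checkpoint would additionally require. It is a sketch of the architecture, not an official implementation:

```python
# model.py -- minimal GPT-2 sketch, assuming standard GPT-2 config.json keys.
# Omits checkpoint loading (the HF checkpoint stores Conv1D weights that would
# need transposing into these nn.Linear layers). Requires PyTorch >= 2.0.
import json
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.n_head, self.n_embd = cfg["n_head"], cfg["n_embd"]
        self.c_attn = nn.Linear(self.n_embd, 3 * self.n_embd)  # fused q, k, v
        self.c_proj = nn.Linear(self.n_embd, self.n_embd)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        # (B, T, C) -> (B, n_head, T, head_dim)
        q, k, v = (t.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
                   for t in (q, k, v))
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.c_proj(y.transpose(1, 2).contiguous().view(B, T, C))

class Block(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        eps = cfg.get("layer_norm_epsilon", 1e-5)
        self.ln_1 = nn.LayerNorm(cfg["n_embd"], eps=eps)
        self.attn = CausalSelfAttention(cfg)
        self.ln_2 = nn.LayerNorm(cfg["n_embd"], eps=eps)
        self.mlp = nn.Sequential(
            nn.Linear(cfg["n_embd"], 4 * cfg["n_embd"]),
            nn.GELU(approximate="tanh"),  # GPT-2 uses the tanh GELU variant
            nn.Linear(4 * cfg["n_embd"], cfg["n_embd"]),
        )

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))  # pre-norm residuals, as in GPT-2
        x = x + self.mlp(self.ln_2(x))
        return x

class GPT2(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.wte = nn.Embedding(cfg["vocab_size"], cfg["n_embd"])   # token emb.
        self.wpe = nn.Embedding(cfg["n_positions"], cfg["n_embd"])  # learned pos. emb.
        self.h = nn.ModuleList([Block(cfg) for _ in range(cfg["n_layer"])])
        self.ln_f = nn.LayerNorm(cfg["n_embd"], eps=cfg.get("layer_norm_epsilon", 1e-5))
        self.lm_head = nn.Linear(cfg["n_embd"], cfg["vocab_size"], bias=False)
        self.lm_head.weight = self.wte.weight  # weight tying, as in GPT-2

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.wte(idx) + self.wpe(pos)
        for block in self.h:
            x = block(x)
        return self.lm_head(self.ln_f(x))

if __name__ == "__main__":
    with open("config.json") as f:
        cfg = json.load(f)
    model = GPT2(cfg).eval()
    with torch.no_grad():
        logits = model(torch.zeros(1, 8, dtype=torch.long))
    print(logits.shape)  # (1, 8, vocab_size)
```

A matching train.py need not be more than a standard next-token cross-entropy loop. In this sketch, "tokens.pt", the batch size, learning rate, and step count are illustrative placeholders, not values from this repo:

```python
# train.py -- minimal sketch: next-token cross-entropy over a pre-tokenized
# 1-D LongTensor of token ids. All training hyperparameters are placeholders.
import json
import torch
import torch.nn.functional as F
from model import GPT2  # the sketch above, saved as model.py

with open("config.json") as f:
    cfg = json.load(f)
model = GPT2(cfg).train()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

tokens = torch.load("tokens.pt")  # placeholder corpus of token ids
block_size, batch_size = 128, 8

for step in range(1000):
    # sample random contiguous windows; targets are inputs shifted by one
    ix = torch.randint(len(tokens) - block_size - 1, (batch_size,))
    x = torch.stack([tokens[i : i + block_size] for i in ix])
    y = torch.stack([tokens[i + 1 : i + block_size + 1] for i in ix])
    loss = F.cross_entropy(model(x).view(-1, cfg["vocab_size"]), y.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(step, loss.item())
```

Even sketches like these, checked against the published config.json, would pin the architecture down in a way that a `transformers` version bump cannot silently break.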
