---
license: cc-by-nc-4.0
---

A llama.c model based on Karpathy's Llama2.c project: https://github.com/karpathy/llama2.c

Vocab of 4096, trained on TinyStories and my custom littlestories dataset (currently unreleased).

The model uses ↨ as a shift key instead of capital letters. This simplifies the tokenizer, because it no longer needs duplicate uppercase versions of tokens.

---

To convert normal text to the right format I use:

```
def add_caseifer(text):
    # Prefix each uppercase character with ↨ and lowercase it;
    # join over a list comprehension for efficient concatenation.
    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])
```

To return the text to human format I use:

```
def remove_caseifer(text):
    new_text = ""
    i = 0
    while i < len(text):
        if text[i] == "↨":
            if i+1 < len(text):
                # Uppercase the character after the marker, then step past it.
                new_text += text[i+1].upper()
                i += 1
            else:
                pass  # trailing ↨ with nothing after it: skip this index
        else:
            new_text += text[i]
        i += 1
    return new_text
```
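
As a quick sanity check, here is a minimal round-trip sketch. It restates the two helpers so the snippet runs on its own; the `SHIFT` constant and the sample sentence are my own illustration, not part of the training pipeline.

```python
SHIFT = "↨"  # illustrative constant for the shift marker


def add_caseifer(text):
    # Replace each uppercase character with the marker plus its lowercase form.
    return "".join(SHIFT + ch.lower() if ch.isupper() else ch for ch in text)


def remove_caseifer(text):
    out = []
    i = 0
    while i < len(text):
        if text[i] == SHIFT:
            # Uppercase the character after the marker, if there is one.
            if i + 1 < len(text):
                out.append(text[i + 1].upper())
                i += 1
        else:
            out.append(text[i])
        i += 1
    return "".join(out)


sample = "Once upon a time, Tom met Lily."
encoded = add_caseifer(sample)
print(encoded)  # ↨once upon a time, ↨tom met ↨lily.
assert remove_caseifer(encoded) == sample
```

Note that the round trip only holds for input that does not already contain ↨: a literal ↨ in the source text would be read as a shift marker when decoding.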