Papers - Tokenizers - Bytes - Incremental Patching. Note: BPE does not handle incremental patching like BLT. Collection by matlok, 2 days ago - Byte Latent Transformer: Patches Scale Better Than Tokens. Paper • 2412.09871 • Published 14 days ago • 76
Papers - Tokenizers - Bytes - Entropy Patching - Threshold. Note: Helps with finding the end of the byte patch. Collection by matlok, 3 days ago - Byte Latent Transformer: Patches Scale Better Than Tokens. Paper • 2412.09871 • Published 14 days ago • 76
Papers - Tokenizers - Bytes - Space - First Char - Patch Len. Collection by matlok, 3 days ago - Byte Latent Transformer: Patches Scale Better Than Tokens. Paper • 2412.09871 • Published 14 days ago • 76
Papers - Tokenizers - Bytes - Patches - Space Detection. Collection by matlok, 3 days ago - Byte Latent Transformer: Patches Scale Better Than Tokens. Paper • 2412.09871 • Published 14 days ago • 76
Papers - Tokenizers - Bytes - Patches - Entropy-based. Note: Patch start detected by entropy crossing a threshold. Collection by matlok, 3 days ago - Byte Latent Transformer: Patches Scale Better Than Tokens. Paper • 2412.09871 • Published 14 days ago • 76
Papers - Tokenizers - Bytes - Strided Patches - MegaByte. Collection by matlok, 3 days ago - Byte Latent Transformer: Patches Scale Better Than Tokens. Paper • 2412.09871 • Published 14 days ago • 76
Papers - Text - Tokenizer - Bytes - Strided Patches. Collection by matlok, 3 days ago - Byte Latent Transformer: Patches Scale Better Than Tokens. Paper • 2412.09871 • Published 14 days ago • 76
Papers - Training Research - Bytes - No Vocabulary. Collection by matlok, 3 days ago - Byte Latent Transformer: Patches Scale Better Than Tokens. Paper • 2412.09871 • Published 14 days ago • 76
Papers - Training - Activation Function - SwiGLU. Collection by matlok, 2 days ago - Qwen2.5 Technical Report. Paper • 2412.15115 • Published 8 days ago • 328; Byte Latent Transformer: Patches Scale Better Than Tokens. Paper • 2412.09871 • Published 14 days ago • 76
Architectures. Collection by admarcosai, 6 days ago - Byte Latent Transformer: Patches Scale Better Than Tokens. Paper • 2412.09871 • Published 14 days ago • 76; Large Action Models: From Inception to Implementation. Paper • 2412.10047 • Published 14 days ago • 29