Scaling Law with Learning Rate Annealing
arXiv: 2408.11029
Note
1. The result of a memory "read" is fed to the processing unit; the output of the processing unit is "written" back to the memory.
2. Token summarisation is implemented as a weighted summation over all context tokens in memory: a learnable matrix in R^{k×p} multiplies the context in R^{p×d} to produce k summary tokens in R^{k×d} (see the sketch below).
3. Positional embeddings are added to distinguish tokens coming from memory from tokens coming from the inputs.
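A minimal PyTorch sketch of items 2 and 3; the class name, dimensions, and the separate learnable type embeddings are illustrative assumptions, not details taken from the note.

```python
import torch
import torch.nn as nn

class TokenSummarizer(nn.Module):
    """Compress p memory/context tokens into k summary tokens
    via a learnable weighted summation (illustrative sketch)."""

    def __init__(self, k: int, p: int):
        super().__init__()
        # Learnable mixing matrix W in R^{k x p}
        self.weights = nn.Parameter(torch.randn(k, p) / p ** 0.5)

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, p, d) -> summary: (batch, k, d);
        # each summary token is a weighted sum over all p context tokens.
        return torch.einsum("kp,bpd->bkd", self.weights, context)

# Usage: 512 memory tokens of width 768 compressed into 64 summary tokens.
summarizer = TokenSummarizer(k=64, p=512)
memory = torch.randn(2, 512, 768)
summary = summarizer(memory)                      # (2, 64, 768)

# Learnable type embeddings mark which tokens came from memory vs. the inputs.
mem_embed = nn.Parameter(torch.zeros(1, 1, 768))
inp_embed = nn.Parameter(torch.zeros(1, 1, 768))
inputs = torch.randn(2, 128, 768)
sequence = torch.cat([summary + mem_embed, inputs + inp_embed], dim=1)
```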
Note
1. It has an upgraded version: https://arxiv.org/pdf/2303.16727
1.1. Progressive fine-tuning of the pre-trained models can contribute to higher performance.
1.2. The decoder takes the encoder's visible tokens as input and reconstructs only the tokens visible under the decoder mask (see the sketch below).
1.3. Supervision applies only to the decoder output tokens that are invisible to the encoder.
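A rough PyTorch sketch of the dual-masking bookkeeping described in 1.2 and 1.3; the function name, the masking ratios, and the uniform random sampling are illustrative assumptions (the paper's actual decoder mask is structured rather than random).

```python
import torch

def dual_mask_indices(num_tokens, enc_visible_ratio=0.1, dec_visible_ratio=0.5):
    """Return (encoder-visible indices, decoder reconstruction targets)."""
    perm = torch.randperm(num_tokens)
    num_enc = int(num_tokens * enc_visible_ratio)
    enc_visible = perm[:num_enc]        # tokens the encoder actually processes
    enc_invisible = perm[num_enc:]      # tokens hidden from the encoder

    # Decoder mask: reconstruct only a subset of the encoder-invisible tokens,
    # so supervision never touches tokens the encoder has already seen.
    num_dec = int(len(enc_invisible) * dec_visible_ratio)
    keep = torch.randperm(len(enc_invisible))[:num_dec]
    dec_targets = enc_invisible[keep]
    return enc_visible, dec_targets

# Usage: the reconstruction loss is computed only at tgt_idx.
enc_idx, tgt_idx = dual_mask_indices(num_tokens=1568)
```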