Are the output CLS and token embedding vectors L2-normalized on a per-token basis?
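"Per-token L2-normalized" would mean that every row of the output (the CLS vector and each token vector) has Euclidean length 1. One way to check this empirically is sketched below. It assumes the output for a single sequence is available as a plain list of rows with shape (seq_len, hidden_dim) and CLS at index 0; the helper names are hypothetical and not part of any model's API.

```python
import math
from typing import List, Sequence


def per_token_l2_norms(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Return the L2 norm of each row (CLS at index 0, then one row per token)."""
    return [math.sqrt(sum(x * x for x in vec)) for vec in hidden_states]


def is_per_token_l2_normalized(hidden_states: Sequence[Sequence[float]],
                               tol: float = 1e-4) -> bool:
    """True if every row has unit L2 norm, within an absolute tolerance."""
    return all(abs(n - 1.0) <= tol for n in per_token_l2_norms(hidden_states))


def l2_normalize_per_token(hidden_states: Sequence[Sequence[float]],
                           eps: float = 1e-12) -> List[List[float]]:
    """Scale each row to unit length; eps guards against dividing by a zero norm."""
    norms = per_token_l2_norms(hidden_states)
    return [[x / max(n, eps) for x in vec] for vec, n in zip(hidden_states, norms)]


# Toy "output": two rows with norms 5.0 and 1.0, so not normalized per token.
toy = [[3.0, 4.0], [1.0, 0.0]]
print(per_token_l2_norms(toy))
print(is_per_token_l2_normalized(toy))
print(is_per_token_l2_normalized(l2_normalize_per_token(toy)))
```

With a Hugging Face `transformers` model and PyTorch, the equivalent check would presumably be `torch.linalg.vector_norm(outputs.last_hidden_state, dim=-1)`, which returns one norm per token. Values that are all close to 1.0 would indicate per-token normalization; otherwise `torch.nn.functional.normalize(..., p=2, dim=-1)` can apply it explicitly.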