zhaoqf123
add gradient checkpointing for the final_layernorm module.
3d854f8