https://docs.google.com/document/d/1cCe7GE2L8IrCpl2tzTRuZzlkz-Iu5AscGILGI0JioxY https://wandb.ai/jordantensor/gemma-sandbagging
Jordan Taylor
JordanTensor
AI & ML interests
Mechanistic interpretability, mechanistic anomaly detection, model internals techniques and AI safety techniques generally.
Recent Activity
Updated a collection 16 days ago: Sandbagging research sprint 1
Collections
1
Models
44
JordanTensor/gemma-sandbagging-mzpd84pf-step1984
JordanTensor/gemma-sandbagging-mzpd84pf-step1968
JordanTensor/gemma-sandbagging-mzpd84pf-step1952
JordanTensor/gemma-sandbagging-mzpd84pf-step1936
JordanTensor/gemma-sandbagging-mzpd84pf-step800
JordanTensor/gemma-sandbagging-mzpd84pf-step400
JordanTensor/gemma-sandbagging-mzpd84pf-step384
JordanTensor/gemma-sandbagging-mzpd84pf-step368
JordanTensor/gemma-sandbagging-mzpd84pf-step352
JordanTensor/gemma-sandbagging-mzpd84pf-step336