Jordan Taylor (JordanTensor)

https://sites.google.com/view/jordantensor

AI & ML interests

Mechanistic interpretability, mechanistic anomaly detection, model-internals techniques, and AI safety techniques more generally.

Recent Activity

Updated the collection "Sandbagging research sprint 1" 16 days ago

Collections (1)

Models (44)

JordanTensor/gemma-sandbagging-mzpd84pf-step1984

Updated 17 days ago

JordanTensor/gemma-sandbagging-mzpd84pf-step1968

Updated 17 days ago

JordanTensor/gemma-sandbagging-mzpd84pf-step1952

Updated 17 days ago

JordanTensor/gemma-sandbagging-mzpd84pf-step1936

Updated 17 days ago

JordanTensor/gemma-sandbagging-mzpd84pf-step800

Updated 18 days ago

JordanTensor/gemma-sandbagging-mzpd84pf-step400

Updated 18 days ago

JordanTensor/gemma-sandbagging-mzpd84pf-step384

Updated 18 days ago

JordanTensor/gemma-sandbagging-mzpd84pf-step368

Updated 18 days ago

JordanTensor/gemma-sandbagging-mzpd84pf-step352

Updated 18 days ago

JordanTensor/gemma-sandbagging-mzpd84pf-step336

Updated 18 days ago

Datasets (3)

JordanTensor/sandbagging-sciq

Updated 22 days ago • 13.7k rows • 103 downloads • 1 like

JordanTensor/sandbagging-prefixes

Updated 22 days ago • 9.9k rows • 93 downloads • 1 like

JordanTensor/bias_in_bios_verified_software_devs_only

Updated Oct 9 • 5.9k rows • 35 downloads