Datasets and models for EMNLP paper "Scalable Data Ablation Approximations for Language Models through Modular Training and Merging"
Clara Na
claran
Recent Activity
Authored a paper (about 1 month ago): Scalable Data Ablation Approximations for Language Models through Modular Training and Merging
Updated a dataset (about 1 month ago): claran/modular-s2orc
Updated a collection (about 1 month ago): Scalable Data Ablations
Collections: 1
Papers: 1
Models: 30
claran/s2orc-biology1994-1999-ind-130m (1 download)
claran/s2orc-biology2007-2008-ind-130m
claran/s2orc-biology2013-2013-ind-130m (1 download)
claran/s2orc-biology2021-2021-ind-130m (3 downloads)
claran/s2orc-biology2019-2019-ind-130m (1 download)
claran/s2orc-biology2000-2003-ind-130m
claran/s2orc-biology2015-2015-ind-130m (3 downloads)
claran/s2orc-biology2014-2014-ind-130m (3 downloads)
claran/s2orc-biology2004-2006-ind-130m (2 downloads)
claran/s2orc-biology2016-2016-ind-130m (3 downloads)
Datasets: 6
claran/modular-s2orc: 7.47M rows, 912 downloads, 1 like
claran/seed-pretrain-decon: 3.45M rows, 69 downloads
claran/m2d2-wiki-decon: 5.3M rows, 68 downloads
claran/seed-pretrain-decon-parquet: 6.61M rows, 102 downloads
claran/m2d2-wiki-decon-parquet: 10.6M rows, 2.33k downloads
claran/modular-s2orc-parquet: 7.47M rows, 3.25k downloads