QizhiPei

AI & ML interests

AI4Science, Natural Language Processing

Recent Activity

liked a model 10 days ago
insilicomedicine/nach0_base
liked a model 21 days ago
Qwen/Qwen2.5-7B
liked a dataset 30 days ago
HuggingFaceTB/smollm-corpus

Organizations

None yet

QizhiPei's activity

liked a Space about 2 months ago
Reacted to RishabhBhardwaj's post with 👍 4 months ago
🎉 We are thrilled to share our work on model merging. We proposed a new approach, Della-merging, which combines expert models from various domains into a single, versatile model. Della employs a magnitude-based sampling approach to eliminate redundant delta parameters, reducing interference when merging homologous models (those fine-tuned from the same backbone).

Della outperforms existing homologous model merging techniques such as DARE and TIES. Across three expert models (LM, Math, Code) and their corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP), Della achieves an improvement of 3.6 points over TIES and 1.2 points over DARE.
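
As a rough illustration of the magnitude-based sampling idea described in the post, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the linear mapping from magnitude rank to drop probability, the `p_min`/`p_max` parameters, the rescaling of kept values, and the plain averaging in `merge` are all assumptions made for this example. See the linked repository for the actual method.

```python
import random


def magnitude_based_drop(base, finetuned, p_min=0.1, p_max=0.9, seed=0):
    """Randomly drop delta parameters, dropping small-magnitude ones more often.

    Hypothetical helper (assumption, not DELLA's reference code): the linear
    rank-to-probability mapping and the 1 / (1 - p) rescaling are illustrative.
    """
    if not 0.0 <= p_min <= p_max < 1.0:
        raise ValueError("require 0 <= p_min <= p_max < 1")
    rng = random.Random(seed)
    # Delta parameters: what fine-tuning changed relative to the shared backbone.
    deltas = [f - b for b, f in zip(base, finetuned)]
    n = len(deltas)
    # Indices sorted by magnitude: rank 0 is the smallest, rank n - 1 the largest.
    order = sorted(range(n), key=lambda i: abs(deltas[i]))
    pruned = [0.0] * n
    for rank, i in enumerate(order):
        frac = rank / (n - 1) if n > 1 else 1.0
        # Smallest magnitude gets p_max, largest magnitude gets p_min.
        p_drop = p_max - (p_max - p_min) * frac
        if rng.random() >= p_drop:
            # Rescale kept values so the expected delta is unchanged.
            pruned[i] = deltas[i] / (1.0 - p_drop)
    return pruned


def merge(base, pruned_deltas_per_expert):
    """Add the average of the experts' pruned deltas to the backbone (assumed fusion rule)."""
    k = len(pruned_deltas_per_expert)
    merged = list(base)
    for deltas in pruned_deltas_per_expert:
        for i, d in enumerate(deltas):
            merged[i] += d / k
    return merged
```

In this toy setting, each expert's weights would be passed through `magnitude_based_drop` against the same `base` list, and the resulting sparse deltas handed to `merge`.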

Paper: DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling (2406.11617)
GitHub: https://github.com/declare-lab/della

@soujanyaporia @Tej3
upvoted an article 4 months ago