Nick Doiron

monsoon-nlp

AI & ML interests

biology and multilingual models

Recent Activity

replied to their post about 22 hours ago

Great to see Tatta Bio release an embeddings version of their DNA/protein language model 🧬: https://huggingface.co/tattabio/gLM2_650M_embed

Reacted to MohamedRashad's post with 🚀 about 23 hours ago

A while back I shared this model, https://huggingface.co/MohamedRashad/arabic-small-nougat, a finetune of https://huggingface.co/facebook/nougat-small for the Arabic language. Today this humble project has been scaled up with new models, new datasets, a new Space, and a new paper. Check everything out through this collection: https://huggingface.co/collections/MohamedRashad/arabic-nougat-673a3f540bd92904c9b92a8e

upvoted a paper 3 days ago

Farmer.Chat: Scaling AI-Powered Agricultural Services for Smallholder Farmers

Articles

Abliterating Refusal and Code LLMs

Starting Tiny with Protein LLaMA

Protein similarity and Matryoshka embeddings

Trying IDEFICS on a New Yorker cartoon dataset

monsoon-nlp's activity

upvoted a paper 3 days ago

Farmer.Chat: Scaling AI-Powered Agricultural Services for Smallholder Farmers

Paper • 2409.08916 • Published Sep 13 • 3

upvoted a collection 12 days ago

Plant foundation models

A collection of pre-trained DNA models for plant genomes. • 19 items • Updated Oct 23 • 4

upvoted a collection 13 days ago

Malaysian synthetic dataset

Use LLM to generate Malaysian context synthetic dataset. • 31 items • Updated 15 days ago • 1

upvoted a paper 13 days ago

RedCode: Risky Code Execution and Generation Benchmark for Code Agents

Paper • 2411.07781 • Published 18 days ago • 1

upvoted a paper about 1 month ago

MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

Paper • 2410.20771 • Published Oct 28 • 2

upvoted a collection about 1 month ago

Llama 3.2

This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Oct 24 • 512

upvoted an article 3 months ago

Article

Building DoRA Support for Embedding Layers in PEFT

Aug 23 • 10

upvoted a paper 3 months ago

To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20 • 41

upvoted a paper 4 months ago

Refusal in Language Models Is Mediated by a Single Direction

Paper • 2406.11717 • Published Jun 17 • 2

upvoted a collection 5 months ago

PlantCaduceus (512bp len)

https://plantcaduceus.github.io • 8 items • Updated Sep 7 • 2

upvoted a paper 5 months ago

Larimar: Large Language Models with Episodic Memory Control

Paper • 2403.11901 • Published Mar 18 • 32

upvoted a collection 5 months ago

Cambrian-1 Models

6 items • Updated Jun 28 • 20

upvoted an article 5 months ago

Article

Recommendation to Revisit the Diffuser Default LoRA Parameters

Jun 21 • 11

upvoted a paper 5 months ago

Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs

Paper • 2406.10209 • Published Jun 14 • 8

upvoted a collection 5 months ago

Florence

9 items • Updated Jul 11 • 160

upvoted a paper 6 months ago

A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding

Paper • 2406.05540 • Published Jun 8 • 3

upvoted 2 articles 6 months ago

Article

An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct

Jun 11 • 48

Article

Training and Finetuning Embedding Models with Sentence Transformers v3

May 28 • 161

upvoted 2 collections 6 months ago

RU-MTEB

18 items • Updated Jun 5 • 8

abliterated-v3

Latest gen of the abliterated models I've produced • 17 items • Updated Jun 3 • 96