TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters Paper • 2410.23168 • Published Oct 30
nGPT: Normalized Transformer with Representation Learning on the Hypersphere Paper • 2410.01131 • Published Oct 1
Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling Paper • 2410.07145 • Published Oct 9
Round and Round We Go! What Makes Rotary Positional Encodings Useful? Paper • 2410.06205 • Published Oct 8
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Paper • 2410.00531 • Published Oct 1
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27
KTO: Model Alignment as Prospect Theoretic Optimization Paper • 2402.01306 • Published Feb 2
Planning in Natural Language Improves LLM Search for Code Generation Paper • 2409.03733 • Published Sep 5
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published Aug 21