22 1 4

Morgan Funtowicz

mfuntowicz

https://github.com/mfuntowicz

AI & ML interests

Model inference low-level optimization, hardware affinity and large-scale distributed training.

Recent Activity

updated a model 22 days ago

mfuntowicz/SmolLM2-360M-Instruct-Q4_K_M-GGUF

updated a model 22 days ago

mfuntowicz/SmolLM2-360M-Instruct-Q4_K_M-GGUF

updated a model 22 days ago

mfuntowicz/SmolLM2-360M-Instruct-Q4_K_M-GGUF

View all activity

Articles

Organizations

Collections 1

Papers 1

arxiv:1910.03771

models 6

mfuntowicz/SmolLM2-360M-Instruct-Q4_K_M-GGUF

Updated 22 days ago • 35

mfuntowicz/gemma-2b

Text Generation • Updated Feb 29 • 12 • 1

mfuntowicz/xlm-roberta-large-squad2

Updated Jul 11, 2023

mfuntowicz/wav2vec2-large-960h-lv60-self

Updated Feb 27, 2023

mfuntowicz/bert-base-cased-finetuned-sst2

Text Classification • Updated May 19, 2021 • 23

mfuntowicz/test-model

Updated Nov 10, 2020

datasets

None public yet

Morgan Funtowicz

AI & ML interests

Recent Activity

Articles

Introducing the AMD 5th Gen EPYC™ CPU

Hugging Face on AMD Instinct MI300 GPU

CPU Optimized Embeddings with 🤗 Optimum Intel and fastRAG

Accelerating SD Turbo and SDXL Turbo Inference with ONNX Runtime and Olive

AMD + 🤗: Large Language Models Out-of-the-Box Acceleration with AMD GPU

Optimum-NVIDIA - Unlock blazingly fast LLM inference in just 1 line of code

Accelerating over 130,000 Hugging Face models with ONNX Runtime

Case Study: Millisecond Latency using Hugging Face Infinity and modern CPUs

Scaling up BERT-like model Inference on modern CPU - Part 2

Introducing Optimum: The Optimization Toolkit for Transformers at Scale

Scaling-up BERT Inference on CPU (Part 1)

Organizations

Collections 1

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Papers 1

models 6

mfuntowicz/SmolLM2-360M-Instruct-Q4_K_M-GGUF

mfuntowicz/gemma-2b

mfuntowicz/xlm-roberta-large-squad2

mfuntowicz/wav2vec2-large-960h-lv60-self

mfuntowicz/bert-base-cased-finetuned-sst2

mfuntowicz/test-model

datasets

Morgan Funtowicz

AI & ML interests

Recent Activity

Articles

Introducing the AMD 5th Gen EPYC™ CPU

Hugging Face on AMD Instinct MI300 GPU

CPU Optimized Embeddings with 🤗 Optimum Intel and fastRAG

Accelerating SD Turbo and SDXL Turbo Inference with ONNX Runtime and Olive

AMD + 🤗: Large Language Models Out-of-the-Box Acceleration with AMD GPU

Optimum-NVIDIA - Unlock blazingly fast LLM inference in just 1 line of code

Accelerating over 130,000 Hugging Face models with ONNX Runtime

Case Study: Millisecond Latency using Hugging Face Infinity and modern CPUs

Scaling up BERT-like model Inference on modern CPU - Part 2

Introducing Optimum: The Optimization Toolkit for Transformers at Scale

Scaling-up BERT Inference on CPU (Part 1)

Organizations

Collections 1

Papers 1

models 6 Sort: Recently updated

datasets

models 6