Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Abstract
Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs. In this paper, we propose Audio Flamingo, a novel audio language model with 1) strong audio understanding abilities, 2) the ability to quickly adapt to unseen tasks via in-context learning and retrieval, and 3) strong multi-turn dialogue abilities. We introduce a series of training techniques, architecture designs, and data strategies that equip our model with these abilities. Extensive evaluations across various audio understanding tasks confirm the efficacy of our method and set a new state of the art.
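To make the retrieval-based in-context learning idea concrete, here is a minimal sketch of how retrieved (audio, text) pairs could be assembled into a few-shot prompt. This is an illustration under assumptions, not the paper's implementation: `embed_audio` is a hypothetical stand-in for a pretrained audio encoder, and the prompt format with `<audio_i>` placeholders is assumed.

```python
import numpy as np

# Hypothetical stand-in for a pretrained audio encoder: a fixed random
# projection of a fixed-length crop. The actual model uses learned
# audio features; this placeholder only makes the sketch runnable.
_RNG = np.random.default_rng(0)
_PROJ = _RNG.standard_normal((16000, 128))

def embed_audio(waveform: np.ndarray) -> np.ndarray:
    x = np.resize(waveform, 16000)          # crop/pad to 1 s at 16 kHz
    e = x @ _PROJ
    return e / (np.linalg.norm(e) + 1e-8)   # unit norm for cosine similarity

def retrieve_examples(query: np.ndarray, datastore, k: int = 3):
    """Return the k (audio, text) pairs whose embeddings are most
    similar to the query audio, ranked by cosine similarity."""
    q = embed_audio(query)
    ranked = sorted(datastore, key=lambda pair: -float(q @ embed_audio(pair[0])))
    return ranked[:k]

def build_icl_prompt(examples, question: str) -> str:
    """Interleave retrieved captions (with audio placeholders) before
    the query, forming a few-shot prompt for an audio language model."""
    lines = [f"<audio_{i}> Caption: {text}" for i, (_, text) in enumerate(examples, 1)]
    lines.append(f"<audio_query> {question}")
    return "\n".join(lines)

# Example usage with synthetic waveforms:
datastore = [(_RNG.standard_normal(16000), f"example caption {i}") for i in range(10)]
query = _RNG.standard_normal(16000)
print(build_icl_prompt(retrieve_examples(query, datastore), "Describe this sound."))
```

In the paper, retrieval and interleaved audio conditioning are trained end to end; the sketch above only shows the data flow of turning retrieved examples into a few-shot prompt.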
Community
Neat! I would have liked to see metrics on transcription and diarization, or sentiment classification, too.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Audiobox: Unified Audio Generation with Natural Language Prompts (2023)
- Distilling Vision-Language Models on Millions of Videos (2024)
- Audio-Visual LLM for Video Understanding (2023)
- Boosting Large Language Model for Speech Synthesis: An Empirical Study (2023)
- CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing (2024)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space.
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot recommend
I wonder if this can be used for speaker diarization.