Spaces:

Infi-MM
/

README

Running

App Files Files Community

README / README.md

xiaotianhan

Update README.md

743b1c4 verified 3 months ago

preview code

raw

history blame

5.19 kB

	---
	title: README
	emoji: 🚀
	colorFrom: indigo
	colorTo: pink
	sdk: static
	pinned: false
	---

	<br>
	<p align="center">
	<img src="https://huggingface.co/Infi-MM/infimm-zephyr/resolve/main/assets/infimm-logo.webp" alt="InfiMM-logo" width="400"></a>
	</p>
	<br>

	<h1 align="center">
	InfiMM
	</h1>

	<p align="center">
	<a href="https://github.com/InfiMM/">
	<img src="data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIyNCIgaGVpZ2h0PSIyNCIgdmlld0JveD0iMCAwIDI0IDI0Ij48cGF0aCBkPSJNMTIgMGMtNi42MjYgMC0xMiA1LjM3My0xMiAxMiAwIDUuMzAyIDMuNDM4IDkuOCA4LjIwNyAxMS4zODcuNTk5LjExMS43OTMtLjI2MS43OTMtLjU3N3YtMi4yMzRjLTMuMzM4LjcyNi00LjAzMy0xLjQxNi00LjAzMy0xLjQxNi0uNTQ2LTEuMzg3LTEuMzMzLTEuNzU2LTEuMzMzLTEuNzU2LTEuMDg5LS43NDUuMDgzLS43MjkuMDgzLS43MjkgMS4yMDUuMDg0IDEuODM5IDEuMjM3IDEuODM5IDEuMjM3IDEuMDcgMS44MzQgMi44MDcgMS4zMDQgMy40OTIuOTk3LjEwNy0uNzc1LjQxOC0xLjMwNS43NjItMS42MDQtMi42NjUtLjMwNS01LjQ2Ny0xLjMzNC01LjQ2Ny01LjkzMSAwLTEuMzExLjQ2OS0yLjM4MSAxLjIzNi0zLjIyMS0uMTI0LS4zMDMtLjUzNS0xLjUyNC4xMTctMy4xNzYgMCAwIDEuMDA4LS4zMjIgMy4zMDEgMS4yMy45NTctLjI2NiAxLjk4My0uMzk5IDMuMDAzLS40MDQgMS4wMi4wMDUgMi4wNDcuMTM4IDMuMDA2LjQwNCAyLjI5MS0xLjU1MiAzLjI5Ny0xLjIzIDMuMjk3LTEuMjMuNjUzIDEuNjUzLjI0MiAyLjg3NC4xMTggMy4xNzYuNzcuODQgMS4yMzUgMS45MTEgMS4yMzUgMy4yMjEgMCA0LjYwOS0yLjgwNyA1LjYyNC01LjQ3OSA1LjkyMS40My4zNzIuODIzIDEuMTAyLjgyMyAyLjIyMnYzLjI5M2MwIC4zMTkuMTkyLjY5NC44MDEuNTc2IDQuNzY1LTEuNTg5IDguMTk5LTYuMDg2IDguMTk5LTExLjM4NiAwLTYuNjI3LTUuMzczLTEyLTEyLTEyeiIvPjwvc3ZnPg==">
	</a>
	</p>


	InfiMM, inspired by the Flamingo architecture, sets itself apart with unique training data and diverse large language models (LLMs). This approach allows InfiMM to maintain the core strengths of Flamingo while offering enhanced capabilities. As the premier open-sourced variant in this domain, InfiMM excels in accessibility and adaptability, driven by community collaboration. It's more than an emulation of Flamingo; it's an innovation in visual language processing.

	Our model is another attempt to produce the result reported in the paper "Flamingo: A Large-scale Visual Language Model for Multimodal Understanding" by DeepMind.
	Compared with previous open-sourced attempts ([OpenFlamingo](https://github.com/mlfoundations/open_flamingo) and [IDEFIC](https://huggingface.co/blog/idefics)), InfiMM offers a more flexible models, allowing for a wide range of applications.
	In particular, InfiMM integrates the latest LLM models into VLM domain the reveals the impact of LLMs with different scales and architectures.

	Please note that InfiMM is currently in beta stage and we are continuously working on improving it.

	## News
	- 🎉 [2024.08.15] Our paper was accepted by ACL 2023 [InfiMM](https://aclanthology.org/2024.findings-acl.27/).
	- 🎉 [2024.03.02] We release the [InfiMM-HD](https://huggingface.co/Infi-MM/infimm-hd).
	- 🎉 [2024.01.11] We release the first set of MLLMs we trained [InfiMM-Zephyr](https://huggingface.co/Infi-MM/infimm-zephyr), [InfiMM-LLaMA13B](https://huggingface.co/Infi-MM/infimm-llama13b) and [InfiMM-Vicuna13B](https://huggingface.co/Infi-MM/infimm-vicuna13b).
	- 🎉 [2024.01.10] We release a survey about Multimodal Large Language Models (MLLMs) reasoning capability at [here](https://huggingface.co/papers/2401.06805).
	- 🎉 [2023.11.18] We release InfiMM-Eval at [here](https://arxiv.org/abs/2311.11567), an Open-ended VQA benchmark dataset specifically designed for MLLMs, with a focus on complex reasoning tasks. The leaderboard can be found via [Papers with Code](https://paperswithcode.com/sota/visual-question-answering-vqa-on-core-mm) or [project page](https://infimm.github.io/InfiMM-Eval/).

	## Citation
	```
	@inproceedings{liu-etal-2024-infimm,
	title = "{I}nfi{MM}: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model",
	author = "Liu, Haogeng and
	You, Quanzeng and
	Wang, Yiqi and
	Han, Xiaotian and
	Zhai, Bohan and
	Liu, Yongfei and
	Chen, Wentao and
	Jian, Yiren and
	Tao, Yunzhe and
	Yuan, Jianbo and
	He, Ran and
	Yang, Hongxia",
	editor = "Ku, Lun-Wei and
	Martins, Andre and
	Srikumar, Vivek",
	booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
	month = aug,
	year = "2024",
	address = "Bangkok, Thailand and virtual meeting",
	publisher = "Association for Computational Linguistics",
	url = "https://aclanthology.org/2024.findings-acl.27",
	pages = "485--492",
	abstract = "In this work, we present InfiMM, an advanced Multimodal Large Language Model that adapts to intricate vision-language tasks. InfiMM, inspired by the Flamingo architecture, distinguishes itself through the utilization of large-scale training data, comprehensive training strategies, and diverse large language models. This approach ensures the preservation of Flamingo{'}s foundational strengths while simultaneously introducing augmented capabilities. Empirical evaluations across a variety of benchmarks underscore InfiMM{'}s remarkable capability in multimodal understanding. The code can be found at: https://anonymous.4open.science/r/infimm-zephyr-F60C/.",
	}

	```