flacuna-13b-v1.0 / README.md

Update README.md

f3f4095 over 1 year ago

5.91 kB

	---
	license: other
	library_name: transformers
	metrics:
	- accuracy
	- code_eval
	datasets:
	- declare-lab/InstructEvalImpact
	- declare-lab/flan-mini
	---

	# Flacuna: A Vicuna made of Flan

	[Paper](https://arxiv.org/abs//2307.02053) \| [Model](https://huggingface.co/declare-lab/flacuna-13b-v1.0) \| [Dataset](https://huggingface.co/datasets/declare-lab/flan-mini)

	📣 We still have numerous experiments awaiting completion (details are [here](https://arxiv.org/abs//2307.02053)), requiring additional computing resources in our lab. If any industry professionals reading this are willing to provide assistance, please feel free to reach out to us at sporia@sutd.edu.sg.

	<img src="https://declare-lab.net/assets/images/logos/flacuna5.png" alt="Image" width="200" height="335">

	Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is already an excellent writing assistant, and the intention behind Flacuna was to enhance Vicuna's problem-solving capabilities. To achieve this, we curated a dedicated instruction dataset called Flan-mini.

	\| Dataset Name \| Source \| Dataset Size \|
	\|-----------------------------\|------------------------\|--------------\|
	\| Flan2021 \| Flan \| 388K \|
	\| Public Pool of Prompts \| Flan \| 320K \|
	\| Natural instructions v2 \| Flan \| 200K \|
	\| CoT \| Flan \| 100K \|
	\| Code Search \| HF/code_search_net \| 100K \|
	\| Code Contest \| HF/deepmind/code_contests \| 50K \|
	\| Apps \| HF/codeparrot/apps \| 50K \|
	\| GPT4-Alpaca \| GPT-4 \| 52K \|
	\| Code-Alpaca \| ChatGPT \| 20K \|
	\| ShareGPT \| ChatGPT \| 60K \|
	\| Total \| - \| 1.34M \|

	## Problem Solving Ability

	As a result of this fine-tuning process, Flacuna exhibited notable performance improvements in problem-solving across multiple benchmark datasets, both in few-shot and zero-shot settings.

	\| Model \| Size \| MMLU (5-shot) \| BBH (3-shot) \| DROP (3-shot) \| CRASS (3-shot) \| HumanEval (0-shot) \| Avg. \|
	\| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \|
	\| StableVicuna \| 13B \| 49.2 (+3.0) \| 37.5 (+0.4) \| 34.3 (-1.0) \| 67.5 (+8.7) \| 15.9 (+2.5) \| 40.9 (+2.7) \|
	\| Vicuna \| 13B \| 50.6 (+4.5) \| 37.6 (+0.5) \| 32.6 (-3.0) \| 60.9 (+2.1) \| 11.6 (-1.8) \| 38.7 (+0.6) \|
	\| Flacuna \| 13B \| 51.1 (+5.0) \| 39.3 (+2.2) \| 43.6 (+8.0) \| 74.1 (+15.3) \| 11.0 (-2.4) \| 43.8 (+5.6) \|

	\| Model \| Size \| MMLU (0-shot) \| BBH (0-shot) \| CRASS (0-shot) \|
	\| --- \| --- \| --- \| --- \| --- \|
	\| StableVicuna \| 13B \| 47.5 \| 18.5 \| 64.2 \|
	\| Vicuna \| 13B \| 48.3 \| 28.3 \| 65.7 \|
	\| Flacuna \| 13B \| 49.4 \| 32.5 \| 67.9 \|


	During training, Flacuna is a 13B checkpoint of LLaMA and employed a maximum input sequence length of 1280. We utilized LoRA for parameter-efficient fine-tuning.

	## Chatbot / Writing Assistant

	While Flacuna primarily excels in problem-solving tasks, we made efforts to maintain the impressive writing and chatting ability of Vicuna. To achieve this, we incorporated conversational datasets generated by GPT-4, such as GPT-4-Alpaca and ShareGPT, into the Flan-mini collection.
	To use Flacuna as a chatbot or writing assistant, we recommend you use the following template:

	```
	A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {definition of the task}.\n\n
	{question}\n
	Output: ASSISTANT:

	```
	Please note that we still recommend using Vicuna as your preferred Chatbot or Writing Assistant, over Flacuna. Flacuna's primary strength lies in problem-solving tasks, making it ideal for such applications.

	The following table presents the writing performance of Flacuna on the IMPACT dataset, which is a component of the InstructEval evaluation suite. The generated responses have been evaluated by ChatGPT, and their relevance and coherence have been scored on a scale of 1 to 5.


	\| Model \| Size \| Informative Rel. \| Informative Coh. \| Professional Rel. \| Professional Coh. \| Argumentative Rel. \| Argumentative Coh. \| Creative Rel. \| Creative Coh. \| Avg. Rel. \| Avg. Coh. \|
	\| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \|
	\| ChatGPT \| - \| 3.34 \| 3.98 \| 3.88 \| 3.96 \| 3.96 \| 3.82 \| 3.92 \| 3.94 \| 3.78 \| 3.93 \|
	\| Flan-Alpaca \| 11B \| 3.56 \| 3.46 \| 3.54 \| 3.70 \| 3.22 \| 3.28 \| 3.70 \| 3.40 \| 3.51 \| 3.46 \|
	\| Flan-T5 \| 11B \| 2.64 \| 3.24 \| 2.62 \| 3.22 \| 2.54 \| 3.40 \| 2.50 \| 2.72 \| 2.58 \| 3.15 \|
	\| Dolly-V2 \| 12B \| 3.54 \| 3.64 \| 2.96 \| 3.74 \| 3.66 \| 3.20 \| 3.02 \| 3.18 \| 3.30 \| 3.44 \|
	\| StableVicuna \| 13B \| 3.54 \| 3.64 \| 2.96 \| 3.74 \| 3.30 \| 3.20 \| 3.02 \| 3.18 \| 3.21 \| 3.44 \|
	\| Vicuna \| 13B \| 3.60 \| 3.96 \| 3.74 \| 3.82 \| 3.82 \| 3.56 \| 3.82 \| 3.92 \| 3.75 \| 3.82 \|
	\| Flacuna \| 13B \| 3.02 \| 3.42 \| 3.48 \| 3.52 \| 3.38 \| 3.02 \| 3.92 \| 3.80 \| 3.45 \| 3.44 \|


	## Basic Usage
	```bash
	git clone https://huggingface.co/declare-lab/flacuna-13b-v1.0
	cd flacuna-13b-v1.0
	python flacuna.py
	```
	## Training or Fine-tuning Flacuna

	The trainer codes are available here: [https://github.com/declare-lab/flacuna](https://github.com/declare-lab/flacuna).

	## License
	Non-commercial license

	## Citation

	```bibtex
	@misc{ghosal2023flacuna,
	title={Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning},
	author={Deepanway Ghosal and Yew Ken Chia and Navonil Majumder and Soujanya Poria},
	year={2023},
	eprint={2307.02053},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
	}
	```