loiccabannes committed
Commit 0188bf6 • Parent(s): 542ceb6
Update README.md

README.md CHANGED
@@ -12,16 +12,7 @@ pipeline_tag: question-answering
 **MambaSan-instruct is the first chat Japanese language model based on a state-space model architecture (Mamba), not a transformer.**
 
 The model is based on Albert Gu's and Tri Dao's work *Mamba: Linear-Time Sequence Modeling with Selective State Spaces* ([paper](https://arxiv.org/pdf/2312.00752.pdf)) as well as their [model implementation](https://github.com/state-spaces/mamba).
-This work was also inspired by havenhq's mamba-chat implementation in English
-bibtex
-@misc{haven2023mambachat,
-  title = {Mamba-Chat},
-  author = {Justus Mattern and Konstantin Hohr},
-  year = {2023},
-  howpublished = {GitHub},
-  url = {https://github.com/havenhq/mamba-chat}
-}
-This repository provides training / fine-tuning code for the model based on some modifications of the Huggingface Trainer class.
+This work was also inspired by havenhq's mamba-chat implementation in English.
 
 Mamba-Chat is based on MambaSan-130m and was fine-tuned on 31.7k samples of the [SkelterLabsInc/JaQuAD](https://huggingface.co/datasets/SkelterLabsInc/JaQuAD) dataset. To learn more, you can:
 