Katsumata420 committed
Commit • a67b81e
1 Parent(s): 5547ec3
Fix bugs for bias (#3)
- Fix bugs for bias (ffd4f7c02c8029728f7479c1163163362ffcf11d)
- README.md +20 -15
- model.safetensors +1 -1
README.md
CHANGED
@@ -10,6 +10,11 @@ language:
 **RetrievaBERT** is a Transformer encoder pre-trained with Megatron-LM.
 It is designed for use in Japanese.

+## What's New
+
+- November 2024 (`v1.0.1`): Bug fix for the model parameters.
+  - The bias of `up_proj` was initialized with the bias of the gate projection; this bug has been fixed.
+
 ## Model Details

 ### Model Description
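To confirm that the `up_proj` bias fix above is present in a downloaded checkpoint, one can compare the gate and up-projection biases directly: in the buggy revision the two were identical copies. This is a minimal sketch, assuming the SwiGLU projections are stored under parameter names containing `up_proj` and `gate_proj` (hypothetical key names; inspect the actual state dict first).

```python
# Minimal sketch: check whether any up_proj bias is still an exact copy of the
# corresponding gate bias. Key names below are assumptions; list the real keys
# with state_dict.keys() before relying on this.
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

path = hf_hub_download("retrieva-jp/bert-1.3b", "model.safetensors")
state_dict = load_file(path)

for key, tensor in state_dict.items():
    if "up_proj" in key and key.endswith("bias"):
        gate_key = key.replace("up_proj", "gate_proj")  # assumed naming
        if gate_key in state_dict and torch.equal(tensor, state_dict[gate_key]):
            print(f"{key} still equals {gate_key} (buggy initialization)")
```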
@@ -19,12 +24,12 @@ **RetrievaBERT** is a Transformer encoder pre-trained with Megatron-LM.
 It is designed for use in Japanese.

 This model offers several advanced features compared to traditional BERT models:
-- **PreNorm**: Improved stability during training.
-- **SwiGLU**: Enhanced activation function for better performance.
-- **Grouped-Query Attention (Multi-Query Attention)**: Efficient attention mechanism.
-- **Max Sequence Length**: 2048 tokens, allowing for longer context.
-- **Parameters**: 1.3 billion parameters.
-- **Pre-training Objective**: Only Masked Language Modeling (MLM), not Next Sentence Prediction (NSP).
+- **PreNorm**: Improved stability during training.
+- **SwiGLU**: Enhanced activation function for better performance.
+- **Grouped-Query Attention (Multi-Query Attention)**: Efficient attention mechanism.
+- **Max Sequence Length**: 2048 tokens, allowing for longer context.
+- **Parameters**: 1.3 billion parameters.
+- **Pre-training Objective**: Only Masked Language Modeling (MLM), not Next Sentence Prediction (NSP).
 - **Token Type IDs**: Not used in this model.

 ### Model Sources
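Because the parameter fix in this commit concerns the SwiGLU feed-forward block, a brief illustrative sketch of a SwiGLU MLP with separate gate and up projections, each with its own bias, may help. This is a generic example, not RetrievaBERT's implementation; the layer names and the use of SiLU gating are assumptions.

```python
# Illustrative SwiGLU feed-forward block (generic sketch, not RetrievaBERT's code).
# The point of the v1.0.1 fix: gate_proj.bias and up_proj.bias are distinct
# parameters and must not share initial values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=True)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=True)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: SiLU-gated linear unit, then projection back to hidden size.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```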
@@ -44,9 +49,9 @@ Depending on your use case, follow the appropriate section below.

 This model is pre-trained using Masked Language Modeling.
 The mask token used is `<MASK|LLM-jp>`.
-Note that you need to set `trust_remote_code` to `True` because RetrievaBERT uses a custom model implementation.
-
-Example code for direct use:
+Note that you need to set `trust_remote_code` to `True` because RetrievaBERT uses a custom model implementation.
+
+Example code for direct use:

 ```python
 from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
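# The example above is cut off at the hunk boundary. Below is a minimal sketch of
# direct use -- not the model card's original snippet -- assuming the standard
# transformers fill-mask pipeline and the repository id retrieva-jp/bert-1.3b.
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

model_id = "retrieva-jp/bert-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

# The model's mask token is <MASK|LLM-jp>, so take it from the tokenizer
# rather than hard-coding "[MASK]".
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
text = f"東京は日本の{tokenizer.mask_token}です。"  # "Tokyo is the {mask} of Japan."
for candidate in fill_mask(text, top_k=3):
    print(candidate["token_str"], candidate["score"])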
@@ -98,7 +103,7 @@ The model was trained on the following hyperparameters.
 - Floating point expression: BF16

 ## Evaluation
-We fine-tuned the following models and evaluated them on the [JGLUE](https://github.com/yahoojapan/JGLUE) development set.
+We fine-tuned the following models and evaluated them on the [JGLUE](https://github.com/yahoojapan/JGLUE) development set.
 We adjusted the learning rate and training epochs for each model and task in accordance with [the JGLUE paper](https://www.jstage.jst.go.jp/article/jnlp/30/1/30_63/_pdf/-char/ja).

 | Model | MARC-ja/acc | JSTS/pearson | JSTS/spearman | JNLI/acc | JSQuAD/EM | JSQuAD/F1 | JComQA/acc |
@@ -106,7 +111,7 @@ We adjusted the learning rate and training epochs for each model and task in accordance with [the JGLUE paper](https://www.jstage.jst.go.jp/article/jnlp/30/1/30_63/_pdf/-char/ja).
 | tohoku-nlp/bert-base-japanese-v3  | 0.957 | 0.914 | 0.876 | 0.906 | 0.878 | 0.946 | 0.849 |
 | tohoku-nlp/bert-large-japanese-v2 | 0.959 | 0.916 | 0.877 | 0.901 | 0.884 | 0.951 | 0.867 |
 | ku-nlp/deberta-v3-base-japanese   | 0.958 | 0.925 | 0.890 | 0.902 | 0.925 | 0.910 | 0.882 |
-| retrieva-jp/bert-1.3b             | 0.
+| retrieva-jp/bert-1.3b             | 0.959 | 0.917 | 0.881 | 0.898 | 0.875 | 0.874 | 0.827 |


 ## Technical Specifications
@@ -121,9 +126,9 @@ The RetrievaBERT model is based on BERT with the following hyperparameters:
 - Maximum length of position embeddings: 2048

 As mentioned earlier, the main differences from the original BERT are:
-- PreNorm: Improved stability during training.
-- SwiGLU: Enhanced activation function for better performance.
-- Grouped-Query Attention (Multi-Query Attention): Efficient attention mechanism.
+- PreNorm: Improved stability during training.
+- SwiGLU: Enhanced activation function for better performance.
+- Grouped-Query Attention (Multi-Query Attention): Efficient attention mechanism.


 ### Compute Infrastructure
@@ -145,4 +150,4 @@ https://note.com/retrieva/n/n715bea2c2cd1 (in Japanese)
 Satoru Katsumata, Daisuke Kimura, Jiro Nishitoba

 ## Model Card Contact
-pr@retrieva.jp
+pr@retrieva.jp
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:994bd099f4bb0c9bab36ed16e1a8271f46f637de6b06e32fa1f29643d7b528c9
 size 2602880000
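The safetensors file is stored via Git LFS, so the pointer above records only the new SHA-256 and size. A quick way to confirm that a downloaded `model.safetensors` matches this commit is to hash it locally; a minimal sketch (the local path is an assumption):

```python
# Minimal sketch: verify a downloaded model.safetensors against the LFS pointer
# above (expected SHA-256 and size taken from this commit).
import hashlib
from pathlib import Path

expected_sha256 = "994bd099f4bb0c9bab36ed16e1a8271f46f637de6b06e32fa1f29643d7b528c9"
expected_size = 2602880000

path = Path("model.safetensors")  # adjust to your local download path
digest = hashlib.sha256()
with path.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

print("size ok:  ", path.stat().st_size == expected_size)
print("sha256 ok:", digest.hexdigest() == expected_sha256)
```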