Text Generation
Transformers
PyTorch
TeleFLM
custom_code
jasonfang3900 committed
Commit 80dbf03 (parent: bb5e7ca)

Update README.md

Files changed (1): README.md (+7 −6)
README.md CHANGED
@@ -2,10 +2,11 @@
 license: apache-2.0
 ---

- # Tele-FLM
+ # Tele-FLM-1T
 Tele-FLM-1T (aka FLM-2-1T) is a 1T open-sourced multilingual large language model that features a stable, efficient pre-training paradigm and enhanced factual judgement capabilities.
- Built upon the decoder-only transformer architecture, it has been trained on approximately 2T tokens.
- Tele-FLM series demonstrate superior performances at its scale, and sometimes surpass larger models.
+ Built upon the decoder-only transformer architecture, it has been trained on approximately 2.3T tokens.
+ Tele-FLM-1T, currently the largest model in the Tele-FLM series, is built upon Tele-FLM (52B), which shows superior performance at its scale, and is in all likelihood capable of handling even harder tasks with better performance.
+ For now, it is still under evaluation due to limited computing resources.
 In addition to sharing the model weights, we provide the core designs, engineering practices, and training details, anticipating their benefits for both academic and industrial communities.

 ## Model Details
@@ -38,7 +39,7 @@ Based on growth technology, the Tele-FLM-1T model training is divided into three
 - Input and output multiplier

 Consequently, Tele-FLM-1T is largely compatible with Llama architecturally.
- To maximize convenience for the community, we made minimal adjustments to Llama's code to adapt it to Tele-FLM and released it as open source.
+ To maximize convenience for the community, we made minimal adjustments to Llama's code to adapt it to Tele-FLM-1T and released it as open source.


 | Models | layer<br>number | attention<br>heads | hidden<br>size | ffn hidden<br>size | vocab<br>size | context<br>length | params<br>count |
@@ -56,8 +57,8 @@ All nodes are interconnected via InfiniBand (IB). The training process lasted ar

 ### Software

- Tele-FLM utilizes 3D parallel training, combining the prevailing methodologies: data parallelism, tensor parallelism, and pipeline parallelism.
- The parallel training setup for Tele-FLM is configured as follows: tensor parallel=32, pipeline parallel=28, and data parallel=1.
+ Tele-FLM-1T utilizes 3D parallel training, combining the prevailing methodologies: data parallelism, tensor parallelism, and pipeline parallelism.
+ The parallel training setup for Tele-FLM-1T is configured as follows: tensor parallel=32, pipeline parallel=28, and data parallel=1.

 ### Related Work
 [Tele-FLM (52B)](https://huggingface.co/CofeAI/Tele-FLM)
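
As a quick consistency check on the parallel layout stated in the Software section (tensor parallel = 32, pipeline parallel = 28, data parallel = 1), the product of the three degrees gives the GPU world size the configuration implies. The snippet below is illustrative arithmetic only, not the training code.

```python
# Illustrative arithmetic only: world size implied by a 3D parallel layout.
# Each pipeline stage holds one tensor-parallel group, and the whole TP x PP
# model replica is duplicated once per data-parallel rank.
def implied_world_size(tp: int, pp: int, dp: int) -> int:
    return tp * pp * dp

tp, pp, dp = 32, 28, 1                     # values stated in the card
print(implied_world_size(tp, pp, dp))      # 896 GPUs in total
```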
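
Because the card is tagged `custom_code` and the diff notes that Llama's modeling code was minimally adapted for Tele-FLM-1T, loading it through Transformers should follow the usual remote-code path. The sketch below is a minimal, unverified example: the repository ID `CofeAI/Tele-FLM-1T` is an assumption inferred from the linked 52B card (`CofeAI/Tele-FLM`), and the dtype/device settings are illustrative.

```python
# Minimal sketch, assumptions flagged in comments.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "CofeAI/Tele-FLM-1T"  # assumed repo ID, not confirmed by this commit

# trust_remote_code=True is required because the card ships custom modeling code.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype="auto",   # take the dtype stored in the checkpoint
    device_map="auto",    # shard across available devices; a 1T model needs many GPUs
)

inputs = tokenizer("Large language models are", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```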