Full list of languages
Where can we find a list of the languages used during training? And will a paper be released?
Thank you for the model and for the help!
We trained our model on multiple languages, including ar, bg, ca, cs, da, de, el, en, es, et, fa, fi, fr, hi, hu, id, it, iw, ja, kk, ko, lt, lv, mr, ms, nl, no, pl, pt, ro, ru, sk, sl, sr, sv, ta, th, tr, uk, vi, and zh. Notably, the primary languages in our training data were Chinese and English. As for your question about a paper, we currently have no plans to release one. Thanks.
How about programming languages?
We have included the following programming languages in our pre-training data: ABAP, Arduino, Assembly, Shell, C, C#, C++, Clojure, COBOL, Crystal, CUDA, Dart, Pascal, Elixir, Erlang, F#, Fortran, Go, Groovy, Haskell, CSS, Java, JavaScript, Julia, Kotlin, Common Lisp, Emacs Lisp, Objective-C++, OCaml, Perl, PHP, PowerShell, Python, R, Ruby, Rust, Scala, Solidity, SQL, Swift, TypeScript, Verilog, VHDL, Visual Basic.
What is the proportion of each language (ar, bg, ca, cs, da, de, el, en, es, et, fa, fi, fr, hi, hu, id, it, iw, ja, kk, ko, lt, lv, mr, ms, nl, no, pl, pt, ro, ru, sk, sl, sr, sv, ta, th, tr, uk, vi, and zh)?
Are there any test results on cross-lingual tasks?