Model Card for yolochess_mlm_azure-cloud-35

This model with 66M parameters is pre-trained from scratch with Masked Language Modeling on chess positions in FEN (Forsyth–Edwards Notation) format.
It is intended for downstream fine-tuning, e.g. text classification of human moves.

Model Details

Model Description

  • Developed by: Jonathan Rahn
  • Model type: DistilBERT
  • Language(s) (NLP): Chess FEN
  • License: MIT

Uses

Direct Use

This model is pre-trained from scratch with Masked Language Modeling on chess positions in FEN format; it can be used directly for fill-mask prediction on FENs, as in the example below.

Downstream Use

It is intended to be used for downstream fine-tuning, e.g. text classification of human moves, as sketched below.
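
A minimal fine-tuning setup sketch, assuming a hypothetical move-classification task (num_labels and the label semantics are placeholders, not part of this model card). Loading the checkpoint with a sequence-classification head keeps the pre-trained encoder and adds a freshly initialized classifier:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
# The MLM head is discarded; a randomly initialized classification head is added
model = AutoModelForSequenceClassification.from_pretrained(
    "jrahn/yolochess_mlm_azure-cloud-35",
    num_labels=2,  # hypothetical number of move classes
)
# Standard Trainer-based fine-tuning on (FEN, label) pairs follows from here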

Out-of-Scope Use

Any input other than chess positions in standard FEN format.

Bias, Risks, and Limitations

n/a

Recommendations

n/a

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

# Load the pre-trained tokenizer and masked-language-model head
tokenizer = AutoTokenizer.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
model = AutoModelForMaskedLM.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")

# Use the fill-mask pipeline to predict the masked part of a FEN
pipe = pipeline("fill-mask", model=model, tokenizer=tokenizer)
pipe("6k1/8/8/1pB3[MASK]P/1P3P2/8/8/8 w - - 1 74")

Training Details

Training Data

Lichess-Elite 22-11 Dataset

Training Procedure

Masked Language Modeling objective with a 15% token masking ratio.
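
A minimal sketch of this objective with the Hugging Face Trainer, assuming a from-scratch DistilBERT with its default configuration and a tokenized dataset tokenized_ds as produced in the Preprocessing sketch below; the 15% ratio is set via mlm_probability:

from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    DistilBertConfig,
    DistilBertForMaskedLM,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
model = DistilBertForMaskedLM(DistilBertConfig())  # fresh weights, no pre-training

# Dynamically masks 15% of input tokens in each batch
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="yolochess_mlm", per_device_train_batch_size=128),
    data_collator=collator,
    train_dataset=tokenized_ds,  # tokenized FEN dataset (see Preprocessing)
)
trainer.train()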

Preprocessing

Tokenize data["train"]["fen"] with max-length padding to 200 tokens, using the default distilbert-base-cased tokenizer. This is inefficient in two ways: most of the vocabulary never occurs in FEN strings, wasting embedding parameters, and since FENs are shorter than 90 characters, the 200-token padding (and the model's even larger positional-embedding size) wastes parameters and compute on padding. Experiments with a reduced max-length in tokenization show performance gains.
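
A minimal sketch of this preprocessing step, assuming the Lichess-Elite 22-11 data is available with a "fen" column (the load_dataset call and file name are placeholders):

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")

# Placeholder: load the Lichess-Elite 22-11 FEN data from a local JSONL file
data = load_dataset("json", data_files={"train": "lichess_elite_2211.jsonl"})

def tokenize(examples):
    # Fixed-length padding to 200 tokens, as described above (wasteful for <90-char FENs)
    return tokenizer(examples["fen"], padding="max_length", max_length=200, truncation=True)

tokenized_ds = data["train"].map(tokenize, batched=True, remove_columns=data["train"].column_names)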

Speeds, Sizes, Times

Training for 172,500 steps at batch size 128 (22M examples, 1 epoch) took ~10 hrs on 1x RTX 4090 using 20 GB of VRAM, with a final MLM loss of 0.2567.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: 1x RTX 4090
  • Hours used: 10
  • Cloud Provider: local
  • Compute Region: local
  • Carbon Emitted: 1.5 kg CO₂eq

Technical Specifications

Model Architecture and Objective

DistilBERT (65.8M parameters, F32 weights), trained with a Masked Language Modeling objective.
