---
library_name: transformers
tags:
- Compiler
- LLVM
- Intermediate Representation
- IR
- Path
- Hot Path
datasets:
- zhaojer/compiler_hot_paths
language:
- en
base_model:
- google-bert/bert-base-uncased
---
# Model Card for BERT Hot Path Predictor
<!-- Provide a quick summary of what the model is/does. -->
## Model Details
### Model Description
This BERT model performs hot path prediction: given a path (i.e., a sequence of LLVM IR instructions), it predicts whether the path is "hot" (1) or "cold" (0).
It was fine-tuned on the [hot paths dataset](https://huggingface.co/datasets/zhaojer/compiler_hot_paths) for 3 epochs with standard learning hyperparameters.
- **Model type:** Binary Sequence Classification
- **Language(s) (NLP):** English, Compiler/LLVM
- **Finetuned from model:** google-bert/bert-base-uncased
- **Dataset used:** zhaojer/compiler_hot_paths
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
The model can be used to predict whether a path is hot or cold, which is important information for compiler optimizations. A typical prediction pipeline looks like this:
1. Given a program (written in C, C++, Fortran, or other languages supported by LLVM), compile it into LLVM IR (e.g., `clang -S -emit-llvm program.c -o program.ll`)
2. Select a sequence of instructions (in units of basic blocks) from the IR file; use this as the input to the model.
3. Load the present model and feed it the selected input; the model will then output either 0 (cold path) or 1 (hot path). A sketch of these steps appears below.
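
As a minimal sketch of steps 2 and 3, assuming the IR is already in `program.ll`: `extract_basic_blocks` is a hypothetical helper (not part of this repository) that naively treats blank lines as basic-block boundaries.
```python
from transformers import pipeline

# Load the model directly through a pipeline (equivalent to the snippet
# in "How to Get Started with the Model" below)
classifier = pipeline("text-classification", model="zhaojer/bert-hot-path-predictor")

def extract_basic_blocks(ir_text: str, start: int, count: int) -> str:
    """Hypothetical helper: split IR text on blank lines (a rough proxy
    for basic-block boundaries) and join `count` consecutive blocks."""
    blocks = [b.strip() for b in ir_text.split("\n\n") if b.strip()]
    return "\n\n".join(blocks[start:start + count])

# program.ll produced by: clang -S -emit-llvm program.c -o program.ll
with open("program.ll") as f:
    ir = f.read()

path = extract_basic_blocks(ir, start=0, count=2)
print(classifier(path))  # 0 = cold path, 1 = hot path
```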
The model can be further fine-tuned on additional data. Please see the [zhaojer/compiler_hot_paths](https://huggingface.co/datasets/zhaojer/compiler_hot_paths) dataset card for more information on the data format expected for fine-tuning.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import BertForSequenceClassification, BertTokenizer, pipeline

# Load the fine-tuned model and its tokenizer from the Hub
saved_model = BertForSequenceClassification.from_pretrained("zhaojer/bert-hot-path-predictor")
saved_tokenizer = BertTokenizer.from_pretrained("zhaojer/bert-hot-path-predictor")

# Build a text-classification pipeline for predictions
classifier = pipeline("text-classification", model=saved_model, tokenizer=saved_tokenizer)

# Example prediction: two consecutive basic blocks of LLVM IR, separated by a blank line
new_path = "%26 = load i32, ptr %21, align 4\n%27 = load i32, ptr %11, align 4\n%28 = icmp slt i32 %26, %27\nbr i1 %28, label %29, label %59\n\nstore i32 0, ptr %22, align 4\nbr label %30"
prediction = classifier(new_path)
print(prediction)
```
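Note that, unless `id2label` was customized in the model config, the pipeline reports the generic labels `LABEL_0` (cold) and `LABEL_1` (hot) together with a confidence score, e.g. `[{'label': 'LABEL_1', 'score': 0.99}]`; the score shown here is illustrative.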
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
The model was fine-tuned on the hot paths dataset: [zhaojer/compiler_hot_paths](https://huggingface.co/datasets/zhaojer/compiler_hot_paths)
The dataset is already split into train, validation, and test sets and contains all the columns needed for training/fine-tuning; no further preprocessing was performed.
The data (in the `path` column) were tokenized using the standard `BertTokenizer` for the `bert-base-uncased` model.
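
For illustration, a minimal sketch of this tokenization step; the `truncation`/`max_length` settings are assumptions (BERT's 512-token limit), as the exact values are not recorded in this card.
```python
from datasets import load_dataset
from transformers import BertTokenizer

dataset = load_dataset("zhaojer/compiler_hot_paths")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Truncate long paths to BERT's 512-token limit (assumed setting);
    # padding is left to a data collator at batch time
    return tokenizer(batch["path"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)
```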
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
We used accuracy and AUROC as evaluation metrics during fine-tuning.
The model was fine-tuned for 3 epochs with standard hyperparameters (listed below), which took about 10 minutes on an NVIDIA T4 GPU.
#### Detailed Training Hyperparameters
- `evaluation_strategy="epoch"`
- `logging_strategy="epoch"`
- `save_strategy="epoch"`
- `num_train_epochs=3`
- `per_device_train_batch_size=16`
- `per_device_eval_batch_size=16`
- `learning_rate=5e-5`
- `load_best_model_at_end=True`
- `metric_for_best_model="accuracy"`
Note: Any hyperparameter not listed above used its default value.
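
Put together, a sketch of the corresponding `Trainer` setup; `tokenized`, `tokenizer`, and `compute_metrics` are assumed to be defined as in the surrounding sketches, and `output_dir` is illustrative.
```python
from transformers import BertForSequenceClassification, Trainer, TrainingArguments

# Binary classification head on top of bert-base-uncased
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

training_args = TrainingArguments(
    output_dir="bert-hot-path-predictor",  # illustrative
    evaluation_strategy="epoch",
    logging_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=5e-5,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,               # enables dynamic padding at batch time
    compute_metrics=compute_metrics,   # accuracy + AUROC; see Evaluation below
)
trainer.train()
```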
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data
<!-- This should link to a Dataset Card if possible. -->
The testing data consist of 68 hot paths and 92 cold paths generated from 4 distinct C programs.
They are also from [zhaojer/compiler_hot_paths](https://huggingface.co/datasets/zhaojer/compiler_hot_paths); please see its dataset card for how the testing data were created.
The model had never seen these testing data previously.
### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
We evaluated the model on the testing data using the following metrics (a sketch of how they can be computed follows the list):
- Loss (available by default)
- Accuracy
- AUROC
- Precision, Recall, F1 score
- Confusion matrix
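
The exact evaluation code is not included in this card, but a `compute_metrics` function along these lines, built on scikit-learn, would produce all of the metrics above:
```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
    roc_auc_score,
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Numerically stable softmax; probability of the "hot" class (label 1)
    # is needed for AUROC
    shifted = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = shifted / shifted.sum(axis=-1, keepdims=True)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "auroc": roc_auc_score(labels, probs[:, 1]),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": confusion_matrix(labels, preds).tolist(),
    }
```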
### Results
| Loss   | Accuracy | AUROC  | Precision | Recall | F1   |
| ------ | -------- | ------ | --------- | ------ | ---- |
| 0.0620 | 0.9875   | 0.9952 | 1.0000    | 0.9706 | 0.99 |

Confusion matrix on the test set:

|                | Actually Hot | Actually Cold |
| -------------- | ------------ | ------------- |
| Predicted Hot  | 66           | 0             |
| Predicted Cold | 2            | 92            |