---
library_name: transformers
tags:
- Compiler
- LLVM
- Intermediate Representation
- IR
- Path
- Hot Path
datasets:
- zhaojer/compiler_hot_paths
language:
- en
base_model:
- google-bert/bert-base-uncased
---
# Model Card for BERT Hot Path Predictor
<!-- Provide a quick summary of what the model is/does. -->
## Model Details
### Model Description
This BERT model performs hot path prediction: given a path (i.e., a sequence of LLVM IR instructions), it predicts whether the path is "hot" (1) or "cold" (0).
It was fine-tuned on the [hot paths dataset](https://huggingface.co/datasets/zhaojer/compiler_hot_paths) for 3 epochs with standard learning hyperparameters.
- **Model type:** Binary Sequence Classification
- **Language(s) (NLP):** English, Compiler/LLVM
- **Finetuned from model:** google-bert/bert-base-uncased
- **Dataset used:** zhaojer/compiler_hot_paths
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
The model can be used to predict whether a path is hot or cold, which is important information for compiler optimizations. A typical prediction pipeline looks like this:
1. Given a program (written in C, C++, Fortran, or other languages supported by LLVM), compile it into LLVM IR (e.g., `clang -S -emit-llvm program.c -o program.ll`)
2. Select a sequence of instructions (in units of basic blocks) from the IR file; use this as the input to the model.
3. Load the present model and feed it the selected input; the model will then output either 0 (cold path) or 1 (hot path). A sketch of these steps appears below.
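
As a minimal sketch of steps 2 and 3, assuming the IR is already in `program.ll`: `extract_basic_blocks` is a hypothetical helper (not part of this repository) that naively treats blank lines as basic-block boundaries.
```python
from transformers import pipeline

# Load the model directly through a pipeline (equivalent to the snippet
# in "How to Get Started with the Model" below)
classifier = pipeline("text-classification", model="zhaojer/bert-hot-path-predictor")

def extract_basic_blocks(ir_text: str, start: int, count: int) -> str:
    """Hypothetical helper: split IR text on blank lines (a rough proxy
    for basic-block boundaries) and join `count` consecutive blocks."""
    blocks = [b.strip() for b in ir_text.split("\n\n") if b.strip()]
    return "\n\n".join(blocks[start:start + count])

# program.ll produced by: clang -S -emit-llvm program.c -o program.ll
with open("program.ll") as f:
    ir = f.read()

path = extract_basic_blocks(ir, start=0, count=2)
print(classifier(path))  # 0 = cold path, 1 = hot path
```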
The model can be further fine-tuned on additional data. Please see the [zhaojer/compiler_hot_paths](https://huggingface.co/datasets/zhaojer/compiler_hot_paths) dataset card for more information on the data format expected for fine-tuning.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import BertForSequenceClassification, BertTokenizer, pipeline

# Load the fine-tuned model and its tokenizer from the Hub
saved_model = BertForSequenceClassification.from_pretrained("zhaojer/bert-hot-path-predictor")
saved_tokenizer = BertTokenizer.from_pretrained("zhaojer/bert-hot-path-predictor")

# Build a text-classification pipeline for predictions
classifier = pipeline("text-classification", model=saved_model, tokenizer=saved_tokenizer)

# Example prediction: two consecutive basic blocks of LLVM IR, separated by a blank line
new_path = "%26 = load i32, ptr %21, align 4\n%27 = load i32, ptr %11, align 4\n%28 = icmp slt i32 %26, %27\nbr i1 %28, label %29, label %59\n\nstore i32 0, ptr %22, align 4\nbr label %30"
prediction = classifier(new_path)
print(prediction)
```
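Note that, unless `id2label` was customized in the model config, the pipeline reports the generic labels `LABEL_0` (cold) and `LABEL_1` (hot) together with a confidence score, e.g. `[{'label': 'LABEL_1', 'score': 0.99}]`; the score shown here is illustrative.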
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
The model was fine-tuned on the hot paths dataset: [zhaojer/compiler_hot_paths](https://huggingface.co/datasets/zhaojer/compiler_hot_paths)
The dataset is already split into train, validation, and test sets and contains all the columns needed for training/fine-tuning; no further preprocessing was performed.
The data (in the `path` column) were tokenized using the standard `BertTokenizer` for the `bert-base-uncased` model.
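
For illustration, a minimal sketch of this tokenization step; the `truncation`/`max_length` settings are assumptions (BERT's 512-token limit), as the exact values are not recorded in this card.
```python
from datasets import load_dataset
from transformers import BertTokenizer

dataset = load_dataset("zhaojer/compiler_hot_paths")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Truncate long paths to BERT's 512-token limit (assumed setting);
    # padding is left to a data collator at batch time
    return tokenizer(batch["path"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)
```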
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
We used accuracy and AUROC as evaluation metrics during fine-tuning.
The model was fine-tuned for 3 epochs with standard hyperparameters (listed below), which took about 10 minutes on an NVIDIA T4 GPU.
#### Detailed Training Hyperparameters
- `evaluation_strategy="epoch"`
- `logging_strategy="epoch"`
- `save_strategy="epoch"`
- `num_train_epochs=3`
- `per_device_train_batch_size=16`
- `per_device_eval_batch_size=16`
- `learning_rate=5e-5`
- `load_best_model_at_end=True`
- `metric_for_best_model="accuracy"`
Note: Any hyperparameter not listed above used its default value.
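
Put together, a sketch of the corresponding `Trainer` setup; `tokenized`, `tokenizer`, and `compute_metrics` are assumed to be defined as in the surrounding sketches, and `output_dir` is illustrative.
```python
from transformers import BertForSequenceClassification, Trainer, TrainingArguments

# Binary classification head on top of bert-base-uncased
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

training_args = TrainingArguments(
    output_dir="bert-hot-path-predictor",  # illustrative
    evaluation_strategy="epoch",
    logging_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=5e-5,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,               # enables dynamic padding at batch time
    compute_metrics=compute_metrics,   # accuracy + AUROC; see Evaluation below
)
trainer.train()
```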
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data
<!-- This should link to a Dataset Card if possible. -->
The testing data consist of 68 hot paths and 92 cold paths generated from 4 distinct C programs.
They are also from [zhaojer/compiler_hot_paths](https://huggingface.co/datasets/zhaojer/compiler_hot_paths); please see its dataset card for how the testing data were created.
The model had never seen these testing data previously.
### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
We evaluated the model on the testing data using the following metrics (a sketch of how they can be computed follows the list):
- Loss (available by default)
- Accuracy
- AUROC
- Precision, Recall, F1 score
- Confusion matrix
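
The exact evaluation code is not included in this card, but a `compute_metrics` function along these lines, built on scikit-learn, would produce all of the metrics above:
```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
    roc_auc_score,
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Numerically stable softmax; probability of the "hot" class (label 1)
    # is needed for AUROC
    shifted = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = shifted / shifted.sum(axis=-1, keepdims=True)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "auroc": roc_auc_score(labels, probs[:, 1]),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": confusion_matrix(labels, preds).tolist(),
    }
```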
### Results
| Loss   | Accuracy | AUROC  | Precision | Recall | F1   |
| ------ | -------- | ------ | --------- | ------ | ---- |
| 0.0620 | 0.9875   | 0.9952 | 1.0000    | 0.9706 | 0.99 |

Confusion matrix on the test set:

|                | Actually Hot | Actually Cold |
| -------------- | ------------ | ------------- |
| Predicted Hot  | 66           | 0             |
| Predicted Cold | 2            | 92            |