Update README.md

README.md CHANGED

---
title: CodeBLEU
tags:
- evaluate
- metric
description: "CodeBLEU metric for Python and C++"
---

# Metric Card for CodeBLEU

## Metric Description

CodeBLEU is a metric for evaluating code synthesis. Unlike the original BLEU, which only measures surface-level n-gram overlap between a candidate and its references, CodeBLEU also accounts for grammatical and logical correctness by leveraging the abstract syntax tree (AST) and the data-flow structure of the code.
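
Concretely, the paper defines CodeBLEU as a weighted combination of four components. The weights below correspond to the `alpha`, `beta`, `gamma`, and `theta` arguments of `calculate` in the usage example (the paper itself writes the data-flow weight as delta; mapping it to `theta` is an assumption based on that example):

```latex
\text{CodeBLEU} = \alpha \cdot \text{BLEU} + \beta \cdot \text{BLEU}_{weight} + \gamma \cdot \text{Match}_{ast} + \theta \cdot \text{Match}_{df}
```

Here BLEU is the standard n-gram match, BLEU_weight is the weighted n-gram match that puts extra weight on language keywords, Match_ast is the syntactic AST match, and Match_df is the semantic data-flow match.
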
## How to Use

* Clone the repository:
```bash
git clone https://huggingface.co/spaces/giulio98/codebleu.git
```
* Import the metric:
```python
from codebleu.calc_code_bleu import calculate
```
* Compute the score:
```python
# Each problem has a list of acceptable reference solutions (one each here).
true_codes = [['def hello_world():\n    print("hello world!")'], ['def add(a,b):\n    return a+b']]
# One generated candidate per problem.
code_gens = ['def hello_world():\n    print("hello world!")', 'def add(a,b):\n    return a+b']
codebleu = calculate(references=true_codes, predictions=code_gens, language="python", alpha=0.25, beta=0.25, gamma=0.25, theta=0.25)
print(codebleu['code_bleu_score'])
```
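
The description above also advertises C++ support. A minimal sketch of a C++ call, assuming `calculate` keeps the exact signature from the Python example and that `language="cpp"` selects C++ (per the Inputs section below):

```python
from codebleu.calc_code_bleu import calculate

# Hypothetical C++ example: an identical prediction and reference
# should yield the maximum score.
cpp_refs = [["int add(int a, int b) {\n    return a + b;\n}"]]
cpp_preds = ["int add(int a, int b) {\n    return a + b;\n}"]
result = calculate(references=cpp_refs, predictions=cpp_preds, language="cpp",
                   alpha=0.25, beta=0.25, gamma=0.25, theta=0.25)
print(result['code_bleu_score'])
```
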
### Inputs
- **references** *(list of list of string)*: the `n` acceptable reference solutions for each problem
- **predictions** *(list of string)*: a single predicted solution for each problem
- **language** *(string)*: `"python"` or `"cpp"`
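
The usage example above also passes four component weights, **alpha**, **beta**, **gamma**, and **theta** *(float)*, which presumably correspond to the n-gram, weighted n-gram, AST, and data-flow terms of the formula above; the example sets each to 0.25.
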
### Output Values
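
Judging from the usage example, `calculate` returns a dictionary with the final score under the `code_bleu_score` key, e.g. `{"code_bleu_score": 1.0}`. With weights that sum to 1, as in the example, the score ranges from 0 to 1, inclusive; higher is better, and a prediction identical to its reference scores 1.
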
#### Values from Popular Papers
## Limitations and Bias
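
By construction, CodeBLEU depends on a language-specific parser and keyword list, so this implementation only supports the languages named above (Python and C++).
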
## Citation

```bibtex
@misc{ren2020codebleu,
  author = {Ren, Shuo and Guo, Daya and Lu, Shuai and Zhou, Long and Liu, Shujie and Tang, Duyu and Zhou, Ming and Blanco, Ambrosio and Ma, Shuai},
  title = {CodeBLEU: a Method for Automatic Evaluation of Code Synthesis},
  year = {2020},
  month = {09}
}
```