julien-c (HF staff) committed on
Commit 7225d34
1 Parent(s): 045a4bf

Add description to card metadata


BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another.
Quality is considered to be the correspondence between a machine's output and that of a human: "the closer a machine translation is to a professional human translation,
the better it is" – this is the central idea behind BLEU. BLEU was one of the first metrics to claim a high correlation with human judgements of quality, and
remains one of the most popular automated and inexpensive metrics.

Scores are calculated for individual translated segments—generally sentences—by comparing them with a set of good-quality reference translations.
Those scores are then averaged over the whole corpus to reach an estimate of the translation's overall quality. Intelligibility and grammatical
correctness are not taken into account.
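
For reference, the standard corpus-level definition from Papineni et al. (2002), which this description does not spell out, combines modified n-gram precisions $p_n$ with a brevity penalty $\mathrm{BP}$:

$$
\text{BLEU} = \mathrm{BP} \cdot \exp\!\left(\sum_{n=1}^{N} w_n \log p_n\right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r,\\
e^{\,1 - r/c} & \text{if } c \le r,
\end{cases}
$$

where $w_n$ are the n-gram weights (uniform with $N = 4$ in the original paper), $c$ is the total length of the candidate corpus, and $r$ is the effective reference length.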

BLEU's output is always a number between 0 and 1. This value indicates how similar the candidate text is to the reference texts, with values closer to 1
representing more similar texts. Few human translations will attain a score of 1, since this would indicate that the candidate is identical to one of the
reference translations. For this reason, it is not necessary to attain a score of 1. Because there are more opportunities to match, adding additional
reference translations will increase the BLEU score.
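
As a minimal sketch of how this metric is typically called through the evaluate library that backs this Space (the metric id "bleu" and the returned "bleu" key follow the library's documented API; the example sentences are made up):

```python
import evaluate

# Load the BLEU metric from the Hugging Face evaluate library.
bleu = evaluate.load("bleu")

predictions = ["the cat sat on the mat"]
# Each prediction gets a list of reference translations; adding more
# references gives more opportunities for n-grams to match, which tends
# to raise the score.
references = [["the cat sat on the mat", "there is a cat on the mat"]]

results = bleu.compute(predictions=predictions, references=references)
# results["bleu"] is a number between 0 and 1; here it is 1.0, since the
# candidate is identical to one of the references.
print(results["bleu"])
```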

Files changed (1)
  1. README.md +38 -4
README.md CHANGED

@@ -1,6 +1,6 @@
 ---
 title: BLEU
-emoji: 🤗
+emoji: 🤗
 colorFrom: blue
 colorTo: red
 sdk: gradio
@@ -8,10 +8,44 @@ sdk_version: 3.0.2
 app_file: app.py
 pinned: false
 tags:
-- evaluate
-- metric
----
+- evaluate
+- metric
+description: >-
+  BLEU (bilingual evaluation understudy) is an algorithm for evaluating the
+  quality of text which has been machine-translated from one natural language to
+  another.
+
+  Quality is considered to be the correspondence between a machine's output and
+  that of a human: "the closer a machine translation is to a professional human
+  translation,
+
+  the better it is" – this is the central idea behind BLEU. BLEU was one of the
+  first metrics to claim a high correlation with human judgements of quality,
+  and
+
+  remains one of the most popular automated and inexpensive metrics.
+
+
+  Scores are calculated for individual translated segments—generally
+  sentences—by comparing them with a set of good quality reference translations.
+
+  Those scores are then averaged over the whole corpus to reach an estimate of
+  the translation's overall quality. Intelligibility or grammatical correctness
+
+  are not taken into account[citation needed].
+
+
+  BLEU's output is always a number between 0 and 1. This value indicates how
+  similar the candidate text is to the reference texts, with values closer to 1
+
+  representing more similar texts. Few human translations will attain a score of
+  1, since this would indicate that the candidate is identical to one of the
+  reference translations. For this reason, it is not necessary to attain a score
+  of 1. Because there are more opportunities to match, adding additional
+
+  reference translations will increase the BLEU score.
+---
 # Metric Card for BLEU
 