Detecting Machine Translation Use with BLEU (Python NLTK BLEU Scoring)

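The Bilingual Evaluation Understudy score (BLEU) is a metric for evaluating generated sentences. A perfect match scores 1.0 and a complete mismatch scores 0.0. The metric was developed to evaluate the output of automatic machine translation systems, and it has the following advantages:

1. It is fast and cheap to compute.

2. It is easy to understand.

3. It is language-independent.

4. It has been widely adopted.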

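BLEU was proposed by Kishore Papineni et al. in their 2002 paper "BLEU: a Method for Automatic Evaluation of Machine Translation". It measures how close a candidate translation is to one or more reference translations. The score is a weighted geometric mean of the modified n-gram precisions between the texts, typically for n = 1 to 4 (the default weighting in NLTK's sentence_bleu), multiplied by a brevity penalty for candidates that are too short. In other words, a candidate that shares many 2-grams (consecutive word pairs), 3-grams, and longer n-grams with the references gets a higher score.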
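To make the n-gram overlap concrete, here is a minimal sketch of the clipped ("modified") n-gram precision that BLEU is built on. It is a plain-Python illustration, not NLTK's implementation; the function names and the toy sentences are assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(references, candidate, n):
    """Clipped n-gram precision of `candidate` against several references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # Clip candidate counts so repeated words cannot inflate the score.
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Toy sentences, invented for illustration only.
references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
candidate = "the cat sat on the mat".split()

p1 = modified_precision(references, candidate, 1)
p2 = modified_precision(references, candidate, 2)
print(p1, p2)
```

For these toy sentences, 5 of the 6 candidate unigrams and 3 of the 5 candidate bigrams appear in a reference, so the two precisions are 5/6 and 3/5.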

reference group

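We run a crowdsourced translation business, and in our setting it matters whether a translator has leaned on a machine translation engine. My basic approach is:

1. Translate the source text on several translation sites to obtain a machine translation evaluation set. In the example below, one passage is translated by Baidu Translate and Youdao Translate, and the two outputs form the evaluation set.

2. Treat the translator's submission as the candidate and compute its BLEU score against the machine translation evaluation set, using the BLEU scoring method provided by NLTK.

3. The higher the score, the closer the match, and the more likely it is that the translator consulted or directly copied machine translation. At that point a project manager needs to step in.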
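The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `simple_bleu` is a small, unsmoothed stand-in written for this example (in practice NLTK's `sentence_bleu` would take its place), `needs_review` is a hypothetical helper, and the default threshold of 0.5 is a placeholder rather than a tuned value.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the contiguous n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(references, candidate, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights over 1..max_n grams."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        total = sum(cand.values())
        max_ref = Counter()
        for ref in references:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: one empty n-gram order zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - len(candidate)), length))
    if len(candidate) > ref_len:
        bp = 1.0
    else:
        bp = math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


def needs_review(mt_outputs, submission, threshold=0.5):
    """Return (score, flag); `threshold` is a placeholder, not a tuned value."""
    score = simple_bleu(mt_outputs, submission)
    return score, score >= threshold


# Step 1: machine translation evaluation set (toy sentences for illustration).
mt_outputs = [
    "we will introduce the new business and the successful cases".split(),
    "we will present our business advantages and successful cases".split(),
]
# Step 2: two submissions to score against it.
copied = "we will introduce the new business and the successful cases".split()
independent = "our services and past projects will be presented in detail".split()

# Step 3: flag submissions whose score is suspiciously high.
print(needs_review(mt_outputs, copied))
print(needs_review(mt_outputs, independent))
```

The copied submission matches one machine output exactly and is flagged with a score of 1.0. The independent one shares no bigram with either output, so the unsmoothed score falls to 0.0 and it is not flagged.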

Here is an example.

1. Source text

新译星将代表四达时代集团在展览会上闪亮登场，届时我们将从新译星所开展的业务、具备的优势、成功案例等多个维度进行介绍，让您更加全面的了解新译星。我们拥有稳定的全职国际化团队，能够确保守时、高效的完成翻译和配音，并通过至臻...

2. Human translation

New Transtar will present itself at the Exhibition on behalf of StarTimes, and we will give a comprehensive introduction of ourselves, including the current services we offer, the advantages we hold, and the projects we have completed, to help yo...

3. Baidu Translate

The new translator will stand on the exhibition on behalf of the four times group at the exhibition. We will introduce the new star's business, the advantages and the successful cases, so that you can understand the new translator more comprehe...

4. Youdao Translate

The new translator star will represent sida times group in the exhibition, when we will introduce the new translator star's business, advantages, successful cases and other dimensions, so that you can have a more comprehensive understanding o...

5. Build the machine translation evaluation set from the Baidu and Youdao translations

[['The', 'new', 'translator', 'will', 'stand', 'on', 'the', 'exhibition', 'on', 'behalf', 'of', 'the', 'four', 'times', 'group', 'at', 'the', 'exhibition', 'We', 'will', 'introduce', 'the', 'new', 'star`s', 'business', 'the', 'advantages', 'and', 'the', 'successful', 'cases', 'so', 'that', ...

6. Use the human translation as the candidate to be checked

['New', 'Transtar', 'will', 'present', 'itself', 'at', 'the', 'Exhibition', 'on', 'behalf', 'of', 'StarTimes', 'and', 'we', 'will', 'give', 'a', 'comprehensive', 'introduction', 'of', 'ourselves', 'including', 'the', 'current', 'services', 'we', 'offer', 'the', 'advantages', 'we', ...
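The token lists in steps 5 and 6 look hand-built. One possible way to produce similar lists automatically is a small regular-expression tokenizer, sketched below. The pattern and the `tokenize` helper are assumptions made for this example, not the method used in the post; the sample strings are short excerpts of the translations above. Unlike the lists in the post, which write the apostrophe as a backtick, this sketch keeps it as a regular apostrophe.

```python
import re

# Words made of letters or digits, optionally followed by an apostrophe
# suffix such as "'s". Punctuation is dropped and case is preserved.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def tokenize(sentence):
    """Split a sentence into word tokens, discarding punctuation."""
    return TOKEN_RE.findall(sentence)


# Short excerpts of the Baidu and Youdao outputs shown above.
baidu = "We will introduce the new star's business, the advantages and the successful cases"
youdao = "The new translator star will represent sida times group in the exhibition"

# A list of token lists (references) and a single token list (candidate),
# which is the input shape sentence_bleu expects.
references = [tokenize(baidu), tokenize(youdao)]
candidate = tokenize("New Transtar will present itself at the Exhibition on behalf of StarTimes")

print(references[0])
print(candidate)
```

Because BLEU compares tokens exactly, 'The' and 'the' count as different words. Lowercasing both sides before scoring is a common option when case differences should not matter.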

7. First, compute the BLEU score between the human translation and the machine translation evaluation set. The result is 0.119115465241, as shown below:

[root@host-10-0-251-156 ~]# python
Python 2.7.5 (default, Apr 11 2018, 07:36:10)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from nltk.translate.bleu_score import sentence_bleu
>>>
>>> reference=[['The', 'new', 'translator', 'will', 'stand', 'on', 'the', 'exhibition', 'on', 'behalf', 'of', 'the', 'four', 'times', 'group', 'at', 'the', 'exhibition', 'We', 'will', 'introduce', 'the', 'new', 'star`s', 'business', 'the', 'advantages', 'and', 'the', 'successful', ...
>>>
>>> candidate=['New', 'Transtar', 'will', 'present', 'itself', 'at', 'the', 'Exhibition', 'on', 'behalf', 'of', 'StarTimes', 'and', 'we', 'will', 'give', 'a', 'comprehensive', 'introduction', 'of', 'ourselves', 'including', 'the', 'current', 'services', 'we', 'offer', 'the', 'advantages', ...
>>>
>>> score = sentence_bleu(reference, candidate)
>>> print score
0.119115465241
>>>

8. Next, we slightly modify the Baidu translation and compute its BLEU score against the machine translation evaluation set. The result is 0.875629670466, as shown below:

8.1 The slightly modified Baidu translation

New Transtar will stand on the exhibition on behalf of the four times group at the exhibition. We will introduce the new star's business, the advantages and the successful cases, so that you can understand the new translator more comprehensive...

8.2 Use the modified Baidu translation as the candidate

['New', 'Transtar', 'will', 'stand', 'on', 'the', 'exhibition', 'on', 'behalf', 'of', 'the', 'four', 'times', 'group', 'at', 'the', 'exhibition', 'We', 'will', 'introduce', 'the', 'new', 'star`s', 'business', 'the', 'advantages', 'and', 'the', 'successful', 'cases', 'so', 'that', 'you', ...

8.3 BLEU computation

>>> candidate_baidu=['New', 'Transtar', 'will', 'stand', 'on', 'the', 'exhibition', 'on', 'behalf', 'of', 'the', 'four', 'times', 'group', 'at', 'the', 'exhibition', 'We', 'will', 'introduce', 'the', 'new', 'star`s', 'business', 'the', 'advantages', 'and', 'the', 'successful', ...
>>> score_baidu = sentence_bleu(reference, candidate_baidu)
>>> print score_baidu
0.875629670466
>>>

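9. As the examples show, when the candidate translation is very close to the data in the machine translation evaluation set (that is, the translator consulted or directly copied machine translation), the BLEU score rises. How high the score must be before a project manager needs to step in is something to work out through experience on real projects.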
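One possible way to approach that cut-off, once reviewers have labeled some past submissions, is to pick the threshold that misclassifies the fewest of them. The sketch below is an assumption about how such tuning might look, with a hypothetical `pick_threshold` helper. It uses only the two scores from this post as labeled points, which is far too few for a real decision.

```python
def pick_threshold(labeled_scores):
    """Pick the cut-off that misclassifies the fewest labeled submissions.

    labeled_scores: list of (bleu_score, used_mt) pairs, where used_mt is a
    bool assigned by a human reviewer.
    """
    scores = sorted({score for score, _ in labeled_scores})
    # Candidate cut-offs: midpoints between neighboring distinct scores.
    candidates = [(a + b) / 2 for a, b in zip(scores, scores[1:])]
    if not candidates:
        raise ValueError("need at least two distinct scores")

    def errors(threshold):
        # A submission is predicted as MT-assisted when score >= threshold.
        return sum((score >= threshold) != used_mt
                   for score, used_mt in labeled_scores)

    return min(candidates, key=errors)


# The only two labeled points available from this post.
history = [
    (0.119115465241, False),  # independent human translation
    (0.875629670466, True),   # lightly edited Baidu output
]
threshold = pick_threshold(history)
print(threshold)
```

With just these two points the chosen threshold is simply their midpoint, roughly 0.497. A usable value would need many labeled submissions across different text types and language pairs.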
