Content

1 Calculating the BLEU Score in Python
2 1. Input and Split the Sentences
3 2. Calculate the BLEU Score in Python
4 3. Complete Code for Implementing BLEU Score in Python
5 4. Calculating the n-gram Score
6 Conclusion

Vijona

7 Feb at 9:57

How to Calculate BLEU Score in Python?

BLEU score in Python is a metric that measures the goodness of Machine Translation models. Though originally it was designed for only translation models, now it is used for other natural language processing applications as well. The BLEU score compares a sentence against one or more reference sentences and tells how well does the candidate sentence matched the list of reference sentences. It gives an output score between 0 and 1. A BLEU score of 1 means that the candidate sentence perfectly matches one of the reference sentences.This score is a common metric of measurement for Image captioning models.In this tutorial, we will be using sentence_bleu() function from the nltk library. Let’s get started.

Calculating the BLEU Score in Python

To calculate the BLEU score, we need to provide the reference and candidate sentences in the form of tokens.

We will learn how to do that and compute the score in this section. Let’s start with importing the necessary modules.

Copy Code


from nltk.translate.bleu_score import sentence_bleu

Now we can input the reference sentences in the form of a list. We also need to create tokens out of sentences before passing them to the sentence_bleu() function.

1. Input and Split the Sentences

The sentences in our reference list are:

Copy Code

'this is a dog' 'it is dog' 'dog it is' 'a dog, it is'

We can split them into tokens using the split function.

Copy Code



reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split() 
]
print(reference)

Output:

Copy Code



[['this', 'is', 'a', 'dog'], ['it', 'is', 'dog'], ['dog', 'it', 'is'], ['a', 'dog,', 'it', 'is']]

This is what the sentences look like in the form of tokens. Now we can call the sentence_bleu() function to calculate the score.

2. Calculate the BLEU Score in Python

To calculate the score use the following lines of code:

Copy Code



candidate = 'it is dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

Output:

Copy Code



BLEU score -> 1.0

We get a perfect score of 1 as the candidate sentence belongs to the reference set. Let’s try another one.

Copy Code



candidate = 'it is a dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

Output:

Copy Code



BLEU score -> 0.8408964152537145

We have the sentence in our reference set, but it isn’t an exact match. This is why we get a 0.84 score.

3. Complete Code for Implementing BLEU Score in Python

Copy Code



from nltk.translate.bleu_score import sentence_bleu
reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split() 
]
candidate = 'it is dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

candidate = 'it is a dog'.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

4. Calculating the n-gram Score

While matching sentences you can choose the number of words you want the model to match at once. For example, you can choose for words to be matched one at a time (1-gram). Alternatively, you can also choose to match words in pairs (2-gram) or triplets (3-grams).

In the sentence_bleu() function you can pass an argument with weights corresponding to the individual grams.

For example, to calculate gram scores individually you can use the following weights:

Individual 1-gram: (1, 0, 0, 0)
Individual 2-gram: (0, 1, 0, 0)
Individual 3-gram: (0, 0, 1, 0)
Individual 4-gram: (0, 0, 0, 1)

Python code for the same is given below:

Copy Code



from nltk.translate.bleu_score import sentence_bleu
reference = [
    'this is a dog'.split(),
    'it is dog'.split(),
    'dog it is'.split(),
    'a dog, it is'.split() 
]
candidate = 'it is a dog'.split()

print('Individual 1-gram: %f' % sentence_bleu(reference, candidate, weights=(1, 0, 0, 0)))
print('Individual 2-gram: %f' % sentence_bleu(reference, candidate, weights=(0, 1, 0, 0)))
print('Individual 3-gram: %f' % sentence_bleu(reference, candidate, weights=(0, 0, 1, 0)))
print('Individual 4-gram: %f' % sentence_bleu(reference, candidate, weights=(0, 0, 0, 1)))

Output:

Copy Code

Individual 1-gram: 1.000000 Individual 2-gram: 1.000000 Individual 3-gram: 0.500000 Individual 4-gram: 1.000000

By default, the sentence_bleu() function calculates the cumulative 4-gram BLEU score, also called BLEU-4. The weights for BLEU-4 are as follows:

Copy Code


(0.25, 0.25, 0.25, 0.25)

Let’s see the BLEU-4 code:

Copy Code



score = sentence_bleu(reference, candidate, weights=(0.25, 0.25, 0.25, 0.25))
print(score)

Output:

Copy Code

0.8408964152537145

That’s the exact score we got without the n-gram weights added.

Conclusion

This tutorial was about calculating the BLEU score in Python. We learned what it is and how to calculate individual and cumulative n-gram BLEU scores. Hope you had fun learning with us!

FEATURED PRODUCTS

Kubernetes

ccloud³

Managed Server

Cloud GPU

S3 Object Storage

COMPUTE

MANAGED

STORAGE

NETWORKING

MANAGEMENT TOOLS

BACKUPS & SNAPSHOTS

WEBSITE HOSTING

HOUSING

FEATURED INDUSTRIES

Enterprise

Saas-Hosting

Startup

INDUSTRIES

MORE INDUSTRIES

FEATURED USE CASES

Linux-Hosting

VMware Migration

Docker Hosting

USE CASES

MORE USE CASES

RESSOURCES

Help Center

Trust Center

Glossar

Tutorials

MORE CENTRON

MORE INFOS

FEATURED PRODUCTS

Kubernetes

ccloud³

Managed Server

Cloud GPU

S3 Object Storage

COMPUTE

MANAGED

STORAGE

NETWORKING

MANAGEMENT TOOLS

BACKUPS & SNAPSHOTS

WEBSITE HOSTING

HOUSING

FEATURED INDUSTRIES

Enterprise

Saas-Hosting

Startup

INDUSTRIES

MORE INDUSTRIES

FEATURED USE CASES

Linux-Hosting

VMware Migration

Docker Hosting

USE CASES

MORE USE CASES

RESSOURCES

Help Center

Trust Center

Glossar

Tutorials

MORE CENTRON

MORE INFOS

How to Calculate BLEU Score in Python?

Calculating the BLEU Score in Python

1. Input and Split the Sentences

2. Calculate the BLEU Score in Python

3. Complete Code for Implementing BLEU Score in Python

4. Calculating the n-gram Score

Conclusion

Do you have any questions, a specific use case, or special requirements?

Start now for free.